SDK Runtime Guide
The SDK runtime guide focuses on what happens after an application starts: how the client connects to Nacos, how it listens to configuration and subscribes to services, how it recovers from network jitter, and which local data can be used as temporary fallback.
This page does not replace the API manual for each language SDK. It describes common Client SDK runtime semantics. Java is the current reference implementation. Go, Python, and other language SDKs align with these semantics as their runtime capabilities evolve.
1. Client SDK Boundary
Business applications should prefer the Client SDK. It is designed for application runtime access and is suitable for:
- Reading known configurations and listening to their changes.
- Registering and deregistering the current application instance.
- Querying and subscribing to known downstream services.
- Querying, subscribing to, or registering runtime AI resources such as MCP endpoints, Agent endpoints, Prompts, Skills, and AgentSpecs.
- Recovering declared runtime intent, such as listeners, subscriptions, and ephemeral instance registrations, after reconnect.
The Client SDK is not a broad management interface. Do not use it to list all namespaces, all configurations, all services, or all clients. Use the Admin API, Console API, or Maintainer SDK for bulk configuration management, audit queries, operations queries, and server state changes.
2. Decide Runtime Identity During Initialization
An SDK instance usually binds to one namespace. If an application needs to access multiple namespaces, create multiple SDK instances and close them when they are no longer needed.
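The one-instance-per-namespace rule can be sketched as a small holder that creates a client per namespace and closes them together. This is a minimal illustration only: `NamespaceClient` and `NamespaceClients` are hypothetical stand-ins, not types from any Nacos SDK, and a real application would create and close the client type provided by its language SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an SDK client that is bound to exactly one namespace.
final class NamespaceClient implements AutoCloseable {
    final String namespace;
    private boolean closed;

    NamespaceClient(String namespace) {
        this.namespace = namespace;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true; // a real client would release connections and background tasks here
    }
}

// Hypothetical holder: one client per namespace, all closed together.
final class NamespaceClients implements AutoCloseable {
    private final Map<String, NamespaceClient> clients = new LinkedHashMap<>();

    // Returns the existing client for the namespace, or creates one on first use.
    synchronized NamespaceClient forNamespace(String namespace) {
        return clients.computeIfAbsent(namespace, NamespaceClient::new);
    }

    synchronized int size() {
        return clients.size();
    }

    @Override
    public synchronized void close() {
        clients.values().forEach(NamespaceClient::close);
        clients.clear();
    }
}
```

Calling `forNamespace("dev")` twice returns the same instance, so the application never holds two connections for one namespace by accident.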
Confirm these fields during initialization:
| Information | Description |
|---|---|
| serverAddr or endpoint | Use serverAddr for a fixed Nacos cluster address. Use endpoint when the client obtains the server list from an address service. |
| namespace | The namespace bound to the current SDK instance. |
| group, dataId, serviceName | Business identifiers used when accessing runtime resources. |
| username, password, accessKey, secretKey, and other credentials | When auth is enabled, the SDK carries identity material according to the server-side auth mode. |
| contextPath | The context path of Nacos HTTP APIs. The default is nacos. |
| gRPC port offset | Nacos 3.x clients still use the same main-port-plus-offset rule. By default, 9848 is derived from 8848. |
The deployment must make both the HTTP port and the gRPC port reachable. The HTTP port is used for OpenAPI, login, and some compatibility requests. The gRPC port is used for most long-lived client runtime requests.
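The main-port-plus-offset rule can be written out as a short helper, which is handy when generating firewall or network policy entries. This is a sketch only: `GrpcPortRule` is a hypothetical name, and the default offset of 1000 is inferred from the 8848 to 9848 example in the table above.

```java
// Minimal sketch: derive the client gRPC port from the main HTTP port.
final class GrpcPortRule {
    // Assumption: inferred from the documented 8848 -> 9848 example.
    static final int DEFAULT_OFFSET = 1000;

    private GrpcPortRule() {
    }

    static int grpcPort(int mainPort) {
        return grpcPort(mainPort, DEFAULT_OFFSET);
    }

    static int grpcPort(int mainPort, int offset) {
        int port = mainPort + offset;
        // Both the main port and the derived port must be valid TCP ports.
        if (mainPort < 1 || mainPort > 65535 || port < 1 || port > 65535) {
            throw new IllegalArgumentException(
                    "port out of range: " + mainPort + " + " + offset);
        }
        return port;
    }
}
```

For a server listening on 8848, `GrpcPortRule.grpcPort(8848)` yields 9848, so both ports need to be reachable from the client.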
3. Connection Is Not A One-Time Action
Since Nacos 2.x, the main Client SDK runtime channel is a gRPC long-lived connection. The client resolves the server list, selects a server, establishes a connection, and reconnects when health checks fail, the server sends a reset request, or the server list changes.
After reconnect, the SDK attaches declared runtime intent to the new connection again. For example:
- Configuration listeners are resynchronized.
- Service subscriptions are resubscribed.
- Ephemeral instances are registered again.
- AI endpoints and subscriptions are recovered when the capability supports recovery.
This is runtime recovery, not server-side data repair. If a write request times out, the client cannot decide from local state alone whether the server has processed it.
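The recovery step can be pictured as an in-memory record of declared intent that is replayed against each new connection. The sketch below is illustrative and does not reproduce the SDK's internal redo implementation: `RedoRegistry`, `Kind`, and `Connection` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of redo data: declared intent is kept in memory
// and sent again whenever a new connection is established.
final class RedoRegistry {
    enum Kind { CONFIG_LISTENER, SERVICE_SUBSCRIPTION, EPHEMERAL_INSTANCE }

    // Stand-in for whatever sends a request over the current connection.
    interface Connection {
        void send(Kind kind, String key);
    }

    private static final class Intent {
        final Kind kind;
        final String key;

        Intent(Kind kind, String key) {
            this.kind = kind;
            this.key = key;
        }
    }

    private final Map<String, Intent> intents = new LinkedHashMap<>();

    // Record intent when the application listens, subscribes, or registers.
    synchronized void declare(Kind kind, String key) {
        intents.put(kind + ":" + key, new Intent(kind, key));
    }

    // Drop intent when the application removes the listener, subscription, or instance.
    synchronized void withdraw(Kind kind, String key) {
        intents.remove(kind + ":" + key);
    }

    // Replay every declared intent on a new connection; returns how many were sent.
    synchronized int replay(Connection connection) {
        List<Intent> snapshot = new ArrayList<>(intents.values());
        for (Intent intent : snapshot) {
            connection.send(intent.kind, intent.key);
        }
        return snapshot.size();
    }
}
```

Note that the registry only knows what the client intended. A successful replay says nothing about whether an earlier timed-out write reached the server.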
4. Configuration Runtime
Use configuration runtime capabilities with these principles:
- Applications should read known dataId and group values. Avoid broad configuration search during business runtime.
- Use listeners for continuous change awareness instead of high-frequency polling.
- If the application needs to read the current value and then listen for later changes, use the corresponding query-and-listen method in the language SDK.
- Avoid long blocking logic inside listener callbacks. Hand events to a business thread pool when processing is complex.
- Configuration publish, delete, history, and bulk management are management scenarios. Prefer the Admin API or Maintainer SDK.
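The advice about listener callbacks can be sketched as a thin wrapper that only enqueues work on the notification thread and lets a business executor do the processing. `HandoffListener` is a hypothetical class for illustration; each language SDK defines its own listener type and callback signature.

```java
import java.util.Objects;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical listener wrapper: the notification thread only hands the
// event over, and the business executor runs the actual processing.
final class HandoffListener {
    private final Executor businessExecutor;
    private final Consumer<String> handler;

    HandoffListener(Executor businessExecutor, Consumer<String> handler) {
        this.businessExecutor = Objects.requireNonNull(businessExecutor);
        this.handler = Objects.requireNonNull(handler);
    }

    // Called with the new configuration content; returns without running the handler.
    void onChange(String content) {
        businessExecutor.execute(() -> handler.accept(content));
    }
}
```

In an application, the executor would typically be a bounded thread pool owned by the business code, so that slow handlers cannot delay later change notifications.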
The Java SDK maintains local configuration snapshots. After a successful server query, the client writes the snapshot. When the server is unavailable, some read paths can fall back to the last successfully read local data.
Local failover files provide a stronger local override than snapshots. They are maintained by users and are used when Nacos servers are unavailable, remote changes are risky, or a release window needs to temporarily freeze configuration. Failover files are not written back to the server automatically.
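One way to picture how these sources relate is a read order in which a failover file wins when present, the server is queried otherwise, and the snapshot is used only when the server query fails. The sketch below assumes that order and uses hypothetical names (`ConfigReadOrder`, `Source`, `Result`). It omits the snapshot write after a successful query and is not the SDK's actual read path.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of a read order: failover file, then server, then snapshot.
final class ConfigReadOrder {
    enum Source { FAILOVER, SERVER, SNAPSHOT }

    static final class Result {
        final Source source;
        final String content;

        Result(Source source, String content) {
            this.source = source;
            this.content = content;
        }
    }

    private ConfigReadOrder() {
    }

    // The server supplier is assumed to throw a RuntimeException when the server is unreachable.
    static Result read(Optional<String> failover,
                       Supplier<String> server,
                       Optional<String> snapshot) {
        if (failover.isPresent()) {
            // User-maintained override: used as-is and never written back.
            return new Result(Source.FAILOVER, failover.get());
        }
        try {
            return new Result(Source.SERVER, server.get());
        } catch (RuntimeException unavailable) {
            // Last successfully read content, which may be stale.
            return snapshot
                    .map(content -> new Result(Source.SNAPSHOT, content))
                    .orElseThrow(() -> unavailable);
        }
    }
}
```

Returning the source alongside the content makes it easy to log when an application is running on failover or snapshot data rather than on the server's current value.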
5. Service Discovery Runtime
Service discovery runtime centers on registration, subscription, and local service views.
- Ephemeral instances are suitable for ordinary application instances. When a connection drops, the server clears the ephemeral state scoped to that connection. The client redoes registration after reconnect.
- Persistent instances are suitable when the server should maintain health status through health checks. Their lifecycle is not bound solely to the client connection.
- When subscribing to a service, the client receives server pushes and maintains a local service view.
- Local cache is only the last known service view, not the authoritative server state.
- Push-empty protection can ignore abnormal empty instance lists and avoid replacing a usable view with an empty one unexpectedly.
If the server is unavailable, the client may temporarily use local cache or a failover view. Business callers should still keep their own timeout, retry, and circuit-breaking strategies.
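Push-empty protection can be illustrated with a local view that refuses to replace a non-empty instance list with an empty push. This is a simplified sketch with hypothetical names (`ServiceView`, `onPush`). The SDK's own protection logic and its configuration switch are not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local service view with push-empty protection.
final class ServiceView {
    private final boolean pushEmptyProtection;
    private List<String> instances = Collections.emptyList();

    ServiceView(boolean pushEmptyProtection) {
        this.pushEmptyProtection = pushEmptyProtection;
    }

    // Applies a pushed instance list; returns false when the push was ignored.
    synchronized boolean onPush(List<String> pushed) {
        if (pushEmptyProtection && pushed.isEmpty() && !instances.isEmpty()) {
            return false; // keep the last usable view
        }
        instances = Collections.unmodifiableList(new ArrayList<>(pushed));
        return true;
    }

    // The last known view, which is not the authoritative server state.
    synchronized List<String> instances() {
        return instances;
    }
}
```

With protection enabled, an empty push after a non-empty one leaves the view unchanged. Callers still need their own timeouts and circuit breaking, because the retained instances may no longer be reachable.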
6. AI Resource Runtime
Some resources in AI Registry also enter runtime. Applications can query Prompts, Skills, and AgentSpecs, register MCP endpoints or Agent endpoints, and subscribe to resource changes.
As with configuration and service discovery, AI runtime needs a clear split between resource management and runtime use:
- Creating, publishing, bringing online, taking offline, importing, and governing resources are management actions.
- Querying, downloading, subscribing to resources, and registering endpoints at application runtime are Client SDK or Client API scenarios.
For more on resource models, read AI Registry.
7. Local Cache, Failover, And Redo
The SDK runtime uses several kinds of local or in-memory data:
| Data | Purpose | Notes |
|---|---|---|
| Configuration snapshot | Stores the last successfully read configuration content. | Used only for read recovery. It does not represent the latest server state. |
| Configuration failover file | A user-maintained local configuration override. | It has high local priority but is not written back to the server. |
| Service local cache | Stores the last known service instance view. | Suitable for short-term fallback. It should not replace the server for a long time. |
| Service failover data | Overrides service discovery results when failover is enabled. | It only affects the local discovery view. |
| Redo data | Records runtime intent such as subscriptions and ephemeral registrations. | Used for recovery after reconnect. It does not prove that a write has been persisted. |
The common rule is simple: these data types are not authoritative server state. During troubleshooting, check client logs, server state, and management API results together.
8. Runtime Troubleshooting
When a client issue occurs, start with this checklist:
| Symptom | What To Check |
|---|---|
| Cannot connect after startup | Check serverAddr, endpoint, HTTP port, gRPC port, context path, and network policy. |
| Auth failure | Check the server-side auth plugin, client credentials, token refresh, and whether RAM/OIDC identity modes match the deployment. |
| Configuration listener does not fire | Check whether dataId, group, and namespace are consistent, and whether the client is reconnecting repeatedly. |
| Service subscription result is empty | Check service name, group, cluster, instance health status, and push-empty protection settings. |
| Instance disappears after network jitter | Check whether ephemeral instance redo has recovered and whether the client has reconnected. |
| Local cache data is unexpected | Check JM.SNAPSHOT.PATH, the failover switch, and whether cache files were changed manually. |
Common Java SDK local paths include:
- JM.LOG.PATH: client log root directory. By default, it is under the user’s logs directory.
- JM.SNAPSHOT.PATH: configuration and service local cache root directory. By default, it is under the user’s home directory.
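During troubleshooting it can help to print the roots a JVM would actually use. The sketch below reads the two system properties and falls back to locations derived from user.home. The fallbacks are an assumption based on the defaults described above, `LocalPaths` is a hypothetical helper, and the subdirectories the SDK creates beneath these roots are not shown.

```java
import java.io.File;

// Hypothetical diagnostic helper for the two local path properties.
final class LocalPaths {
    private LocalPaths() {
    }

    // Root for configuration and service local cache.
    static String snapshotRoot() {
        // Assumption: falls back to the user's home directory when the property is unset.
        return System.getProperty("JM.SNAPSHOT.PATH", System.getProperty("user.home"));
    }

    // Root for client logs.
    static String logRoot() {
        // Assumption: falls back to a logs directory under the user's home directory.
        String fallback = System.getProperty("user.home") + File.separator + "logs";
        return System.getProperty("JM.LOG.PATH", fallback);
    }
}
```

Either property can be overridden at startup, for example with `-DJM.SNAPSHOT.PATH=/data/nacos-cache`, which is worth checking when cache files are not where they are expected.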
For more Java SDK parameters, read Java SDK Configuration Parameters. For server ports and deployment boundaries, read Deployment Manual.
9. Close SDK Instances
When an application exits or no longer uses an SDK instance, call the close method provided by the language SDK. Closing releases connections, background tasks, listeners, and subscription context.
Closing an SDK does not delete user-maintained failover files and does not write local cache back to the server.