Reliability hardening for use-local-agent (no API breaks)#2
Closed
Reliability hardening for use-local-agent (no API breaks)#2
Conversation
aidenybai
added a commit
that referenced
this pull request
Apr 26, 2026
…ateAgent (#3) * feat(use-local-agent): bump @agentclientprotocol/sdk to ^0.20.0 - Picks up reliability fixes from 0.17→0.20: - PR #103/#111: clean transport-failure handling, propagate ndjson errors - PR #119: parse final NDJSON message without trailing newline - PR #122: no spurious unhandledRejection on transport failures - PR #127: TS private keyword for cross-copy compatibility - PR #130: response/notification ordering fix - Renames stabilized methods (PR #132): - unstable_resumeSession → resumeSession - unstable_closeSession → closeSession - Updates LocalAgent, mock-agent harness, echo-agent fixture - Adds resume-close.test.ts regression tests Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * feat(use-local-agent): transport reliability hardening (Phase 2) - connect.ts: - Wait for spawn or error event before constructing connection (avoids races with later spawn errors leaking unhandled rejections). - Add disposeGraceMs option (default unchanged at 2000ms). - Add onTrace hook for inspecting in/out JSON-RPC messages and stderr. - Add envFilter hook for env scrubbing. - local-agent.ts: - Race connection.initialize() against subprocess close so a process that exits before responding fails fast as AgentConnectionClosedError (instead of waiting initializeTimeoutMs). - Gate stderr-pattern fatal detection (auth/usage) until after init succeeds, eliminating false positives during boot/login banners. - Allow callers to override stderrFatalPatterns. - utils/run-command.ts: Escalate to SIGKILL after a grace when SIGTERM is ignored on timeout. - New tests: - tests/spawn-failures.test.ts: ENOENT, fast subprocess exit - tests/stderr-fatal-detection.test.ts: pre/post-init stderr gating Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * feat(use-local-agent): turn-lifecycle correctness (Phase 3) - Inactivity watchdog: pause while a permission request is pending in stream mode (prevents killing sessions waiting on the user). Permission events also bump lastActivityAt. - Per-session buffered update cap (MAX_BUFFERED_UPDATES_PER_SESSION = 1024) drops oldest first to prevent unbounded memory growth. - Add loadSessionStreaming() returning { sessionId, replay, completion } so callers can iterate replay updates emitted during session/load BEFORE the first prompt. Existing loadSession() preserved for back-compat. - Slash commands: - Track latest available_commands_update per session. - Add LocalAgent.commandsFor(sessionId) / modeStateFor() / configOptionsFor() getters. - PromptInput now accepts { command: { name, input? } } and validates against advertised commands when known. - Error message preservation: stringifyCause now includes Error.cause message in wrapped errors when distinct. - Late tool_call_update notifications received after session/cancel are forwarded through the iterator (already worked in practice; explicit test coverage added). - Mode tracking: current_mode_update mutates the cached modeState. - New tests: - tests/late-updates.test.ts - tests/watchdog-permission.test.ts - tests/buffer-cap.test.ts - tests/load-session-replay.test.ts - tests/slash-commands.test.ts Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * feat(use-local-agent): spec coverage expansion (Phase 4) - Optional terminal/* support: when caller provides options.terminal handlers, advertise terminal capability and forward all five client terminal methods (create/output/release/wait_for_exit/kill). - additionalDirectories on createSession / loadSession / loadSessionStreaming / resumeSession with capability gating (sessionCapabilities.additionalDirectories) and absolute-path validation. - listSessions(input?: { cwd?, cursor? }) with pagination plumbing. Adds streamAllSessions helper that auto-paginates. - _meta trace context propagation via opt-in traceContext: () => Record. Only the W3C-reserved keys traceparent / tracestate / baggage are forwarded; other keys filtered out. - onAuthRequired hook: when newSession returns auth_required (-32000) and a hook is configured, prompt for an auth method, call authenticate, then retry once. Returning undefined from the hook preserves the original AgentUnauthenticatedError. - Convenience getters: commandsFor / modeStateFor / configOptionsFor. - Mode tracking: current_mode_update mutates cached modeState. - Tests: - tests/terminal.test.ts - tests/additional-directories.test.ts - tests/list-sessions-pagination.test.ts - tests/trace-context.test.ts - tests/auth-retry.test.ts Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * feat(use-local-agent): adapter robustness (Phase 5) - claude: ANTHROPIC_API_KEY env fallback bypasses 'claude auth status' parsing when the env is set; checkAuthenticated honors it. - copilot: GITHUB_TOKEN env fallback bypasses 'gh auth token' lookup; helpful error message updated. - gemini: GEMINI_API_KEY / GOOGLE_API_KEY env fallback before reading ~/.gemini/google_accounts.json; updated error message. (envFilter hook from Phase 2 already exposed for env scrubbing.) - New: tests/adapter-env-fallbacks.test.ts covering gemini env auth. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * docs(use-local-agent): README and changeset for ACP reliability overhaul (Phase 6) - Bump apps/playground SDK pin to ^0.20.0 for consistency - Document new options (terminal, additionalDirectories, listSessions pagination, traceContext, onAuthRequired, onTrace, disposeGraceMs, envFilter) in README - Document slash command helper and reliability features - Add .changeset entry summarizing the multi-phase upgrade - New tests: - tests/sdk-upgrade.test.ts (covers ordering through wrapper) End state: 70 unit tests green; 8/8 e2e Playwright tests against the real echo-agent subprocess green. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * feat(use-local-agent): add reliability helpers, error type, and constants Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * test(use-local-agent): mock harness helpers + wire fuzz (Phase 6 polish) - Add testing/mock-agent.ts helpers: - withSpawnFailure(id?): builds an AgentAdapter that always points at a missing binary (used by spawn-failures.test.ts) - withSdkOutOfOrderRace(chunks): builds a prompt handler that emits multiple notifications back-to-back, exercising the SDK's response/ notification ordering guarantee. - tests/spawn-failures.test.ts now uses withSpawnFailure for clarity. - tests/wire-fuzz.test.ts: bounded property-based test that feeds 200 randomized session updates per turn (seeded RNG) plus an interleaved cancellation variant to verify the wrapper does not crash and produces well-shaped AgentEvent values terminating in 'finish'. Total: 72 unit tests across 24 files. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * docs(playground): add README covering terminal opt-in and additionalDirectories Closes the documentation gap from the plan's file change matrix: - Documents the echo-agent fixture commands used by the e2e tests. - Shows fileSystem and terminal handler wiring patterns for callers. - Documents additionalDirectories session-input field and absolute-path requirement. Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * fix(use-local-agent): SIGKILL escalation, init-vs-exit race, fatal-buffer, validation timer leak, pending-update cap, abort short-circuit, perm settle gate, EPIPE handlers, stdout-noise filter, run-command caps Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * test(use-local-agent): add reliability test suite (14 tests) Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * chore: format + add reliability hardening changeset Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com> * docs(use-local-agent): align README with consolidated reliability features * feat(use-local-agent): become a Vercel AI SDK provider; drop streamAgent/generateAgent Implements the LanguageModelV3 spec (`@ai-sdk/provider` v3.0.x) so `use-local-agent` plugs directly into `streamText` / `generateText` from the `ai` package, eliminating the parallel one-shot helper API: import { streamText } from "ai"; import { localAgent } from "use-local-agent"; const { textStream } = streamText({ model: localAgent("claude"), prompt: "Refactor src/auth.ts", }); New module `src/ai-sdk/`: - `convert-prompt.ts` maps a `LanguageModelV3Prompt` (system/user/ assistant/tool messages) to ACP `ContentBlock[]`. System messages flatten into `systemPrompt`; image/audio file parts become ACP image/ audio blocks; unsupported file mediaTypes, assistant tool-call replays, and tool-result messages emit `unsupported` warnings. - `local-agent-language-model.ts` implements `LanguageModelV3`: - `doGenerate` returns ordered `reasoning` + `text` + `tool-call` content with mapped `LanguageModelV3FinishReason` and usage. - `doStream` emits `stream-start` (with warnings), `response-metadata` (carrying ACP `sessionId`), grouped `text-start/-delta/-end` and `reasoning-start/-delta/-end`, `tool-call`, `tool-result`, `finish`, and `error` parts. Opt-in raw passthrough via `includeRawChunks`. - Takes a `connect` factory so it's testable through the existing `connectMockAgent` harness without spawning a subprocess. - `provider.ts` exports `createLocalAgentProvider()` (a callable `ProviderV3` with a `.languageModel()` method, plus `.fromAdapter()` for custom ACP-speaking subprocesses) and a default `localAgent` instance. `embeddingModel` / `imageModel` throw `NoSuchModelError`. Default `permission: "auto-allow"` so `streamText` Just Works. Mappings: - ACP `text-delta` → V3 `text-delta` (start/end framed by message id) - ACP `thinking-delta` → V3 `reasoning-delta` - ACP `tool-call` → V3 `tool-call` with `providerExecuted: true` - ACP `tool-call-update` (completed/failed) → V3 `tool-result` (with `isError` on failure) - ACP stop reason → `{ unified, raw }` - Unsupported V3 call options (temperature, topP/topK, seed, stopSequences, responseFormat:json, tools, etc.) → `stream-start` warnings. Removed: - `src/stream.ts` (`streamAgent`, `generateAgent`, `OneShotAgentStream`, `StreamAgentOptions`, `GenerateAgentResult`) — fully redundant with `streamText` / `generateText` from `ai`. Wiring: - New deps: `@ai-sdk/provider@^3.0.8`, `@ai-sdk/provider-utils@^4.0.23`. - Subpath export `use-local-agent/ai-sdk` and a top-level re-export. - `vite.config.ts` adds `src/ai-sdk/index.ts` as a build entry. - README leads with `streamText({ model: localAgent('claude') })`; the `LocalAgent` API moves to an "Advanced" section for stateful multi-turn, permission gating, slash commands, and session resume. Tests: - New `tests/ai-sdk-provider.test.ts` covers prompt conversion, provider shape, `NoSuchModelError`, `doGenerate` (reasoning+text+tool-call ordering, finish-reason mapping), `doStream` part ordering, and unsupported-option warnings. Quality gates: `pnpm typecheck` clean, `pnpm lint` 0/0 on 72 files, and all 96 tests across 26 files pass. Also drops the leftover `.changeset/acp-reliability-overhaul.md` per prior cleanup. * docs: polish README - add MCP, multi-modal, configuration table; tighten reliability section * feat(use-local-agent): bundle ai; re-export streamText/generateText/streamObject/generateObject Adds `ai@^6.0.168` as a regular dependency and re-exports its top-level helpers from `use-local-agent`'s main entry, so callers get the end-to-end story from a single install: import { streamText, localAgent } from "use-local-agent"; streamText({ model: localAgent("claude", { cwd: "/path/to/project" }), prompt: "...", }); Bringing your own `ai` still works — importing from `"ai"` directly is unchanged and resolves to the same versions transitively. README updated to: - show the new single-import pattern as the headline usage, - demonstrate inline per-call settings on the model factory (`localAgent("codex", { cwd, permission, inactivityTimeoutMs, onTrace })`), - call out that users can still import from `"ai"` if they prefer. * revert(use-local-agent): require `ai` as a peer dep instead of bundling Moves `ai` from `dependencies` to `peerDependencies` (required, not optional) and drops the top-level re-exports of `streamText`, `generateText`, `streamObject`, `generateObject`. Callers install `ai` themselves and import it directly: npm install use-local-agent ai import { streamText } from "ai"; import { localAgent } from "use-local-agent"; Same behavior as `@ai-sdk/openai`, `@ai-sdk/anthropic`, etc. — keeps the provider package small and avoids dragging in `ai` for callers that might not use it transitively. README updated to use separate imports throughout. * docs: remove em-dashes from READMEs and test/playground docs * docs: cut README from 242 to 108 lines; remove implementation-detail sections * feat(use-local-agent): add createLocalAgentSession for stateful streamText Adds an AI-SDK-shaped path for multi-turn ACP sessions so callers don't have to drop down to the LocalAgent API for follow-up turns: import { streamText } from "ai"; import { createLocalAgentSession } from "use-local-agent"; await using session = await createLocalAgentSession("codex"); await streamText({ model: session.model, prompt: "list TODOs" }); await streamText({ model: session.model, prompt: "now fix the highest one" }); await streamText({ model: session.model, prompt: "agent client protocol", providerOptions: { useLocalAgent: { command: "web" } }, }); Implementation: - LocalAgentLanguageModel: replace the single-purpose `connect` factory with a more general `acquire` factory that returns { agent, sessionId, historyAlreadyApplied, release }. One-shot mode (default `localAgent("...")`) creates a fresh agent + session and closes both on release. Session-bound mode (createLocalAgentSession) reuses an existing agent + session and makes release a no-op. - convertPromptToContentBlocks: new `lastUserOnly` option used by session-bound models to skip prior conversation history (the ACP session already holds it). Emits a `compatibility` warning describing how many messages were skipped, plus an `unsupported` warning if the prompt has no trailing user message at all. - providerOptions.useLocalAgent.command: forwards a slash command to the underlying ACP session (object form `{ name, input? }` or bare string). - createLocalAgentSession opens one LocalAgent + one session, exposes `.model` (LanguageModelV3 bound to the session), `.agent` and `.sessionId` for ACP-specific surfaces, and implements `Symbol.asyncDispose` so `await using` cleans up the subprocess. 3 new tests cover: trailing-user-only behavior with a compatibility warning, slash command via `{ name, input }`, and the bare-string command shorthand. Existing 96 tests updated to the new acquire shape. Total: 99 tests across 26 files passing; pnpm typecheck and lint clean. README: replaces the LocalAgent-based stateful example with a session + streamText example; LocalAgent escape hatch still mentioned for permission prompts / terminal handlers / session resume. * fix(use-local-agent): address AGENTS.md review findings on AI SDK provider High severity: - LocalAgentLanguageModel: emit a `system-message-on-bound-session` warning when a session-bound model sees a per-turn system message. Previously the system prompt was silently dropped because the underlying ACP session was already opened. - createLocalAgentSession: drop the no-op `typeof === "string" ? x : x` ternary; the parameter is already correctly typed. Medium severity: - provider.ts: extract a shared `buildOneShotAcquire` so `buildLanguageModel` and `buildFromAdapter` no longer duplicate the same 14-line acquire factory body (DRY). - LocalAgentProvider: tighten `modelId` parameter from `SupportedAgentId | string` to just `SupportedAgentId`; arbitrary strings now go through `fromAdapter(...)` so typos like `localAgent("clauder")` fail at compile time instead of runtime. - Add tests covering: - system-message-on-bound-session warning - no-trailing-user-message warning + zero-block prompt to ACP - abort during doStream calls release exactly once - toJsonValue -> toJsonResult: returns NonNullable<JSONValue> (matches LanguageModelV3ToolResult.result) by JSON-roundtripping with a String(value) fallback. Drops the over-permissive NonNullable<unknown> contract. Low severity / style: - Hoist provider name / providerOptions key to `constants.ts` as `AI_SDK_PROVIDER_NAME` and `AI_SDK_PROVIDER_OPTIONS_KEY`. - Rename `historyAlreadyApplied` -> `isSessionBound` everywhere (config, AcquiredSession, tests). - LocalAgentSessionOptions now `extends LocalAgentConnectOptions` directly instead of going through the type alias. - Inline `.filter((p) => p.type === ...)` + `.map(...)` -> single `.map` / `.flatMap` to drop two `as { ... }` casts in convert-prompt. - Trim JSDoc that restated field / function names per AGENTS.md "default to NO comments". Comments on `release()` (idempotency semantic) and the public `localAgent` / `createLocalAgentSession` examples are kept. Verified: pnpm typecheck clean, pnpm lint 0/0 on 72 files, pnpm test 102 passing across 26 files. * feat(spawn-agent)!: rename package use-local-agent -> spawn-agent Renames the package and aligns the AI SDK provider surface with the Vercel AI SDK community convention (factory drops the `Provider` suffix: `createLocalAgentProvider` -> `createSpawnAgent`). Public API surface: - `localAgent` -> `spawnAgent` - `createLocalAgentProvider` -> `createSpawnAgent` - `createLocalAgentSession` -> `createSpawnAgentSession` - `LocalAgent` (runtime class) -> `SpawnAgent` - `LocalAgent{Provider,LanguageModel,Session,ConnectOptions,ClientInfo,Error,...}` -> `SpawnAgent*` - `providerOptions: { useLocalAgent: { command } }` -> `providerOptions: { spawnAgent: { command } }` - `model.provider` is now `"spawn-agent"` Constants in src/constants.ts updated: `PACKAGE_NAME`, `PACKAGE_TITLE`, `AI_SDK_PROVIDER_NAME`, `AI_SDK_PROVIDER_OPTIONS_KEY`. Files renamed via `git mv` to preserve history: - packages/use-local-agent/ -> packages/spawn-agent/ - src/local-agent.ts -> src/spawn-agent.ts - src/ai-sdk/local-agent-language-model.ts -> spawn-agent-language-model.ts - masterdocs/.../integration-notes-for-use-local-agent.md -> integration-notes-for-spawn-agent.md Includes review-pass cleanup: local `spawnAgent` variables renamed to `agent` to avoid shadowing the public default export, error re-exports re-sorted alphabetically, and a major-version changeset documenting the migration. * chore: drop spawn-agent rename changeset * feat(spawn-agent): bundle ACP shims as direct dependencies `@agentclientprotocol/claude-agent-acp` and `@zed-industries/codex-acp` now ship as regular dependencies of `spawn-agent` instead of optional peer dependencies. Users no longer need to install them separately to drive Claude Code or Codex. `ai` remains a peer dependency. Updates README install instructions accordingly. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Aiden Bai <aidenybai@users.noreply.github.com>
Member
Author
|
Superseded by #3. All reliability fixes from this PR (SIGKILL escalation, init-vs-exit race, EPIPE handlers, stdout-noise filter, run-command caps, pending-update cap, validation timer leak, perm settle gate, AgentStdinClosedError, the 14 reliability tests) were consolidated into the rename PR and are already on main at the new |
a20c0e8 to
acb5de7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Deep audit of
use-local-agentsurfaced 10 confirmed reliability bugs and 3 hardening gaps. This PR fixes every one of them, end-to-end-validated against real subprocesses, with no public API changes beyond a single new exported error type (AgentStdinClosedError) and one new optional config knob (pendingUpdateBufferLimit).All 42 existing unit tests still pass. 14 new unit tests + a 5-scenario real-subprocess harness lock the fixes in.
Bugs fixed
P0 — confirmed hangs / leaks / lost signals
SIGKILL escalation never fires in
connect.dispose()andrunCommand.child.killedflips totrueas soon as the signal is sent, regardless of whether the process actually died. So the innersetTimeout-then-SIGKILL block was always short-circuited and a stubborn agent could hangagent.close()forever. Replaced with a realchild.exitCode === null && child.signalCode === nullcheck via a newisProcessAlivehelper.initializehung the full timeout when the subprocess crashed early. A child that died 5ms after spawn (auth crash, ENOENT, segfault) made us wait the entire 30s default timeout before throwing a misleadingAgentInitTimeoutError. Now races againstconnectionResult.closedand rejects withAgentConnectionClosedError— including the captured stderr tail — within tens of ms.Fatal stderr signals during init were silently dropped. Auth-failure / usage-limit lines printed during
initializefiredonFatalErrorinto an empty listener set because theLocalAgentlistener was registered after construction. Construction would then proceed normally and the user would see a misleadingAgentInitErrorlater. Now the dispatcher buffers the first fatal error andfromConnectResultchecks the buffer immediately after init resolves.EPIPE on stdin could escalate to
uncaughtException. No error listeners onchild.stdin/stdout/stderr; certain Node versions emit'error'synchronously in the writable's_writebefore the callback fires. Added no-op error listeners on every stream and aAgentStdinClosedErrortyped-error path for writes against a dead subprocess.stderr line buffer was unbounded. A 10 MiB single-line stack trace could buffer in memory before flushing. Now capped at 64 KiB via
appendCapped.runCommandstdout/stderr buffers were unbounded. Capped at 1 MiB each (truncating from the front so the most recent output is preserved — mirrors ACP terminal semantics).Inactivity-watchdog timer leaked on prompt-validation failure. When the prompt contained image/audio/resource without the required capability,
state.activeStreamwas cleared butinactivityTimerwas not. Now cleared explicitly.Per-session pending-update buffer was unbounded. Notifications buffered before the first prompt grew without limit. Now capped at 1000 by default (configurable via the new
pendingUpdateBufferLimitoption). On overflow the oldest are dropped and a singlerawevent surfaces_meta.droppedNotificationson the next prompt drain.(Folded into ACP reliability + spec coverage overhaul (use-local-agent) #1 —
runCommandhad the same SIGKILL bug; same fix.)pendingConfigOptionsconsumed even when validation fails. First prompt that throws validation used to also consume the config-options state; the next successful prompt would never see them. Now: validate first, drain buffers/config only after validation passes.P1 — additional hardening
Defensive stdout-noise filter. ACP says agents must write only ACP messages to stdout. In practice many CLIs print banners (
✔ Connected to ...). Previously a single banner crashed the connection with an opaqueAgentInitErrorfrom the JSON parser. Lines that don't begin with{are now routed toonStderrprefixed with[stdout-noise]and dropped from the JSON-RPC stream; valid JSON lines pass through unchanged.AbortSignalalready-aborted atprompt()invocation is honored without sendingsession/prompt. Previously we'd dispatch the prompt and immediately try tosession/cancelit. Now we synthesize afinishevent withstopReason: "cancelled"and never call the agent.Permission requests can no longer double-resolve. A concurrent
respond()from astream-mode caller and a session-cleanup path used to race; cleanup now snapshots-and-clears the pending set before resolving.Files
No adapter file is touched. Public adapter API, event types, and option fields are unchanged except for the additions above.
Evidence
Unit tests — all green
42 existing + 14 new.
pnpm typecheck,pnpm lint,pnpm format:checkall clean.End-to-end against real subprocesses
A real-process harness drove the echo-agent fixture and 4 purpose-built fault-injection subprocesses. All 5 fixes confirmed in production-shaped paths:
Bug #1 alone — without this fix, an agent that swallows SIGTERM would have hung
agent.close()indefinitely on the host.Existing echo-agent end-to-end
The package's existing real-subprocess paths (init, prompts, tool calls, usage, auth-required, cancel, subprocess exit) all still pass:
Public API impact
AgentStdinClosedError(_tag: "StdinClosed").LocalAgentConnectOptions.pendingUpdateBufferLimit?: number(default1000).Risk
Low. The only behavioral surface visible to existing users is the stdout-noise filter, which is strictly additive (banners that previously crashed the connection now arrive as
[stdout-noise]stderr lines instead). All other changes are correctness fixes for paths that were already broken; if anyone was relying on those broken paths they have a worse problem to begin with.Changeset
A
patchchangeset is included at.changeset/reliability-hardening.md.