fix(onboard): confirm dashboard forward by live local port when openshell can't track it#6116
fix(onboard): confirm dashboard forward by live local port when openshell can't track it#6116tedy-y wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughAdds untracked-forward detection and a local port-liveness fallback for detached forward start. When the diagnostic pattern matches and the port is listening, the helper now returns ChangesUntracked forward detection and port-live fallback
Estimated code review effort: 2 (Simple) | ~15 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
src/lib/onboard/forward-start.ts (1)
96-113: 🔒 Security & Privacy | 🔵 Trivial | 💤 Low valueConsider passing the port as an argv value instead of interpolating it into the script text.
Static analysis flags the
require("node:child_process")call for command-injection risk (CWE-78). The actual risk here is negligible sinceportis typednumberand further coerced viaNumber(port)before interpolation, so this is effectively a false positive. Still, for defense-in-depth and to avoid manually building a JS string, pass the port as a separate process argument rather than interpolating it into the script text.🔒 Proposed defense-in-depth tweak
const script = 'const net=require("net");' + - `const s=net.connect(${Number(port)},"127.0.0.1");` + + 'const s=net.connect(Number(process.argv[2]),"127.0.0.1");' + "let settled=false;" + "const finish=(code)=>{if(settled)return;settled=true;try{s.destroy();}catch{}process.exit(code);};" + `s.setTimeout(${probeTimeoutMs});` + 's.once("connect",()=>finish(0));' + 's.once("timeout",()=>finish(1));' + 's.once("error",()=>finish(1));'; const { spawnSync } = require("node:child_process"); - const res = spawnSync(process.execPath, ["-e", script], { + const res = spawnSync(process.execPath, ["-e", script, String(Number(port) || 0)], { stdio: "ignore", timeout: probeTimeoutMs + 2_000, });Separately, please confirm whether the rest of this file uses ES
importstatements — if so, consider hoistingchild_process'sspawnSyncto a top-level import for consistency rather than a function-localrequire.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/forward-start.ts` around lines 96 - 113, The probeLocalPortListening helper currently embeds the port directly into the -e script string, which static analysis flags as injection-prone. Update probeLocalPortListening to accept the port as a separate argv/process argument and read it inside the spawned script instead of interpolating it into the script text, keeping the net.connect call behavior unchanged. Also, if this file already uses ES module imports elsewhere, replace the function-local require("node:child_process") with a top-level spawnSync import for consistency.Source: Linters/SAST tools
src/lib/onboard/forward-start.test.ts (1)
302-375: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winConsider a test for conflict-vs-untracked-forward precedence.
The PR explicitly calls out that the EADDRINUSE conflict check runs before the untracked-forward fallback "to avoid false positives," but no test here exercises a diagnostic that matches both
looksLikeForwardPortConflictandlooksLikeUntrackedForwardsimultaneously to provespawn-conflictwins overok-port-live. This is the exact safety invariant the fix depends on.🧪 Suggested additional test
+ it("prefers the port-conflict outcome over the untracked-forward fallback", () => { + const fetchList = vi.fn().mockReturnValue(forwardListWith([])); + const spawn = vi.fn().mockImplementation(({ stderr }: { stderr: number }) => { + fs.writeSync( + stderr, + "EADDRINUSE: address already in use; forward may be running but is not tracked\n", + ); + return { pid: 999 }; + }); + const sleep = vi.fn(); + const isPortListening = vi.fn().mockReturnValue(true); + + const result = runDetachedForwardStartWithDiagnostics( + spawn, + fetchList, + { port: 18789, sandboxName: "my-sandbox" }, + { overallTimeoutMs: 10_000, pollIntervalMs: 10, sleepMs: sleep, isPortListening }, + ); + + expect(result.ok).toBe(false); + expect(result.reason).toBe("spawn-conflict"); + expect(isPortListening).not.toHaveBeenCalled(); + });Note: verify the actual conflict-diagnostic text expected by
looksLikeForwardPortConflictbefore using it verbatim.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/forward-start.test.ts` around lines 302 - 375, Add a regression test around runDetachedForwardStartWithDiagnostics to cover diagnostics that match both looksLikeForwardPortConflict and looksLikeUntrackedForward at the same time. Mock spawn so stderr contains a message that would satisfy both checks, then assert the result is spawn-conflict and that isPortListening is not consulted, proving the EADDRINUSE conflict path takes precedence over the live-port fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/onboard/forward-start.test.ts`:
- Around line 302-375: Add a regression test around
runDetachedForwardStartWithDiagnostics to cover diagnostics that match both
looksLikeForwardPortConflict and looksLikeUntrackedForward at the same time.
Mock spawn so stderr contains a message that would satisfy both checks, then
assert the result is spawn-conflict and that isPortListening is not consulted,
proving the EADDRINUSE conflict path takes precedence over the live-port
fallback.
In `@src/lib/onboard/forward-start.ts`:
- Around line 96-113: The probeLocalPortListening helper currently embeds the
port directly into the -e script string, which static analysis flags as
injection-prone. Update probeLocalPortListening to accept the port as a separate
argv/process argument and read it inside the spawned script instead of
interpolating it into the script text, keeping the net.connect call behavior
unchanged. Also, if this file already uses ES module imports elsewhere, replace
the function-local require("node:child_process") with a top-level spawnSync
import for consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c5ce884b-0bd4-49c8-ab75-95556de2a8ab
📒 Files selected for processing (2)
src/lib/onboard/forward-start.test.tssrc/lib/onboard/forward-start.ts
…hell can't track it On macOS hosts backed by Colima, `openshell forward start --background` establishes the SSH tunnel but its one-shot discovery probe runs before the remote bind completes, so the forward is never registered in `openshell forward list`. onboard polls that empty list, times out after 180s, and rolls back an otherwise-healthy sandbox even though the dashboard port is already listening and serving HTTP 200. When openshell reports "Could not discover backgrounded SSH process; forward may be running but is not tracked", confirm the forward by the real observable — the local forward port is actually accepting connections — via a portable, dependency-free Node loopback TCP probe. The EADDRINUSE conflict check runs first, so a port held by a different process is never mistaken for our forward, and the fallback is gated on openshell's own untracked-forward notice so a plain empty list never falsely confirms. Also benefits the agent-fixed (:8642) forward, which uses the same helper. Adds `looksLikeUntrackedForward`, an injectable `isPortListening` option (default: synchronous Node TCP probe), and unit tests. Fixes NVIDIA#6099 Signed-off-by: Tedy Yu <tedyy@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/onboard/forward-start.ts (1)
315-325: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick winRepeated subprocess spawns while untracked-forward diagnostic persists.
Once
looksLikeUntrackedForward(diagSoFar)is true but the port isn't yet listening, this branch re-invokesprobeLocalPortListening(spawning a fresh Node child) on every poll iteration (everypollIntervalMs, default 500ms) until either the port comes up or the 180s timeout elapses — potentially hundreds of subprocess spawns per onboard attempt. Each spawn also has its ownspawnSynctimeout window (up to 3.5s) that can stack on top ofpollIntervalMs, silently stretching the effective poll cadence well past whatpollIntervalMsimplies.Consider backing off the probe frequency (e.g., only probe every few poll cycles, or with its own interval) once the diagnostic has already matched, to reduce spawn overhead during the wait window.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/forward-start.ts` around lines 315 - 325, The polling path in forward-start’s untracked-forward fallback is spawning a new port-probe process on every loop once looksLikeUntrackedForward(diagSoFar) matches, which can create excessive overhead while waiting for the port to become live. Update the forward confirmation flow in forward-start to back off probeLocalPortListening/probe frequency after the diagnostic first matches, such as by throttling checks to every few poll cycles or using a separate interval/cache, while keeping the existing isPortListening confirmation and timeout behavior intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/onboard/forward-start.ts`:
- Around line 315-325: The polling path in forward-start’s untracked-forward
fallback is spawning a new port-probe process on every loop once
looksLikeUntrackedForward(diagSoFar) matches, which can create excessive
overhead while waiting for the port to become live. Update the forward
confirmation flow in forward-start to back off probeLocalPortListening/probe
frequency after the diagnostic first matches, such as by throttling checks to
every few poll cycles or using a separate interval/cache, while keeping the
existing isPortListening confirmation and timeout behavior intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: aec21292-de9e-453b-a260-1b4741316876
📒 Files selected for processing (2)
src/lib/onboard/forward-start.test.tssrc/lib/onboard/forward-start.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/onboard/forward-start.test.ts
|
@cv Hi, is this fix looking good? |
Summary
Fixes the macOS/Colima onboarding failure where
nemoclaw onboardrolls back an otherwise-healthy sandbox at the dashboard port-forward step.openshell forward start --backgroundestablishes the SSH ControlMaster tunnel, then does a one-shot discovery probe to register it. On Colima-backed hosts the remote bind completes after that probe, so openshell printsCould not discover backgrounded SSH process; forward may be running but is not trackedand the forward runs untracked — it never appears inopenshell forward listeven though the local port is already accepting connections and the dashboard is serving HTTP 200. onboard polls that empty list, times out after 180s, and deletes the sandbox.Fix
When openshell reports the untracked-forward notice and the local forward port is actually live, treat the forward as confirmed (
reason: "ok-port-live") instead of timing out.127.0.0.1:<port>(nolsof/ss/netstat). Works on macOS/Linux/WSL2/Windows.looksLikeUntrackedForward), so a plain empty list never falsely confirms.isPortListeningoption) so unit tests don't open real sockets.:8642) forward, which uses the same helper.Call sites only read
.ok/.diagnostic, so the added"ok-port-live"reason is non-breaking.Tests
src/lib/onboard/forward-start.test.ts:ok/ok-port-livetimeoutlooksLikeUntrackedForwardmatcher unit testsReproduced and verified on NemoClaw v0.0.68 → v0.0.69 → v0.0.70 (the bug persists across all three); the equivalent runtime patch completed real Hermes onboards on v0.0.69 and v0.0.70.
Fixes #6099
Signed-off-by: Tedy Yu tedyy@nvidia.com