feat(api-rs): pin sandbox pods via SESSION_SANDBOX_NODE_SELECTOR / _TOLERATIONS #3

Open
mo4islona wants to merge 2 commits into main from feat/sandbox-node-scheduling

@mo4islona

What

Two new optional env knobs on api-rs that set the nodeSelector and tolerations of every agent sandbox pod:

  • SESSION_SANDBOX_NODE_SELECTOR: key=value[,key=value...]
  • SESSION_SANDBOX_TOLERATIONS: JSON array of toleration objects, e.g. [{"key":"centaur","operator":"Equal","value":"true","effect":"NoSchedule"}]

Why

api-rs inlines the sandbox pod template when it creates the agents.x-k8s.io Sandbox (it does not go through SandboxTemplate/SandboxClaim). So the sandbox pod had no scheduling fields and could not be confined to a specific node pool — e.g. a dedicated spot pool for cost isolation. The control-plane pods can already be pinned via the Helm chart (nodeSelector/tolerations values); this closes the gap for the sandbox runtime pods, which are the heavy/bursty workload.

How

  • AgentSandboxConfig gains node_selector: BTreeMap<String,String> and tolerations: Vec<Value>.
  • They are inserted into the pod spec via the same insert_optional path already used for imagePullSecrets, so empty = field omitted (rendered pod unchanged when unset).
  • Parsed from the new args in TryFrom<&SandboxArgs>; invalid toleration JSON fails fast with UnsupportedConfig.
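
For reviewers who want a concrete picture of the selector format, here is a minimal sketch of parsing `key=value[,key=value...]` into a `BTreeMap<String, String>`. It is not the code in this PR: the function name `parse_node_selector`, the `String` error type, and the choice to skip empty segments and reject entries without `=` are all assumptions made for illustration. The trimming mirrors what the test description says.

```rust
use std::collections::BTreeMap;

/// Hypothetical parser for the `key=value[,key=value...]` selector format.
/// The name, error type, and handling of malformed pairs are assumptions.
fn parse_node_selector(raw: &str) -> Result<BTreeMap<String, String>, String> {
    let mut selector = BTreeMap::new();
    for pair in raw.split(',') {
        let pair = pair.trim();
        if pair.is_empty() {
            // Assumption: empty input and stray commas yield no entry.
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got `{pair}`"))?;
        let key = key.trim();
        if key.is_empty() {
            return Err(format!("empty key in `{pair}`"));
        }
        selector.insert(key.to_string(), value.trim().to_string());
    }
    Ok(selector)
}
```

With this sketch, an unset or empty variable produces an empty map, which is what lets the field be omitted from the pod spec downstream.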

Wiring (no chart change needed)

These are plain env vars, so they can be set through the existing apiRs.extraEnv Helm value — no template change required:

apiRs:
  extraEnv:
    SESSION_SANDBOX_NODE_SELECTOR: "centaur-pool=true"
    SESSION_SANDBOX_TOLERATIONS: '[{"key":"centaur","operator":"Equal","value":"true","effect":"NoSchedule"}]'

Tests

  • agent_k8s_config_converts_from_sandbox_args extended: parses key=value selector pairs (trimmed) and the tolerations JSON.
  • node_selector_and_tolerations_land_on_the_pod: asserts both land on the built Sandbox pod template, and are absent by default.
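
The property checked by the second test (fields present when set, absent by default) can be modeled with a small standalone sketch. Everything here is a stand-in: `Field` replaces the JSON value type the real code presumably uses, tolerations are kept as raw strings, and this `insert_optional` only imitates the behavior described above. It is not the helper from api-rs.

```rust
use std::collections::BTreeMap;

/// Stand-in for a JSON value, kept std-only for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Field {
    Map(BTreeMap<String, String>),
    List(Vec<String>),
}

impl Field {
    fn is_empty(&self) -> bool {
        match self {
            Field::Map(m) => m.is_empty(),
            Field::List(l) => l.is_empty(),
        }
    }
}

/// Hypothetical helper: insert the field only when it carries data,
/// so an unset knob leaves the rendered spec untouched.
fn insert_optional(spec: &mut BTreeMap<String, Field>, key: &str, value: Field) {
    if !value.is_empty() {
        spec.insert(key.to_string(), value);
    }
}

/// Builds a toy pod spec containing only the two scheduling fields.
fn build_pod_spec(
    node_selector: BTreeMap<String, String>,
    tolerations: Vec<String>,
) -> BTreeMap<String, Field> {
    let mut spec = BTreeMap::new();
    insert_optional(&mut spec, "nodeSelector", Field::Map(node_selector));
    insert_optional(&mut spec, "tolerations", Field::List(tolerations));
    spec
}
```

Calling `build_pod_spec(BTreeMap::new(), Vec::new())` returns a spec with neither key, which is the "rendered pod unchanged when unset" guarantee in miniature.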

Companion to the chart-side scheduling PR (control-plane nodeSelector/tolerations).

🤖 Generated with Claude Code

mo4islona and others added 2 commits June 29, 2026 12:14
…OLERATIONS

api-rs inlines the sandbox pod template when it creates the agents.x-k8s.io
Sandbox, so the pod previously had no nodeSelector/tolerations and could not be
confined to a specific node pool (e.g. a cost-saving spot pool).

Add two optional knobs, mirroring the existing SESSION_SANDBOX_* env family:
  - SESSION_SANDBOX_NODE_SELECTOR  key=value[,key=value...] -> pod nodeSelector
  - SESSION_SANDBOX_TOLERATIONS    JSON array of toleration objects -> pod tolerations

Both default to empty, so the rendered pod is unchanged when unset. Wired through
AgentSandboxConfig into the pod spec via the same insert_optional path used for
imagePullSecrets. Unit tests cover arg parsing and that the fields land on (and
stay absent from) the built Sandbox pod template.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lection

The per-sandbox iron-proxy pod was built with no nodeSelector/tolerations, so
with sandboxes pinned to a tainted pool the proxy pods spilled onto general
nodes. Carry the sandbox node_selector/tolerations into IronProxyConfig and set
them on the proxy PodSpec, so the proxy follows its sandbox onto the same pool.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>