Conversation


@ormandj ormandj commented Dec 12, 2025

Summary

Implements comprehensive ProviderConfig system for provider-specific caching and prompt optimization.

Closes #5416

Test Results with Claude Opus 4.5 (my primary target for improvement)

A/B testing with identical prompts (same session, same agent) comparing legacy vs optimized behavior:

Metric                         Legacy           Optimized         Improvement
Cache writes (post-warmup)     18,417 tokens    ~10,340 tokens    44% reduction
Effective cost (3rd prompt)    13,021 tokens    3,495 tokens      73% reduction
Initial cache write            16,211 tokens    17,987 tokens     +11% (expected)
Cache hit rate                 100%             100%              Same

The slightly larger initial cache write (+11%, roughly 1,800 extra tokens) is quickly amortized by dramatically fewer cache invalidations in subsequent requests: with roughly 8,000 fewer cache-write tokens per post-warmup request, the optimized path is already ahead after the first follow-up prompt.

Provider testing

I tested with providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).

Changes

  • Add ProviderConfig namespace with defaults for 19+ providers
  • Support three caching paradigms (see the sketch after this list):
    • Explicit breakpoint (Anthropic, Bedrock, Google Vertex Anthropic)
    • Automatic prefix (OpenAI, Azure, GitHub Copilot, DeepSeek)
    • Implicit/content-based (Google/Gemini)
  • Add tool sorting for cache consistency across requests
  • Add tool caching for explicit breakpoint providers
  • Add user config overrides via opencode.json (provider and agent level)
  • Simplify system message handling with combineSystemMessages boolean
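
As a rough illustration of how the bullets above fit together, here is a minimal sketch of a per-provider cache default. The type and field names below are hypothetical and the values are made up for the example; the real definitions live in the ProviderConfig namespace in src/provider/config.ts and may differ.

type CachingParadigm = "explicit-breakpoint" | "automatic-prefix" | "implicit"

interface CacheDefaults {
  paradigm: CachingParadigm
  enabled: boolean
  ttl?: "5m" | "1h"               // explicit-breakpoint providers accept a cache TTL
  minTokens?: number              // smallest prefix worth a breakpoint, where the provider has a floor
  cacheTools?: boolean            // explicit-breakpoint providers can also cache the tool block
  sortTools?: boolean             // stable tool ordering keeps the cached prefix identical across requests
  combineSystemMessages?: boolean // collapse system messages into one block for a stable prefix
}

// Illustrative defaults only, one entry per supported provider.
const EXAMPLE_DEFAULTS: Record<string, CacheDefaults> = {
  anthropic: { paradigm: "explicit-breakpoint", enabled: true, ttl: "5m", minTokens: 1024, cacheTools: true, sortTools: true, combineSystemMessages: true },
  openai: { paradigm: "automatic-prefix", enabled: true, sortTools: true, combineSystemMessages: true },
  google: { paradigm: "implicit", enabled: true, sortTools: true, combineSystemMessages: true },
}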

Config Priority

Provider defaults → User provider config → User agent config
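
Continuing the hypothetical sketch from the Changes section, the priority chain boils down to spreading the three layers in order; resolveCache is an illustrative name, not the actual API:

function resolveCache(
  providerDefaults: CacheDefaults,
  userProvider?: Partial<CacheDefaults>,
  userAgent?: Partial<CacheDefaults>,
): CacheDefaults {
  // Later spreads win: agent-level settings override provider-level settings,
  // which override the built-in provider defaults.
  return { ...providerDefaults, ...userProvider, ...userAgent }
}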

Area for future optimization

Currently, models.dev doesn't provide information regarding minimum cache requirements (such as minimum token thresholds) or prompt requirements, so these had to be written out as configuration. It would be ideal if the model definitions were updated with this detail. Until then, as providers/models are added or updated, the configuration should be updated to match for optimal performance.

New Files

  • src/provider/config.ts (874 lines)
  • test/provider/config.test.ts (215 tests)

Example Config

{
  "provider": {
    "anthropic": {
      "cache": {
        "enabled": true,
        "ttl": "1h",
        "minTokens": 2048
      }
    }
  },
  "agent": {
    "plan": {
      "cache": {
        "ttl": "1h"
      }
    }
  }
}
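
In this example, the agent-level block for plan only specifies ttl; any keys it leaves out (enabled, minTokens) fall back to the provider-level anthropic block and then to the built-in defaults, following the priority order above.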

@ormandj ormandj force-pushed the provider-cache-optimization branch 2 times, most recently from ce7acc2 to 7b1f516 on December 12, 2025 06:28
@ormandj ormandj changed the title from "feat(provider): add provider-specific cache configuration system" to "feat(provider): add provider-specific cache configuration system (significant token usage reduction)" on Dec 12, 2025

@Sewer56 Sewer56 left a comment


Needs a second pair of eyes but 👍 from me

@yamiteru

Please merge this ❤️

I'm deciding whether we should use opencode in our company and I'm actually surprised this wasn't implemented up until now.

Author

ormandj commented Dec 12, 2025

I tested with providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).

Implements comprehensive ProviderConfig system for provider-specific caching
and prompt optimization. Addresses cache configuration needs from sst#5416.

- Add ProviderConfig namespace with defaults for 19+ providers
- Support three caching paradigms: explicit-breakpoint, automatic-prefix, implicit
- Add tool sorting and caching for cache consistency
- Add user config overrides via opencode.json (provider and agent level)
- Simplify system message handling with combineSystemMessages boolean
@ormandj ormandj force-pushed the provider-cache-optimization branch from 7b1f516 to d213e39 on December 12, 2025 16:31
Author

ormandj commented Dec 12, 2025

Updated one of the comments to be more clear, but no functional changes.

Contributor

thdxr commented Dec 12, 2025

hey thanks for working on this - i'm actually currently mid refactor on a bunch of code where we call LLMs so we probably can't explicitly accept this PR (llm-centralization branch)

also i don't know if we want to go as far as making all this deeply configurable yet

could you lay out all the things you improved on top of what we have currently? then i can make sure those get included

Contributor

thdxr commented Dec 12, 2025

i can also just read the PR/have opencode summarize haha. i'll do a pass once that llm-centralization branch is merged

Author

ormandj commented Dec 12, 2025

Ok, that's fine. I added comments throughout that should help your opencode summarization, but I'm happy to discuss if you need more clarity. I'm on discord if you need more input. I was initially going to refactor all of your provider handling, but it sounds like you're already doing this - I designed this PR to be a stepping stone towards that since I didn't want to make my very first PR be a massive rewrite of lots of things.

If you just want to take my work and fold it into yours, I understand; at the end of the day it's of huge benefit to the users, so I'll be happy. Thank you for taking the time to evaluate things, and I hope to see your rewrite soon! I can also just update this once you're done/merged in, just let me know what works best.


gytis-ivaskevicius commented Dec 15, 2025

Hey, thanks for this PR. I started playing around with it. Can you do me a favor and add the following patch to visualize token cache statistics on the sidebar? FYI: I am not 100% confident that this is the best way of calculating the percentage

Should this be shown in the sidebar? I think it should: not only will it help us spot issues, it will also let users notice models that do not support caching.

diff --git a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
index c1c29a73..afaaad73 100644
--- a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
+++ b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
@@ -42,9 +42,20 @@ export function Sidebar(props: { sessionID: string }) {
     const total =
       last.tokens.input + last.tokens.output + last.tokens.reasoning + last.tokens.cache.read + last.tokens.cache.write
     const model = sync.data.provider.find((x) => x.id === last.providerID)?.models[last.modelID]
+
+    // Calculate cache hit percentage
+    const cacheHitPercentage = total > 0 ? Math.round((last.tokens.cache.read / total) * 100) : 0
+    const cacheRead = last.tokens.cache.read
+    const cacheWrite = last.tokens.cache.write
+
     return {
       tokens: total.toLocaleString(),
       percentage: model?.limit.context ? Math.round((total / model.limit.context) * 100) : null,
+      cache: {
+        hitPercentage: cacheHitPercentage,
+        read: cacheRead,
+        write: cacheWrite,
+      },
     }
   })

@@ -81,6 +92,11 @@ export function Sidebar(props: { sessionID: string }) {
               </text>
               <text fg={theme.textMuted}>{context()?.tokens ?? 0} tokens</text>
               <text fg={theme.textMuted}>{context()?.percentage ?? 0}% used</text>
+              <Show when={context()?.cache !== undefined}>
+                <text style={{ fg: context()!.cache.hitPercentage > 0 ? theme.success : theme.textMuted }}>
+                  {context()!.cache.hitPercentage}% cached
+                </text>
+              </Show>
               <text fg={theme.textMuted}>{cost()} spent</text>
             </box>
             <Show when={mcpEntries().length > 0}>

Maybe the correct way would be to do something like this instead, so that output and reasoning tokens (which can never be cache hits) don't skew the percentage 🤔

  const totalInput = last.tokens.input + last.tokens.cache.read + last.tokens.cache.write
  const cacheHitPercentage = totalInput > 0 ? Math.round((last.tokens.cache.read / totalInput) * 100) : 0

Also, I think the whole ripgrep tree is introducing cache misses on file additions/deletions. Probably, it should be set on a per-session basis and not update automatically

Author

ormandj commented Dec 15, 2025

I'd be happy to do this, except I was informed by @thdxr that they are reworking this application, so I assume they intend to take these ideas and merge them into whatever they're working on. I'm leaving this open until I hear otherwise, but I'm not sure it makes sense for me to do more implementation at this point without @thdxr's input. As for cache statistics, I have a number of ideas that would work, but I don't want to invest time into them only to have this closed out unmerged.


Sewer56 commented Dec 16, 2025

Given that the llm-centralization branch has been merged, is there any hope of bringing the improvements from here onto mainline? @thdxr

Seems the person who opened the PR is waiting for feedback, and until there is any, they're going to be left in the dark.

