feat(provider): add provider-specific cache configuration system (significant token usage reduction) #5422
base: dev
Conversation
Force-pushed from ce7acc2 to 7b1f516
Sewer56 left a comment
Needs a second pair of eyes but 👍 from me
Please merge this ❤️ I'm deciding whether we should use opencode in our company, and I'm actually surprised this wasn't implemented until now.
I tested with the providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).
Implements comprehensive ProviderConfig system for provider-specific caching and prompt optimization. Addresses cache configuration needs from sst#5416.
- Add ProviderConfig namespace with defaults for 19+ providers
- Support three caching paradigms: explicit-breakpoint, automatic-prefix, implicit
- Add tool sorting and caching for cache consistency
- Add user config overrides via opencode.json (provider and agent level)
- Simplify system message handling with combineSystemMessages boolean
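As a rough illustration of the "tool sorting and caching for cache consistency" item above (a hedged sketch; the type and function names are illustrative, not the PR's actual code): provider prefix caches only hit when the serialized prompt prefix, including tool definitions, is identical across requests, so ordering tools deterministically keeps that prefix stable.

```ts
// Illustrative sketch only. Sorting tool definitions deterministically keeps the
// serialized prompt prefix (system + tools) byte-identical across requests,
// which is a precondition for provider-side prefix caching to hit.
interface ToolDefinition {
  name: string
  description: string
  parameters: Record<string, unknown>
}

function sortToolsForCache(tools: ToolDefinition[]): ToolDefinition[] {
  // Any deterministic order works; it just has to be the same on every request.
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
}
```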
Force-pushed from 7b1f516 to d213e39
Updated one of the comments to be clearer; no functional changes.
hey, thanks for working on this - i'm actually currently mid-refactor on a bunch of code where we call LLMs, so we probably can't explicitly accept this PR (llm-centralization branch). also, i don't know if we want to go as far as making all this deeply configurable yet. could you lay out all the things you improved on top of what we have currently? then i can make sure those get included
i can also just read the PR/have opencode summarize haha. i'll do a pass once that llm-centralization branch is merged
Ok, that's fine. I added comments throughout that should help your opencode summarization, but I'm happy to discuss if you need more clarity; I'm on Discord if you need more input. I was initially going to refactor all of your provider handling, but it sounds like you're already doing this, so I designed this PR to be a stepping stone towards that rather than making my very first PR a massive rewrite of lots of things. If you just want to take my work and fold it into yours, I understand; at the end of the day it's of huge benefit to the users, so I'll be happy. Thank you for taking the time to evaluate things, and I hope to see your rewrite soon! I can also just update this once you're done and it's merged in; just let me know what works best.
Hey, thanks for this PR. I started playing around with it. Can you do me a favor and add the following patch to visualize token cache statistics on the sidebar?
FYI: I am not 100% confident that this is the best way of calculating the percentage.
Should this be shown in the sidebar? I think it should: not only will it help us to spot issues, but it will also let users notice models that do not support caching.
diff --git a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
index c1c29a73..afaaad73 100644
--- a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
+++ b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
@@ -42,9 +42,20 @@ export function Sidebar(props: { sessionID: string }) {
const total =
last.tokens.input + last.tokens.output + last.tokens.reasoning + last.tokens.cache.read + last.tokens.cache.write
const model = sync.data.provider.find((x) => x.id === last.providerID)?.models[last.modelID]
+
+ // Calculate cache hit percentage
+ const cacheHitPercentage = total > 0 ? Math.round((last.tokens.cache.read / total) * 100) : 0
+ const cacheRead = last.tokens.cache.read
+ const cacheWrite = last.tokens.cache.write
+
return {
tokens: total.toLocaleString(),
percentage: model?.limit.context ? Math.round((total / model.limit.context) * 100) : null,
+ cache: {
+ hitPercentage: cacheHitPercentage,
+ read: cacheRead,
+ write: cacheWrite,
+ },
}
})
@@ -81,6 +92,11 @@ export function Sidebar(props: { sessionID: string }) {
</text>
<text fg={theme.textMuted}>{context()?.tokens ?? 0} tokens</text>
<text fg={theme.textMuted}>{context()?.percentage ?? 0}% used</text>
+ <Show when={context()?.cache !== undefined}>
+ <text style={{ fg: context()!.cache.hitPercentage > 0 ? theme.success : theme.textMuted }}>
+ {context()!.cache.hitPercentage}% cached
+ </text>
+ </Show>
<text fg={theme.textMuted}>{cost()} spent</text>
</box>
<Show when={mcpEntries().length > 0}>
Maybe the correct way would be to do something like this instead 🤔
Also, I think the whole ripgrep tree is introducing cache misses on file additions/deletions. It should probably be set on a per-session basis rather than updated automatically.
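For what it's worth, one possible reading of "the correct way" above (purely an illustrative sketch, not code from this PR or the patch) is to measure cache reads against prompt-side tokens only (input + cache read + cache write), so output and reasoning tokens don't dilute the percentage:

```ts
// Sketch only: cache hit rate relative to prompt-side tokens, ignoring output
// and reasoning tokens. The `tokens` shape mirrors the one used in sidebar.tsx.
function cacheHitPercentage(tokens: {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}): number {
  const promptSide = tokens.input + tokens.cache.read + tokens.cache.write
  return promptSide > 0 ? Math.round((tokens.cache.read / promptSide) * 100) : 0
}
```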
I'd be happy to do this, except I was informed by @thdxr that they are reworking this application, so I assume they intend to take these ideas and merge them into whatever they're working on. I'm leaving this open until I hear otherwise, but I'm not sure it makes sense for me to do more implementation at this point without @thdxr's input. As far as cache statistics go, I have a number of ideas that would work, but I don't want to invest time into them only to have the PR closed unmerged.
Given that the llm-centralization branch has been merged, is there any hope of bringing the improvements from here onto mainline? @thdxr It seems the person who opened the PR is waiting for feedback, and until there is any, they're going to be left in the dark.
Summary
Implements a comprehensive ProviderConfig system for provider-specific caching and prompt optimization.
Closes #5416
Test Results with Claude Opus 4.5 (my primary target for improvement)
A/B testing with identical prompts (same session, same agent) comparing legacy vs optimized behavior:
The slightly larger initial cache write (+11%) is quickly amortized by dramatically fewer cache invalidations in subsequent requests.
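As a back-of-the-envelope illustration of why that amortizes (the multipliers below are Anthropic's published cache pricing ratios and should be treated as assumptions here, not numbers from this test run):

```ts
// Assumed ratios: cache writes ≈ 1.25× base input price, cache reads ≈ 0.1×.
// For a ~100k-token cached prefix, one cache invalidation that forces a
// re-write costs 100k × 1.25 = 125k token-equivalents, while a cache hit on
// the same prefix costs only 100k × 0.1 = 10k. An 11% larger initial write
// (~13.75k extra) is therefore recovered by avoiding a single invalidation.
const prefix = 100_000
const missCost = prefix * 1.25                  // prefix re-written after an invalidation
const hitCost = prefix * 0.1                    // prefix served from cache
const extraInitialWrite = prefix * 1.25 * 0.11  // the +11% up-front overhead
console.log({ missCost, hitCost, extraInitialWrite, savedPerAvoidedMiss: missCost - hitCost })
```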
Provider testing
I tested with the providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).
Changes
- ProviderConfig namespace with defaults for 19+ providers
- Support for three caching paradigms: explicit-breakpoint, automatic-prefix, implicit
- Tool sorting and caching for cache consistency
- User config overrides via opencode.json (provider and agent level)
- Simplified system message handling with a combineSystemMessages boolean
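To make the three caching paradigms concrete, here is a minimal sketch of how they could be modeled; the names and fields are illustrative, not the PR's actual definitions:

```ts
// Illustrative shape only.
// - "explicit-breakpoint": the request marks cacheable segments (e.g. Anthropic's
//   cache_control breakpoints).
// - "automatic-prefix": the provider caches shared prompt prefixes on its own.
// - "implicit": caching happens provider-side with no request-level controls.
type CacheParadigm = "explicit-breakpoint" | "automatic-prefix" | "implicit"

interface CacheConfig {
  enabled: boolean
  paradigm: CacheParadigm
  ttl?: "5m" | "1h"   // meaningful mainly for explicit-breakpoint providers
  minTokens?: number  // skip marking segments smaller than this
}
```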
Config Priority
Provider defaults → User provider config → User agent config
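A minimal sketch of what that priority chain could look like as a merge, reusing the illustrative CacheConfig shape above (the function name is hypothetical, not the PR's API):

```ts
// Later sources win: provider defaults < user provider config < user agent config.
function resolveCacheConfig(
  providerDefaults: Partial<CacheConfig>,
  userProviderConfig: Partial<CacheConfig> = {},
  userAgentConfig: Partial<CacheConfig> = {},
): Partial<CacheConfig> {
  // A shallow spread is enough for flat cache settings; nested options would
  // need a deep merge.
  return { ...providerDefaults, ...userProviderConfig, ...userAgentConfig }
}
```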
Area for future optimization
Currently, models.dev doesn't provide information about minimum cache requirements or prompt requirements, so this had to be written out as configuration. It would be ideal if the model definitions were updated with this detail. Until then, as providers/models are added or updated, the configuration should be updated to match for optimal performance.
New Files
- src/provider/config.ts (874 lines)
- test/provider/config.test.ts (215 tests)
Example Config
{ "provider": { "anthropic": { "cache": { "enabled": true, "ttl": "1h", "minTokens": 2048 } } }, "agent": { "plan": { "cache": { "ttl": "1h" } } } }