feat(provider): add provider-specific cache configuration system (significant token usage reduction) #5422
base: dev
Conversation
Force-pushed from ce7acc2 to 7b1f516
Sewer56 left a comment
Needs a second pair of eyes but 👍 from me
Please merge this ❤️ I'm deciding whether we should use opencode in our company, and I'm actually surprised this wasn't implemented until now.
I tested with the providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).
Implements comprehensive ProviderConfig system for provider-specific caching and prompt optimization. Addresses cache configuration needs from sst#5416.
- Add ProviderConfig namespace with defaults for 19+ providers
- Support three caching paradigms: explicit-breakpoint, automatic-prefix, implicit
- Add tool sorting and caching for cache consistency
- Add user config overrides via opencode.json (provider and agent level)
- Simplify system message handling with combineSystemMessages boolean
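As a rough illustration of the "tool sorting and caching for cache consistency" item above (a hedged sketch; the type and function names are illustrative, not the PR's actual code): provider prefix caches only hit when the serialized prompt prefix, including tool definitions, is identical across requests, so ordering tools deterministically keeps that prefix stable.

```ts
// Illustrative sketch only. Sorting tool definitions deterministically keeps the
// serialized prompt prefix (system + tools) byte-identical across requests,
// which is a precondition for provider-side prefix caching to hit.
interface ToolDefinition {
  name: string
  description: string
  parameters: Record<string, unknown>
}

function sortToolsForCache(tools: ToolDefinition[]): ToolDefinition[] {
  // Any deterministic order works; it just has to be the same on every request.
  return [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
}
```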
Force-pushed from 7b1f516 to d213e39
Updated one of the comments to be clearer; no functional changes.
hey, thanks for working on this - i'm actually currently mid-refactor on a bunch of code where we call LLMs, so we probably can't explicitly accept this PR (llm-centralization branch). also, i don't know if we want to go as far as making all this deeply configurable yet. could you lay out all the things you improved on top of what we have currently? then i can make sure those get included
i can also just read the PR/have opencode summarize haha. i'll do a pass once that llm-centralization branch is merged
Ok, that's fine. I added comments throughout that should help your opencode summarization, but I'm happy to discuss if you need more clarity; I'm on Discord if you need more input. I was initially going to refactor all of your provider handling, but it sounds like you're already doing this, so I designed this PR to be a stepping stone towards that rather than making my very first PR a massive rewrite of lots of things. If you just want to take my work and fold it into yours, I understand; at the end of the day it's of huge benefit to the users, so I'll be happy. Thank you for taking the time to evaluate things, and I hope to see your rewrite soon! I can also just update this once you're done and it's merged in; just let me know what works best.
Hey, thanks for this PR. I started playing around with it. Can you do me a favor and add the following patch to visualize token cache statistics on the sidebar?
FYI: I am not 100% confident that this is the best way of calculating the percentage.
Should this be shown in the sidebar? I think it should: not only will it help us to spot issues, but it will also let users notice models that do not support caching.
diff --git a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
index c1c29a73..afaaad73 100644
--- a/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
+++ b/packages/opencode/src/cli/cmd/tui/routes/session/sidebar.tsx
@@ -42,9 +42,20 @@ export function Sidebar(props: { sessionID: string }) {
const total =
last.tokens.input + last.tokens.output + last.tokens.reasoning + last.tokens.cache.read + last.tokens.cache.write
const model = sync.data.provider.find((x) => x.id === last.providerID)?.models[last.modelID]
+
+ // Calculate cache hit percentage
+ const cacheHitPercentage = total > 0 ? Math.round((last.tokens.cache.read / total) * 100) : 0
+ const cacheRead = last.tokens.cache.read
+ const cacheWrite = last.tokens.cache.write
+
return {
tokens: total.toLocaleString(),
percentage: model?.limit.context ? Math.round((total / model.limit.context) * 100) : null,
+ cache: {
+ hitPercentage: cacheHitPercentage,
+ read: cacheRead,
+ write: cacheWrite,
+ },
}
})
@@ -81,6 +92,11 @@ export function Sidebar(props: { sessionID: string }) {
</text>
<text fg={theme.textMuted}>{context()?.tokens ?? 0} tokens</text>
<text fg={theme.textMuted}>{context()?.percentage ?? 0}% used</text>
+ <Show when={context()?.cache !== undefined}>
+ <text style={{ fg: context()!.cache.hitPercentage > 0 ? theme.success : theme.textMuted }}>
+ {context()!.cache.hitPercentage}% cached
+ </text>
+ </Show>
<text fg={theme.textMuted}>{cost()} spent</text>
</box>
<Show when={mcpEntries().length > 0}>
Maybe the correct way would be to do something like this instead 🤔
Also, I think the whole ripgrep tree is introducing cache misses on file additions/deletions. It should probably be set on a per-session basis rather than updated automatically.
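For what it's worth, one possible reading of "the correct way" above (purely an illustrative sketch, not code from this PR or the patch) is to measure cache reads against prompt-side tokens only (input + cache read + cache write), so output and reasoning tokens don't dilute the percentage:

```ts
// Sketch only: cache hit rate relative to prompt-side tokens, ignoring output
// and reasoning tokens. The `tokens` shape mirrors the one used in sidebar.tsx.
function cacheHitPercentage(tokens: {
  input: number
  output: number
  reasoning: number
  cache: { read: number; write: number }
}): number {
  const promptSide = tokens.input + tokens.cache.read + tokens.cache.write
  return promptSide > 0 ? Math.round((tokens.cache.read / promptSide) * 100) : 0
}
```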
I'd be happy to do this, except I was informed by @thdxr that they are reworking this application, so I assume they intend to take these ideas and merge them into whatever they're working on. I'm leaving this open until I hear otherwise, but I'm not sure it makes sense for me to do more implementation at this point without @thdxr's input. As far as cache statistics go, I have a number of ideas that would work, but I don't want to invest time into them only to have the PR closed unmerged.
Given that the llm-centralization branch has been merged, is there any hope of bringing the improvements from here onto mainline? @thdxr It seems the person who opened the PR is waiting for feedback, and until there is any, they're going to be left in the dark.
Summary
Implements a comprehensive ProviderConfig system for provider-specific caching and prompt optimization.
Closes #5416
Test Results with Claude Opus 4.5 (my primary target for improvement)
A/B testing with identical prompts (same session, same agent) comparing legacy vs optimized behavior:
The slightly larger initial cache write (+11%) is quickly amortized by dramatically fewer cache invalidations in subsequent requests.
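As a back-of-the-envelope illustration of why that amortizes (the multipliers below are Anthropic's published cache pricing ratios and should be treated as assumptions here, not numbers from this test run):

```ts
// Assumed ratios: cache writes ≈ 1.25× base input price, cache reads ≈ 0.1×.
// For a ~100k-token cached prefix, one cache invalidation that forces a
// re-write costs 100k × 1.25 = 125k token-equivalents, while a cache hit on
// the same prefix costs only 100k × 0.1 = 10k. An 11% larger initial write
// (~13.75k extra) is therefore recovered by avoiding a single invalidation.
const prefix = 100_000
const missCost = prefix * 1.25                  // prefix re-written after an invalidation
const hitCost = prefix * 0.1                    // prefix served from cache
const extraInitialWrite = prefix * 1.25 * 0.11  // the +11% up-front overhead
console.log({ missCost, hitCost, extraInitialWrite, savedPerAvoidedMiss: missCost - hitCost })
```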
Provider testing
I tested with the providers I have access to. I can confirm it works with Anthropic (Opus 4.5), Google (Gemini 3 Pro preview), and Mistral (Devstral 2).
Changes
- ProviderConfig namespace with defaults for 19+ providers
- Support for three caching paradigms: explicit-breakpoint, automatic-prefix, implicit
- Tool sorting and caching for cache consistency
- User config overrides via opencode.json (provider and agent level)
- Simplified system message handling with a combineSystemMessages boolean
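To make the three caching paradigms concrete, here is a minimal sketch of how they could be modeled; the names and fields are illustrative, not the PR's actual definitions:

```ts
// Illustrative shape only.
// - "explicit-breakpoint": the request marks cacheable segments (e.g. Anthropic's
//   cache_control breakpoints).
// - "automatic-prefix": the provider caches shared prompt prefixes on its own.
// - "implicit": caching happens provider-side with no request-level controls.
type CacheParadigm = "explicit-breakpoint" | "automatic-prefix" | "implicit"

interface CacheConfig {
  enabled: boolean
  paradigm: CacheParadigm
  ttl?: "5m" | "1h"   // meaningful mainly for explicit-breakpoint providers
  minTokens?: number  // skip marking segments smaller than this
}
```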
Config Priority
Provider defaults → User provider config → User agent config
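A minimal sketch of what that priority chain could look like as a merge, reusing the illustrative CacheConfig shape above (the function name is hypothetical, not the PR's API):

```ts
// Later sources win: provider defaults < user provider config < user agent config.
function resolveCacheConfig(
  providerDefaults: Partial<CacheConfig>,
  userProviderConfig: Partial<CacheConfig> = {},
  userAgentConfig: Partial<CacheConfig> = {},
): Partial<CacheConfig> {
  // A shallow spread is enough for flat cache settings; nested options would
  // need a deep merge.
  return { ...providerDefaults, ...userProviderConfig, ...userAgentConfig }
}
```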
Area for future optimization
Currently, models.dev doesn't provide information about minimum cache requirements or prompt requirements, so this had to be written out as configuration. It would be ideal if the model definitions were updated with this detail. Until then, as providers/models are added or updated, the configuration should be updated to match for optimal performance.
New Files
- src/provider/config.ts (874 lines)
- test/provider/config.test.ts (215 tests)
Example Config
{ "provider": { "anthropic": { "cache": { "enabled": true, "ttl": "1h", "minTokens": 2048 } } }, "agent": { "plan": { "cache": { "ttl": "1h" } } } }