Description
Feature hasn't been suggested before.
- I have verified this feature I'm about to request hasn't been suggested before.
Describe the enhancement you want to request
I've recently started using opencode and noticed my token usage was noticeably higher with the same general workflow I had been using in Claude Code. After a little research, I determined that the caching approach and prompt structure being used were suboptimal for Claude-based models.
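For context, Anthropic's prompt caching works by marking the stable prefix of a request (system prompt, tool definitions) with `cache_control` breakpoints, so repeated requests read that prefix from cache at a reduced per-token price. Here is a minimal sketch using the `@anthropic-ai/sdk` package; the model ID and instruction text are illustrative:

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Illustrative stand-in for a large, static agent instruction block.
const LARGE_STATIC_INSTRUCTIONS = "You are a code review agent. ...";

// Marking the static system prompt as a cache breakpoint means subsequent
// requests sharing this prefix read it from cache instead of reprocessing it.
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514", // illustrative model ID
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LARGE_STATIC_INSTRUCTIONS,
      cache_control: { type: "ephemeral" }, // default TTL is 5 minutes
    },
  ],
  messages: [{ role: "user", content: "Review the attached diff." }],
});
```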
I've submitted a PR that attempts to address this and allows configuration at both the provider and per-agent level. Some of my workstreams run for long periods of time, with gaps in between runs for certain types of agents (for example, review agents may not run frequently, but when they do, they carry a large amount of static context in their instructions), so allowing the TTL to be overridden at the agent level made sense to me.
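Something along these lines is the configuration surface I mean; the field names and layout here are illustrative, not the exact schema from the PR:

```jsonc
// opencode.json (illustrative field names, not the exact PR schema)
{
  "provider": {
    "anthropic": {
      // Provider-wide default: standard 5-minute ephemeral cache
      "cache": { "enabled": true, "ttl": "5m" }
    }
  },
  "agent": {
    "review": {
      // Review agents run infrequently but carry large static instructions,
      // so a longer TTL keeps their cached prefix warm between runs.
      "cache": { "ttl": "1h" }
    }
  }
}
```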
I did some basic testing with the patch, and it made a significant difference in cached vs. non-cached usage, which, at Claude's pricing, can make a huge difference in the cost of using these LLMs. Unfortunately, the minimum cacheable prompt size isn't available programmatically, so I had to build a lookup table for the various models. Basic performance/cache testing is in the PR.
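Since the minimum isn't exposed via the API, the lookup amounts to a hard-coded table keyed on model family. A sketch of the idea in TypeScript; the thresholds match Anthropic's documented minimum cacheable prompt lengths at the time of writing, but should be treated as illustrative:

```ts
// Minimum cacheable prompt length by model family. Anthropic does not
// expose this via the API, so it has to be hard-coded.
const MIN_CACHEABLE_TOKENS: Record<string, number> = {
  opus: 1024,
  sonnet: 1024,
  haiku: 2048,
};

const DEFAULT_MIN_CACHEABLE_TOKENS = 1024;

// Resolve a model ID like "claude-3-5-haiku-20241022" to its threshold.
function minCacheableTokens(modelID: string): number {
  for (const [family, tokens] of Object.entries(MIN_CACHEABLE_TOKENS)) {
    if (modelID.includes(family)) return tokens;
  }
  return DEFAULT_MIN_CACHEABLE_TOKENS;
}

// Only place a cache breakpoint when the prefix is large enough to qualify;
// below the threshold the cache_control marker is silently ignored anyway.
function shouldCache(modelID: string, estimatedPromptTokens: number): boolean {
  return estimatedPromptTokens >= minCacheableTokens(modelID);
}
```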
I tried to structure the PR so it wouldn't negatively impact any other models/providers, but could also serve as a starting point for other models/providers with specific cache implementation requirements. Later, this mechanism could be extended to configuration beyond caching.