[NA][SDK] Optimizer MultiAgent Class and Introduce BM25 wiki17_abstracts #4198
base: main
Conversation
Co-authored-by: Copilot <[email protected]>
Pull request overview
Copilot reviewed 37 out of 39 changed files in this pull request and generated 6 comments.
sdks/opik_optimizer/src/opik_optimizer/utils/tools/wikipedia.py (3 resolved review comments, outdated)
.../opik_optimizer/src/opik_optimizer/algorithms/meta_prompt_optimizer/meta_prompt_optimizer.py (2 resolved review comments)
Pull request overview
Copilot reviewed 40 out of 45 changed files in this pull request and generated 7 comments.
sdks/opik_optimizer/src/opik_optimizer/utils/tools/wikipedia.py (resolved, outdated)
````python
# NOTE: Some models return fenced JSON blocks or prepend/append prose.
# Collect likely JSON snippets and try them in order of size.
candidates: list[str] = []

# Try fenced ```json ... ``` blocks
````
Copilot AI · Nov 24, 2025
The regex pattern r"```json\s*(\{.*?\})\s*```" is non-greedy (`.*?`), which will only match the smallest JSON object within fenced blocks. For complex nested JSON, this could fail to capture the entire structure. Consider using a greedy pattern or a proper brace-counting approach.
@jverre I'll get you to eyeball this. The issue is that I get three-backtick ```json fences in the responses, so I'm trying to be lax about the response structure.
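The brace-counting approach the review suggests could look like the sketch below; `extract_json_candidates` is a hypothetical helper name, not part of the SDK:

```python
import json

def extract_json_candidates(text: str) -> list[str]:
    """Collect balanced-brace snippets by counting braces.

    Unlike a non-greedy regex, this captures complete nested objects.
    """
    candidates: list[str] = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                candidates.append(text[start : i + 1])
                start = None
    # Try larger snippets first, matching the "order of size" note in the diff.
    return sorted(candidates, key=len, reverse=True)

raw = 'Here is the result:\n```json\n{"answer": {"text": "Paris"}}\n```\nDone.'
for snippet in extract_json_candidates(raw):
    try:
        parsed = json.loads(snippet)
        break
    except json.JSONDecodeError:
        continue
```

This stays lax about fences and surrounding prose while still recovering full nested objects.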
.../opik_optimizer/src/opik_optimizer/algorithms/meta_prompt_optimizer/meta_prompt_optimizer.py (resolved, outdated)
.../opik_optimizer/src/opik_optimizer/algorithms/meta_prompt_optimizer/meta_prompt_optimizer.py (resolved, outdated)
```python
    name: prompt.get_messages() for name, prompt in best_prompt.items()
}
first_prompt = next(iter(best_prompt.values()))
best_prompt_messages = first_prompt.get_messages()
```
We have quite a bit of logic around getting the first prompt out of the bundle and using that; it feels risky.
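For context, `next(iter(...))` over a bundle depends entirely on dict insertion order. A minimal sketch (with a hypothetical `ChatPrompt` stand-in, not the SDK class) shows why "first" is fragile:

```python
class ChatPrompt:
    """Hypothetical stand-in for the opik_optimizer ChatPrompt."""

    def __init__(self, messages: list[dict]):
        self._messages = messages

    def get_messages(self) -> list[dict]:
        return list(self._messages)

best_prompt = {
    "planner": ChatPrompt([{"role": "system", "content": "Plan the hops."}]),
    "answerer": ChatPrompt([{"role": "system", "content": "Answer concisely."}]),
}

# dicts preserve insertion order (Python 3.7+), so "first" means
# "first inserted" -- reordering the bundle silently changes the result.
first_prompt = next(iter(best_prompt.values()))
```

Here `first_prompt` is the planner prompt only because it was inserted first; nothing else marks it as the primary prompt.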
```python
    return " ".join(parts).strip()


# Parallel-friendly task wrapper so we can use task_evaluator with num_threads.
def _evaluated_task(dataset_item: dict[str, Any]) -> dict[str, Any]:
```
I think this logic might be overcomplicating things. Could we not simply update the OptimizableAgent class to receive the chat prompt or chat prompt bundle, depending on what was passed to the optimize_prompt method? That way we keep everything contained in a custom agent class passed to optimize_prompt.
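The per-item wrapper pattern in the diff can be sketched on its own; `run_agent` below is a hypothetical stand-in for the multi-agent invocation, not the PR's actual helper:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any

def run_agent(dataset_item: dict[str, Any]) -> str:
    # Hypothetical stand-in for invoking the multi-agent pipeline.
    return f"answer for {dataset_item['question']}"

def _evaluated_task(dataset_item: dict[str, Any]) -> dict[str, Any]:
    # Each call touches only its own item, so a thread pool can
    # fan items out safely and `map` preserves input order.
    return {"llm_output": run_agent(dataset_item)}

items = [{"question": "q1"}, {"question": "q2"}]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(_evaluated_task, items))
```

Because the wrapper is a plain callable over one dataset item, the same function works sequentially or under any `num_threads` setting.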
```python
    trace = run_result.get("trace") or {"system": _bundle_system_context()}
    return {"llm_output": final_output_str, "trace": trace}


# Parallel evaluation with trace preservation; falls back to sequential if needed.
```
Not sure why we need this; if we need to slow down the evaluations, we can just reduce the thread parameter in the evaluate method.
As we move to cost and latency optimization, being able to use the trace collection methods is going to be very useful. If the issue is orphan spans and traces, we should first try moving to the internal litellm integration.
```python
    return "\n\n".join(blocks)


def generate_candidate_prompts(
```
To simplify all this, why not convert the single chat-prompt optimization into an optimization with just one prompt in the bundle? The public API stays the same, but it would avoid a lot of duplication and if/else statements throughout the code.
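The suggested normalization can be sketched as follows; `to_bundle`, the `"default"` key, and the `ChatPrompt` stand-in are illustrative names, not SDK API:

```python
from typing import Union

class ChatPrompt:
    """Hypothetical stand-in for the SDK's ChatPrompt."""

    def __init__(self, messages: list[dict]):
        self.messages = messages

def to_bundle(
    prompt: Union[ChatPrompt, dict[str, ChatPrompt]],
) -> dict[str, ChatPrompt]:
    # Normalize: a lone prompt becomes a one-entry bundle, so all
    # downstream code only ever handles dict[str, ChatPrompt].
    if isinstance(prompt, ChatPrompt):
        return {"default": prompt}
    return prompt

single = to_bundle(ChatPrompt([{"role": "system", "content": "Answer."}]))
bundle = to_bundle({"planner": ChatPrompt([]), "answerer": ChatPrompt([])})
```

With normalization at the entry point, the single-prompt path is just the one-entry case of the bundle path, and the mode branches disappear.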
| }""" | ||
| ) | ||
| if mode == "bundle" | ||
| else """ |
Similar to above, all of this goes away if we just convert the single chat-prompt optimization into a chat bundle with one prompt.
Details
This PR provides the first fully native multi-agent optimization support end-to-end, designed for HotpotQA dataset benchmarks but replicable on other datasets.
- Adds a `utils/tools/wikipedia.py` module with a unified search interface supporting API, ColBERT, and BM25 backends
- Adds `SequencedOptimizableAgent(OptimizableAgent)` and class `HotpotMultiHopAgent`, which inherits this factory to create a multi-agent system
- Extends the `ChatPrompt` object to be a `dict[ChatPrompt]`, allowing a nested "bundle" of prompts used by agents
Change checklist
Issues
Testing
Locally verified
Documentation
TBD
Screenshots