
Conversation


@vincentkoc vincentkoc commented Nov 24, 2025

Details

This PR provides the first fully native, end-to-end multi-agent optimization support. It is designed for HotpotQA dataset benchmarks but can be replicated on other datasets.

  • Introduces a new utils/tools/wikipedia.py module with a unified search interface supporting API, ColBERT, and BM25 backends
  • Adds BM25 index building and optimization tooling with Parquet compression (40-50% size reduction), using the newly minted dataset https://huggingface.co/datasets/Comet/wikipedia-2017-bm25
  • Provides a factory for creating sequential agents for benchmarks, SequencedOptimizableAgent(OptimizableAgent), and a HotpotMultiHopAgent class which inherits from this factory to create a multi-agent system
  • Adapts the ChatPrompt handling so a dict[str, ChatPrompt] "bundle" of named prompts can be passed around and used by agents (see the sketch after this list)
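
A minimal sketch of the prompt-bundle shape described above, assuming ChatPrompt accepts system/user keyword arguments as in the opik_optimizer examples; the agent names and prompt texts here are hypothetical:

```python
from opik_optimizer import ChatPrompt

# A "bundle" maps agent names to their ChatPrompt, mirroring the
# `for name, prompt in best_prompt.items()` iteration seen later in the review.
prompt_bundle: dict[str, ChatPrompt] = {
    "planner": ChatPrompt(
        system="Decompose the question into retrieval hops.",
        user="{question}",
    ),
    "answerer": ChatPrompt(
        system="Answer using only the retrieved passages.",
        user="Passages:\n{passages}\n\nQuestion: {question}",
    ),
}

for name, prompt in prompt_bundle.items():
    print(name, prompt.get_messages())
```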

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves #
  • OPIK-0000

Testing

Locally verified

Documentation

TBD

Screenshots

Screenshot 2025-11-24 at 15 38 49
Screenshot 2025-11-24 at 15 40 28

@vincentkoc vincentkoc marked this pull request as ready for review November 24, 2025 09:08
@vincentkoc vincentkoc requested review from a team and dsblank as code owners November 24, 2025 09:08
Copilot AI review requested due to automatic review settings November 24, 2025 09:08



Copilot AI left a comment


Pull request overview

Copilot reviewed 37 out of 39 changed files in this pull request and generated 6 comments.

@vincentkoc vincentkoc requested a review from Copilot November 24, 2025 23:55

Copilot AI left a comment


Pull request overview

Copilot reviewed 40 out of 45 changed files in this pull request and generated 7 comments.

Comment on lines +578 to +582
# NOTE: Some models return fenced JSON blocks or prepend/append prose.
# Collect likely JSON snippets and try them in order of size.
candidates: list[str] = []

# Try fenced ```json ... ``` blocks

Copilot AI Nov 24, 2025


The regex pattern r"```json\s*(\{.*?\})\s*```" is non-greedy (.*?), which will only match the smallest JSON object within fenced blocks. For complex nested JSON, this could fail to capture the entire structure. Consider using a greedy pattern or a proper brace-counting approach.

vincentkoc (Member Author)

@jverre will get you to eyeball this. The issue is that I get three-backtick ```json fences in the responses, so I'm trying to be lax about the response structure.
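
A brace-counting extractor along the lines Copilot suggests might look like the sketch below. This is illustrative only (the function name and usage are assumptions, not the PR's code), and it deliberately ignores braces inside JSON string literals, which is usually acceptable for salvaging LLM output:

```python
import json

def extract_json_candidates(text: str) -> list[str]:
    """Collect balanced {...} spans by tracking brace depth."""
    candidates: list[str] = []
    depth = 0
    start = -1
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                candidates.append(text[start : i + 1])
    # Try larger snippets first, matching the "in order of size" note in the diff.
    return sorted(candidates, key=len, reverse=True)

for candidate in extract_json_candidates('prose {"a": {"b": 1}} trailing prose'):
    try:
        parsed = json.loads(candidate)
        break  # first candidate that parses wins
    except json.JSONDecodeError:
        continue
```

Unlike the non-greedy regex, this captures nested objects in full because a candidate only closes when the brace depth returns to zero.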

vincentkoc and others added 17 commits November 24, 2025 16:01
…optimizer/meta_prompt_optimizer.py

Co-authored-by: Copilot <[email protected]>
…l/opik into vk/optimizer-bm25-hotpot

* 'vk/optimizer-bm25-hotpot' of https://github.com/comet-ml/opik:
  [NA] [BE] Performant ClickHouse bulk update queries to ensure latest row selection (#4206)
  [issue-1239] [DOCS] Improve console logging level documentation (#4201)
  [OPIK-3110] [BE/FE] Add threshold support for trace and thread feedback score alerts (#4168)
  [NA] [DOCS] Fix Update experiment docs (#4199)
  [issue-2572] [FE] Update experiment name metadata in UI (#3136)
  Bump opentelmetry.version from 2.21.0 to 2.22.0 in /apps/opik-backend (#4194)
  [NA][SDK] Optimizer Benchmarks Modal Timeout (#4197)
  Bump org.apache.maven.plugins:maven-jar-plugin in /apps/opik-backend (#4190)
  [OPIK-3116][FE] user friendly error message for duplicate dataset name (#4188)
  [NA][BE] Cursor rules update for BE java code (#4155)
  Bump software.amazon.awssdk:bom in /apps/opik-backend (#4192)
  Bump com.diffplug.spotless:spotless-maven-plugin in /apps/opik-backend (#4193)
  [NA] [BE] Update model prices file (#4189)
…l/opik into vk/optimizer-bm25-hotpot

* 'vk/optimizer-bm25-hotpot' of https://github.com/comet-ml/opik:
  Update hotpot_multihop_benchmark.py
name: prompt.get_messages() for name, prompt in best_prompt.items()
}
first_prompt = next(iter(best_prompt.values()))
best_prompt_messages = first_prompt.get_messages()
Collaborator

We have quite a bit of logic around getting the first prompt out of the bundle and using it; that feels risky.
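
One hedged alternative, assuming a naming convention that is not in the PR: make the bundle's entry point explicit instead of relying on dict insertion order.

```python
# Hypothetical helper; "primary" is an assumed convention, not the PR's API.
def primary_prompt(bundle: dict[str, "ChatPrompt"]) -> "ChatPrompt":
    try:
        return bundle["primary"]
    except KeyError:
        raise ValueError(
            f"Prompt bundle must declare a 'primary' entry; got keys {list(bundle)}"
        )
```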

return " ".join(parts).strip()

# Parallel-friendly task wrapper so we can use task_evaluator with num_threads.
def _evaluated_task(dataset_item: dict[str, Any]) -> dict[str, Any]:
Collaborator

I think this logic might be overcomplicating things. Could we not simply update the OptimizableAgent class to receive the chat prompt or chat prompt bundle, depending on what was passed to the optimize_prompt method? That way we keep everything contained in a custom agent class passed to optimize_prompt.
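
A minimal sketch of that suggestion, assuming a simplified constructor rather than the real OptimizableAgent interface:

```python
from typing import Union

class BundleAwareAgent:  # hypothetical name, not the PR's class
    def __init__(self, prompt: Union["ChatPrompt", dict[str, "ChatPrompt"]]):
        # Normalize at the boundary: wrap a single ChatPrompt into a
        # one-entry bundle so every downstream code path sees a dict.
        self.prompts = prompt if isinstance(prompt, dict) else {"default": prompt}
```

With normalization done once in the agent, the evaluation wrapper above would not need to branch on prompt type.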

trace = run_result.get("trace") or {"system": _bundle_system_context()}
return {"llm_output": final_output_str, "trace": trace}

# Parallel evaluation with trace preservation; falls back to sequential if needed.
Collaborator

Not sure why we need this; if we need to slow down the evaluations we can just reduce the thread parameter in the evaluate method.

As we move to cost and latency optimization, being able to use the trace collection methods is going to be very useful. If the issue is orphan spans and traces, we should first try to move to the internal litellm integration.

return "\n\n".join(blocks)


def generate_candidate_prompts(
Collaborator

To simplify all this, why not convert the single chat prompt optimization into an optimization with just one prompt in the bundle? The public API stays the same, but it would avoid a lot of duplication and if/else statements throughout the code.
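
A sketch of that normalization at the optimize_prompt boundary; the names here are assumptions for illustration, not the PR's code:

```python
def as_bundle(prompt_or_bundle) -> dict[str, "ChatPrompt"]:
    """Coerce the public API's single ChatPrompt into a one-entry bundle."""
    if isinstance(prompt_or_bundle, dict):
        return prompt_or_bundle
    return {"default": prompt_or_bundle}

# Inside optimize_prompt (hypothetical):
# bundle = as_bundle(prompt)
# candidates = generate_candidate_prompts(bundle, ...)  # single code path
```

After this coercion, generate_candidate_prompts and the prompt templates below only ever need the "bundle" branch.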

}"""
)
if mode == "bundle"
else """
Collaborator

Similar to the above: all of this goes away if we just convert single chat prompt optimization into a chat bundle of size 1.

