
Conversation


@vincentkoc vincentkoc commented Nov 25, 2025


This PR centralizes dataset sampling, seeding, and shuffling logic across the opik_optimizer SDK to ensure consistent and reproducible evaluation runs. It introduces new utility modules (rng.py and sampling.py) that provide deterministic random number generation and unified sampling plan resolution, replacing ad-hoc implementations scattered across different optimizers.

Key changes include:

Change checklist

  • User facing
  • Documentation update

Issues

  • Resolves #
  • OPIK-0000

Testing

Local

Documentation

tbd

Copilot AI review requested due to automatic review settings November 25, 2025 21:05
@vincentkoc vincentkoc requested review from a team and dsblank as code owners November 25, 2025 21:05

Copilot AI left a comment


Pull request overview


Key changes include:

  • New utils/rng.py module providing deterministic RNG helpers with seed derivation and batching
  • New utils/sampling.py module for consistent dataset sampling plan resolution
  • Base optimizer enhancements to support centralized sampling and RNG derivation
  • Updates to all optimizer algorithms to use the new centralized utilities
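The seed-derivation idea behind such an `rng.py` module can be sketched as follows. This is illustrative only; the PR's actual helpers are not shown here, and the function names below are assumptions. Hashing the base seed together with a label gives each component (shuffling, subset selection, etc.) its own stable, independent random stream:

```python
import hashlib
import random


def derive_seed(base_seed: int, label: str) -> int:
    # Hash the base seed together with a label so each component
    # (e.g. "shuffle", "eval-subset") gets its own stable stream.
    digest = hashlib.sha256(f"{base_seed}:{label}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def make_rng(base_seed: int, label: str) -> random.Random:
    # Child RNGs are reproducible across runs and independent per label.
    return random.Random(derive_seed(base_seed, label))
```

Because the derivation is a pure function of `(base_seed, label)`, two runs with the same seed produce identical evaluation subsets without any global `random.seed()` calls.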

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `sdks/opik_optimizer/src/opik_optimizer/utils/rng.py` | Adds deterministic RNG utilities including seed hashing, child RNG derivation, and batching with optional shuffling |
| `sdks/opik_optimizer/src/opik_optimizer/utils/sampling.py` | Implements sampling plan resolution with support for explicit IDs, sample counts, and dataset size clamping |
| `sdks/opik_optimizer/src/opik_optimizer/base_optimizer.py` | Integrates centralized RNG and sampling plan preparation methods into the base optimizer class |
| `sdks/opik_optimizer/src/opik_optimizer/task_evaluator.py` | Adds a compatibility wrapper for the upcoming `evaluate_on_dict_items` SDK feature |
| `sdks/opik_optimizer/src/opik_optimizer/algorithms/meta_prompt_optimizer/ops/evaluation_ops.py` | Migrates to the centralized sampling plan and deterministic RNG for evaluation subset selection |
| `sdks/opik_optimizer/src/opik_optimizer/algorithms/evolutionary_optimizer/evolutionary_optimizer.py` | Removes global random seeding in favor of optimizer-scoped RNG derivation |
| `sdks/opik_optimizer/src/opik_optimizer/algorithms/few_shot_bayesian_optimizer/few_shot_bayesian_optimizer.py` | Replaces the local `_make_rng` method with the centralized `rng_utils.make_rng` |
| `sdks/opik_optimizer/src/opik_optimizer/algorithms/gepa_optimizer/gepa_optimizer.py` | Adopts the sampling plan approach for train/validation splits with deterministic ID selection |
| `sdks/opik_optimizer/src/opik_optimizer/utils/dataset_utils.py` | Adds resilience for dataset size mismatches and an early exit when skipping existing datasets |
| `sdks/opik_optimizer/tests/unit/test_*.py` | Comprehensive unit tests for the new RNG and sampling utilities |


logger = logging.getLogger(__name__)

try:
A collaborator commented:

We can bump the required opik version.

return 1.0


def test_evaluate_passes_ids_and_samples(monkeypatch: Any) -> None:
A collaborator commented:

FYI: such a complex unit-test setup just to check that an argument is passed through to another function is a symptom of overloaded functions. It's not part of this PR specifically, but after looking at the test and at the function, I'd say this logic could be moved out to another Python module and tested separately (we could also turn task_evaluator into a namespace for better grouping):

    # part of _evaluate_internal function

    if dataset_item_ids:
        # FIXME: In rare cases sometimes dataset ids are missing (cause unknown, skip those for now)
        available_ids = {item.get("id") for item in items}
        missing_ids = [
            item_id for item_id in dataset_item_ids if item_id not in available_ids
        ]
        if missing_ids:
            logger.warning(
                "Dropping %s dataset_item_ids not present in dataset %s (showing first 5): %s",
                len(missing_ids),
                getattr(dataset, "name", None) or "<unknown>",
                missing_ids[:5],
            )
        dataset_item_ids = [
            item_id for item_id in dataset_item_ids if item_id in available_ids
        ]
        if not dataset_item_ids:
            logger.warning(
                "All provided dataset_item_ids were missing; evaluating on full dataset instead."
            )
            dataset_item_ids = None
        else:
            items = [item for item in items if item.get("id") in dataset_item_ids]

P.S. I don't know why we need this logic here at all; opik.evaluate_optimization_trial already has working dataset_items_ids and dataset_sampler parameters. But maybe I'm not aware of some peculiar use case. Can we improve the behavior in the core SDK if we're missing something here?

from ....api_objects import chat_prompt
from .... import _llm_calls
from ...._llm_calls import StructuredOutputParsingError
from ....api_objects import chat_prompt
A collaborator commented:

If it's more than two `..`, it's better to just use an absolute import: `from opik_optimizer.api_objects import chat_prompt`.

return random.Random(derived)


def sample_ids(rng: random.Random, ids: Sequence[T], k: int) -> list[T]:
A collaborator commented:

No need to call it `sample_ids` rather than just `sample`; it isn't specific to IDs and works for anything :)
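A generic version of the helper might look like this (a sketch under the reviewer's suggestion; the PR's actual implementation isn't shown in the diff):

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def sample(rng: random.Random, items: Sequence[T], k: int) -> list[T]:
    # Deterministic for a given rng state; clamps k to the population
    # size so oversampling requests don't raise.
    return rng.sample(list(items), min(k, len(items)))
```

Since it only depends on `Sequence[T]`, the same helper covers dataset IDs, dataset items, or any other population.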


def resolve_sampling(
dataset: Any,
n_samples: int | str | None,
A collaborator commented:

We should use `Literal` types in such cases instead of a bare `str`.

serialized_tools.append({k: v for k, v in tool.items() if k})
return serialized_tools

# ------------------------------------------------------------------
A collaborator commented:

BaseOptimizer is already overloaded and not easy to follow. Maybe it's worth encapsulating the random-sampling logic in a separate class with a compact API rather than continuing to grow the base class.
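One shape such an extraction could take (purely illustrative; `SamplingPlanner` and its method names are assumptions, not part of the PR) is a small class that owns the seed and exposes only the two operations optimizers need:

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


class SamplingPlanner:
    # Hypothetical compact wrapper for the seeding/sampling concerns
    # currently spread across BaseOptimizer; names are illustrative.

    def __init__(self, seed: int) -> None:
        self._seed = seed
        self._rng = random.Random(seed)

    def child_rng(self, label: str) -> random.Random:
        # Derive a reproducible child RNG from the base seed and a label,
        # so independent components don't share a stream.
        return random.Random(f"{self._seed}:{label}")

    def sample(self, items: Sequence[T], k: int) -> list[T]:
        # Draw without replacement, clamped to the population size.
        return self._rng.sample(list(items), min(k, len(items)))
```

BaseOptimizer would then hold a single `SamplingPlanner` instance instead of a collection of RNG- and sampling-related methods, which also makes the logic testable in isolation.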
