-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[NA][SDK] Optimizer centralized shuffle, seed and sampling logic. #4227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR centralizes dataset sampling, seeding, and shuffling logic across the opik_optimizer SDK to ensure consistent and reproducible evaluation runs. It introduces new utility modules (rng.py and sampling.py) that provide deterministic random number generation and unified sampling plan resolution, replacing ad-hoc implementations scattered across different optimizers.
Key changes include:
- New
utils/rng.pymodule providing deterministic RNG helpers with seed derivation and batching - New
utils/sampling.pymodule for consistent dataset sampling plan resolution - Base optimizer enhancements to support centralized sampling and RNG derivation
- Updates to all optimizer algorithms to use the new centralized utilities
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
sdks/opik_optimizer/src/opik_optimizer/utils/rng.py |
Adds deterministic RNG utilities including seed hashing, child RNG derivation, and batching with optional shuffling |
sdks/opik_optimizer/src/opik_optimizer/utils/sampling.py |
Implements sampling plan resolution with support for explicit IDs, sample counts, and dataset size clamping |
sdks/opik_optimizer/src/opik_optimizer/base_optimizer.py |
Integrates centralized RNG and sampling plan preparation methods into the base optimizer class |
sdks/opik_optimizer/src/opik_optimizer/task_evaluator.py |
Adds compatibility wrapper for upcoming evaluate_on_dict_items SDK feature |
sdks/opik_optimizer/src/opik_optimizer/algorithms/meta_prompt_optimizer/ops/evaluation_ops.py |
Migrates to centralized sampling plan and deterministic RNG for evaluation subset selection |
sdks/opik_optimizer/src/opik_optimizer/algorithms/evolutionary_optimizer/evolutionary_optimizer.py |
Removes global random seeding in favor of optimizer-scoped RNG derivation |
sdks/opik_optimizer/src/opik_optimizer/algorithms/few_shot_bayesian_optimizer/few_shot_bayesian_optimizer.py |
Replaces local _make_rng method with centralized rng_utils.make_rng |
sdks/opik_optimizer/src/opik_optimizer/algorithms/gepa_optimizer/gepa_optimizer.py |
Adopts sampling plan approach for train/validation splits with deterministic ID selection |
sdks/opik_optimizer/src/opik_optimizer/utils/dataset_utils.py |
Adds resilience for dataset size mismatches and early-exit optimization when skipping existing datasets |
sdks/opik_optimizer/tests/unit/test_*.py |
Comprehensive unit tests for new RNG and sampling utilities |
...pik_optimizer/src/opik_optimizer/algorithms/evolutionary_optimizer/evolutionary_optimizer.py
Outdated
Show resolved
Hide resolved
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can bump required opik version
| return 1.0 | ||
|
|
||
|
|
||
| def test_evaluate_passes_ids_and_samples(monkeypatch: Any) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, such a complex unit test setup to only check that the argument is transferred correctly to another function is a symptom of overloaded functions.
It's not a part of this PR specifically, but after looking at the test and at the function I'd say that this could be moved out to another python module and tested separately (we can also turn task_evaluator into the namespace for better grouping):
# part of _evaluate_internal function
if dataset_item_ids:
# FIXME: In rare cases sometimes dataset ids are missing (cause unknown, skip those for now)
available_ids = {item.get("id") for item in items}
missing_ids = [
item_id for item_id in dataset_item_ids if item_id not in available_ids
]
if missing_ids:
logger.warning(
"Dropping %s dataset_item_ids not present in dataset %s (showing first 5): %s",
len(missing_ids),
getattr(dataset, "name", None) or "<unknown>",
missing_ids[:5],
)
dataset_item_ids = [
item_id for item_id in dataset_item_ids if item_id in available_ids
]
if not dataset_item_ids:
logger.warning(
"All provided dataset_item_ids were missing; evaluating on full dataset instead."
)
dataset_item_ids = None
else:
items = [item for item in items if item.get("id") in dataset_item_ids]P.S. I don't know why exactly we need this logic here at all, opik.evaluate_optimization_trial has already working dataset_items_ids, and dataset_sampler parameters. But maybe I'm not aware of some peculiar use case. Can we improve the behavior in the core SDK if we're missing something here?
| from ....api_objects import chat_prompt | ||
| from .... import _llm_calls | ||
| from ...._llm_calls import StructuredOutputParsingError | ||
| from ....api_objects import chat_prompt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's more than 2 '..' it's better to just use absolute import from opik_optimizer.api_objects import chat_prompt
| return random.Random(derived) | ||
|
|
||
|
|
||
| def sample_ids(rng: random.Random, ids: Sequence[T], k: int) -> list[T]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to make it sample_ids and not sample for anything :)
|
|
||
| def resolve_sampling( | ||
| dataset: Any, | ||
| n_samples: int | str | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use literals in such cases.
| serialized_tools.append({k: v for k, v in tool.items() if k}) | ||
| return serialized_tools | ||
|
|
||
| # ------------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseOptimizer is already overloaded and not easy to track. Maybe worth encapsulating the random sampling logic to a separate class with a compact API rather than continuing to grow the base class.
Details
This PR centralizes dataset sampling, seeding, and shuffling logic across the opik_optimizer SDK to ensure consistent and reproducible evaluation runs. It introduces new utility modules (
rng.pyandsampling.py) that provide deterministic random number generation and unified sampling plan resolution, replacing ad-hoc implementations scattered across different optimizers.Key changes include:
utils/rng.pymodule providing deterministic RNG helpers with seed derivation and batchingutils/sampling.pymodule for consistent dataset sampling plan resolutionChange checklist
Issues
Testing
Local
Documentation
tbd