Feature/aggregation final #554

Conversation
📝 Walkthrough

Adds a feature-complete time-series clustering system (cluster/expand), clustering data structures and IO, inter-cluster storage modes, fxplot xarray plotting accessors, a multi-system Comparison API, replaces hours_per_timestep with timestep_duration, updates many notebooks/docs and CI for notebook caching/execution, and adds extensive clustering tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User as Notebook / CLI
    participant Transform as TransformAccessor.cluster()
    participant ReducedFS as Reduced FlowSystem
    participant Solver as Solver (optimize)
    participant Expander as TransformAccessor.expand_solution()
    participant IO as FlowSystem.to_dataset/from_dataset
    Note over User,Transform: Request clustering (n_clusters, duration, peaks)
    User->>Transform: cluster(flow_system, params)
    Transform->>ReducedFS: build reduced FlowSystem + clustering metadata
    ReducedFS-->>User: reduced FlowSystem
    alt optimize clustered
        User->>ReducedFS: optimize(solver)
        ReducedFS->>Solver: build_model() & solve()
        Solver-->>ReducedFS: clustered solution
    end
    User->>Expander: expand_solution(reduced_flow_system)
    Expander->>ReducedFS: expand clustered solution → original timesteps (expand_data, SOC_boundary handling)
    ReducedFS-->>User: expanded full-resolution solution
    Note over ReducedFS,IO: clustering metadata roundtrip
    ReducedFS->>IO: to_dataset(include clustering arrays/attrs)
    IO-->>ReducedFS: from_dataset reconstructs Clustering object
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
Actionable comments posted: 16
Fix all issues with AI Agents 🤖
In @docs/design/cluster_architecture.md:
- Around line 36-39: The fenced code block that documents the current flat
`time` structure (the block containing the line "time: (n_clusters ×
timesteps_per_cluster,) # Flat, e.g., (864,) for 9 clusters × 96 timesteps") is
using a bare ``` fence; update that fence to include a language identifier (for
example, ```text) so markdownlint passes and the snippet renders with explicit
language highlighting/readability in the `cluster_architecture.md`
documentation.
- Around line 417-443: The docs introduce a property named aggregation_weight
that conflicts with the FlowSystem API's temporal_weight; update the example to
use the canonical name temporal_weight (or add a clear note that
aggregation_weight is an alias) so readers and code samples match the API.
Specifically, rename the example property aggregation_weight to temporal_weight
(or add an explicit comment stating aggregation_weight == temporal_weight) and
ensure references to is_clustered, cluster_weight, timestep_duration, and
FlowSystem all consistently use temporal_weight terminology.
In @docs/notebooks/08b-rolling-horizon.ipynb:
- Around line 183-187: The markdown cell is malformed because it contains the
raw tag "<cell_type>markdown</cell_type>"; remove that artifact so the cell's
source is valid markdown and ensure the header and following text read as: "##
Visualize: Heat Balance Comparison" followed by the sentence "Use the
`Comparison` class to view both methods side-by-side:" (i.e., convert the cell
into a proper markdown cell without the raw tag).
In @docs/notebooks/08c2-clustering-storage-modes.ipynb:
- Around line 384-397: The doc example uses cluster_storage_mode but earlier
documentation uses cluster_mode; make them consistent by renaming the attribute
in this snippet to cluster_mode (both when constructing Storage and when
mutating the component), e.g. Storage(..., cluster_mode='intercluster_cyclic')
and flow_system.components['SeasonalStorage'].cluster_mode = 'cyclic', keeping
the same default/value semantics as the rest of the docs.
- Around line 173-177: The inline comment for clustering is inconsistent with
the value: N_CLUSTERS is set to 24 but the comment reads "12 typical days";
decide whether you want 12 or 24 clusters and make them consistent by either
changing N_CLUSTERS to 12 or updating the comment to "24 typical days" (or a
generic description like "number of typical days") near the N_CLUSTERS
declaration so the comment matches the variable value.
In @docs/notebooks/data/raw/README.md:
- Line 10: Replace the bare URL occurrences in README.md by wrapping each plain
URL (e.g. https://re.jrc.ec.europa.eu/pvg_tools/en/) in proper markdown link
syntax—either angle-bracket form <https://re.jrc.ec.europa.eu/pvg_tools/en/> or
a descriptive link [PVGIS tools](https://re.jrc.ec.europa.eu/pvg_tools/en/)—and
update the three occurrences noted (the
https://re.jrc.ec.europa.eu/pvg_tools/en/ instance and the two other bare URLs
referenced) so all bare URLs are markdown-compliant.
In @docs/user-guide/optimization/clustering.md:
- Around line 49-59: Escape the pipe characters inside the code spans for the
values used in the table to prevent Markdown from splitting columns: update the
entries for time_series_for_high_peaks and time_series_for_low_peaks so the
literal strings containing '|' are escaped (e.g., change
'HeatDemand(Q)|fixed_relative_profile' and 'SolarGen(P)|fixed_relative_profile'
to include a backslash before the pipe within the backtick code span) while
leaving the parameter names (time_series_for_high_peaks,
time_series_for_low_peaks) and other table cells unchanged.
- Around line 158-160: The fenced code block containing the formula
"actual_SOC(t) = SOC_boundary[period] × (1 - loss)^t + ΔE(t)" lacks a language
specifier; update that triple-backtick fence to include a language label (e.g.,
```text or ```math or ```python) so markdownlint passes and rendering improves,
by editing the fenced block around the actual_SOC(t) formula in the
clustering.md content.
In @flixopt/components.py:
- Around line 1536-1541: The decay factor in the SOC boundary constraint uses
raw timestep index `offset` instead of hours; update the exponent to use hours
by multiplying `offset` with `timestep_duration` (i.e. compute `decay_t = (1 -
rel_loss) ** (offset * timestep_duration)`), then recompute `combined = soc_d *
decay_t + cs_t` and keep the existing `self.add_constraints(combined >= 0,
short_name=f'soc_lb_{sample_name}')` call so it matches the fix applied in
`_add_linking_constraints`.
- Around line 1476-1483: The decay exponent uses timesteps_per_cluster as if
each timestep is one hour; update the decay_n calculation to multiply
timesteps_per_cluster by timestep_duration so the exponent is total hours: use
self.element.relative_loss_per_hour (mean over time) and compute decay_n = (1 -
rel_loss) ** (timesteps_per_cluster * timestep_duration) before forming lhs =
soc_after - soc_before * decay_n - delta_soc_ordered and calling
self.add_constraints(..., short_name='link').
In @flixopt/dataset_plot_accessor.py:
- Around line 856-914: The DataArray PlotAccessor methods line and area are
missing the exclude_dims parameter which causes API mismatch with the Dataset
versions; add an exclude_dims parameter to both DataArrayPlotAccessor.line and
DataArrayPlotAccessor.area signatures (matching the Dataset API type/optional
default), and forward it through to the underlying call
(self._to_dataset().fxplot.line and .area) by passing exclude_dims=exclude_dims
alongside the existing forwarded args so the parameter is accepted and
propagated.
- Around line 827-854: The stacked_bar wrapper accepts an exclude_dims parameter
but never forwards it to the underlying call; update the call in stacked_bar to
pass exclude_dims through to self._to_dataset().fxplot.stacked_bar (i.e., add
exclude_dims=exclude_dims to the argument list) so the parameter is not ignored.
- Around line 799-825: The DataArray/plot accessor method bar is missing the
exclude_dims parameter; add an argument exclude_dims: Sequence[str] | None =
None to the bar signature and forward it into the delegated call (i.e., include
exclude_dims=exclude_dims in the self._to_dataset().fxplot.bar(...) call), and
ensure you import Sequence from typing if not already present so the type hint
compiles.
In
@tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py:
- Around line 36-41: The variable include_storage is declared but never used;
either remove the unused declaration or thread it into the clustering call:
locate the cluster(...) invocation in example_optimization_modes.py and add
include_storage=include_storage to the call (or, if the cluster API uses a
different param name, pass it accordingly), or simply delete the include_storage
= False line to avoid dead code; ensure you reference the include_storage symbol
and the cluster(...) call when making the change.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
tests/deprecated/test_on_hours_computation.py (1)
8-44: Update function signature to match parameter rename.

The parameter rename to `timestep_duration` is consistent throughout the test class, but the function signature in `flixopt/modeling.py` (line 122) still uses `hours_per_timestep`. Update `compute_consecutive_hours_in_state` to use `timestep_duration` for consistency across the codebase.
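As a rough illustration of the rename only: the real function in `flixopt/modeling.py` takes more arguments than shown here, so treat this signature as a hypothetical sketch rather than the actual API.

```python
import xarray as xr

def compute_consecutive_hours_in_state(
    state: xr.DataArray,               # boolean per-timestep state
    timestep_duration: xr.DataArray,   # renamed from hours_per_timestep
) -> xr.DataArray:
    """Consecutive hours spent in `state`, weighted by each timestep's duration."""
    ...
```

flixopt/optimization.py (3)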
134-135: Stale reference to `ClusteredOptimization` in docstring.

The docstring mentions "consider using ClusteredOptimization (time aggregation)" but `ClusteredOptimization` has been removed from this module. Update to reference the new API.

🔎 Proposed fix

```diff
-    For large problems, consider using ClusteredOptimization (time aggregation)
-    or SegmentedOptimization (temporal decomposition) instead.
+    For large problems, consider using FlowSystem.transform.cluster() (time aggregation)
+    or SegmentedOptimization (temporal decomposition) instead.
```
141-141: Docstring for `normalize_weights` is outdated.

The parameter description implies user control over normalization, but the implementation now ignores the value and always normalizes. Update to reflect the deprecated status.

🔎 Proposed fix

```diff
-    normalize_weights: Whether to automatically normalize the weights of scenarios to sum up to 1 when solving.
+    normalize_weights: Deprecated. Scenario weights are now always normalized when set on FlowSystem.
```
460-462: Stale reference to `ClusteredOptimization` in warning.

The warning text mentions `ClusteredOptimization`, which has been removed.

🔎 Proposed fix

```diff
-    The evaluation of the solution is a bit more complex than Optimization or ClusteredOptimization
+    The evaluation of the solution is a bit more complex than Optimization
```

tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py (1)
179-179: Outdated type hint references removed class.

The type hint `list[fx.Optimization | fx.ClusteredOptimization | fx.SegmentedOptimization]` references `fx.ClusteredOptimization`, which has been removed.

🔎 Proposed fix

```diff
-    optimizations: list[fx.Optimization | fx.ClusteredOptimization | fx.SegmentedOptimization] = []
+    optimizations: list[fx.Optimization | fx.SegmentedOptimization] = []
```

flixopt/transform_accessor.py (1)
1212-1460: Fix expand_solution for clustered storages: NameError and multi-dimensional cluster mapping

Two issues in `expand_solution()` can break expansion when storages participate in clustering, especially with periods/scenarios:

1. NameError for `original_timesteps_extra`

   `expand_da()` uses `original_timesteps_extra` when handling `|charge_state` variables, but `original_timesteps_extra` is only defined later in the function. As soon as you expand a clustered storage with a `cluster`-dim charge_state, this will raise a `NameError`.

2. Incorrect cluster-index mapping for SOC decay in multi-dim cases

   In the SOC decay section, you compute:

   ```python
   cluster_per_timestep = cluster_structure.cluster_order.values[original_cluster_indices]
   xr.DataArray(cluster_per_timestep, dims=['time'])
   ```

   If `cluster_order` has dims `(original_cluster, period, scenario)`, then `cluster_order.values[...]` is 3-D, and wrapping it in `xr.DataArray(..., dims=['time'])` will fail (dims length ≠ ndim). Even if it didn't, you'd be discarding the period/scenario structure. The subsequent `isel(cluster=...)` on `decay_da` also assumes a 1-D indexer.

You can address both by:

- Computing `original_timesteps_extra` (and its length) before defining `expand_da`, so it's available when appending the extra timestep.
- Using xarray-based indexing with a `DataArray` indexer over the `original_cluster` dimension, preserving period/scenario dims when mapping clusters to timesteps.

Proposed patch for `expand_solution()`:

```diff
@@
-        # Get original timesteps from clustering, but periods/scenarios from the FlowSystem
-        # (the clustered FlowSystem preserves the same periods/scenarios)
-        original_timesteps = info.original_timesteps
-        has_periods = self._fs.periods is not None
-        has_scenarios = self._fs.scenarios is not None
-
-        periods = list(self._fs.periods) if has_periods else [None]
-        scenarios = list(self._fs.scenarios) if has_scenarios else [None]
-        n_original_timesteps = len(original_timesteps)
-        n_reduced_timesteps = n_clusters * timesteps_per_cluster
-        n_original_clusters = cluster_structure.n_original_clusters
-
-        # Expand function using ClusterResult.expand_data() - handles multi-dimensional cases
-        # For charge_state with cluster dim, also includes the extra timestep
-        # Clamp to valid bounds to handle partial clusters at the end
-        last_original_cluster_idx = min(
-            (n_original_timesteps - 1) // timesteps_per_cluster,
-            n_original_clusters - 1,
-        )
+        # Get original timesteps from clustering, but periods/scenarios from the FlowSystem
+        # (the clustered FlowSystem preserves the same periods/scenarios)
+        original_timesteps = info.original_timesteps
+        # Add extra timestep (same logic as for non-expanded FlowSystems)
+        original_timesteps_extra = FlowSystem._create_timesteps_with_extra(original_timesteps, None)
+
+        has_periods = self._fs.periods is not None
+        has_scenarios = self._fs.scenarios is not None
+
+        periods = list(self._fs.periods) if has_periods else [None]
+        scenarios = list(self._fs.scenarios) if has_scenarios else [None]
+        n_original_timesteps = len(original_timesteps)
+        n_reduced_timesteps = n_clusters * timesteps_per_cluster
+        n_original_clusters = cluster_structure.n_original_clusters
+        n_original_timesteps_extra = len(original_timesteps_extra)
+
+        # Expand function using ClusterResult.expand_data() - handles multi-dimensional cases
+        # For charge_state with cluster dim, also includes the extra timestep
+        # Clamp to valid bounds to handle partial clusters at the end
+        last_original_cluster_idx = min(
+            (n_original_timesteps - 1) // timesteps_per_cluster,
+            n_original_clusters - 1,
+        )
@@
         # 1. Expand FlowSystem data (with cluster_weight set to 1.0 for all timesteps)
         reduced_ds = self._fs.to_dataset(include_solution=False)
@@
-        expanded_ds = xr.Dataset(data_vars, attrs=attrs)
-        # Compute timestep_duration from original timesteps
-        # Add extra timestep for duration calculation (assume same interval as last)
-        original_timesteps_extra = FlowSystem._create_timesteps_with_extra(original_timesteps, None)
-        timestep_duration = FlowSystem.calculate_timestep_duration(original_timesteps_extra)
+        expanded_ds = xr.Dataset(data_vars, attrs=attrs)
+        # Compute timestep_duration from original timesteps (with extra)
+        timestep_duration = FlowSystem.calculate_timestep_duration(original_timesteps_extra)
         expanded_ds.attrs['timestep_duration'] = timestep_duration.values.tolist()
@@
         # Create cluster_weight with value 1.0 for all timesteps (no weighting needed for expanded)
         # Use _combine_slices_to_dataarray for consistent multi-dim handling
         ones_da = xr.DataArray(np.ones(n_original_timesteps), dims=['time'], coords={'time': original_timesteps})
@@
-        n_original_timesteps_extra = len(original_timesteps_extra)
         soc_boundary_vars = [name for name in reduced_solution.data_vars if name.endswith('|SOC_boundary')]
@@
         time_within_period = np.arange(n_original_timesteps_extra) % timesteps_per_cluster
@@
-                loss_value = storage.relative_loss_per_hour.mean('time')
-                if (loss_value > 0).any():
-                    decay_da = (1 - loss_value) ** time_within_period_da
-                    if 'cluster' in decay_da.dims:
-                        # Map each timestep to its cluster's decay value
-                        cluster_per_timestep = cluster_structure.cluster_order.values[original_cluster_indices]
-                        decay_da = decay_da.isel(cluster=xr.DataArray(cluster_per_timestep, dims=['time'])).drop_vars(
-                            'cluster', errors='ignore'
-                        )
-                    soc_boundary_per_timestep = soc_boundary_per_timestep * decay_da
+                loss_value = storage.relative_loss_per_hour.mean('time')
+                if (loss_value > 0).any():
+                    decay_da = (1 - loss_value) ** time_within_period_da
+                    if 'cluster' in decay_da.dims:
+                        # Map each timestep to its cluster's decay value, preserving period/scenario structure.
+                        cluster_order = cluster_structure.cluster_order
+                        cluster_indices_da = xr.DataArray(
+                            original_cluster_indices,
+                            dims=['time'],
+                            coords={'time': original_timesteps_extra},
+                        )
+                        # Shape after isel: (time, period?, scenario?)
+                        cluster_per_timestep = cluster_order.isel(original_cluster=cluster_indices_da)
+                        decay_da = decay_da.isel(cluster=cluster_per_timestep).drop_vars('cluster', errors='ignore')
+                    soc_boundary_per_timestep = soc_boundary_per_timestep * decay_da
```

These fixes should make `expand_solution()` work reliably for clustered storages across both single- and multi-period/scenario FlowSystems.
🧹 Nitpick comments (22)
.pre-commit-config.yaml (1)
10-10: LGTM! The broadened exclusion pattern works correctly.

The regex syntax is correct and aligns with the new documentation data assets introduced in this PR. The pattern now excludes both the original `Zeitreihen2020.csv` file and any files under `docs/notebooks/data/raw/`.

Optionally, if specific file types or patterns are known (e.g., only CSV files or files matching certain naming conventions), you could make the exclusion more targeted to avoid inadvertently allowing unintended large files to bypass the check.
.github/workflows/docs.yaml (2)
60-65: Cache key computation is sound but consider edge cases.

The cache key logic correctly produces a stable hash by sorting file paths. However, the `cat` command may encounter issues if any `.ipynb` or `.py` files contain binary data or have special characters in filenames that could affect shell parsing.

🔎 Optional: Add safeguards for edge cases

Consider using `find -print0` and `xargs -0` consistently (already done for the execution step), and adding error handling:

```diff
       - name: Get notebook cache key
         id: notebook-cache-key
         run: |
           # Hash notebooks + flixopt source code (sorted for stable cache keys)
-          HASH=$(find docs/notebooks -name '*.ipynb' | sort | xargs cat | cat - <(find flixopt -name '*.py' | sort | xargs cat) | sha256sum | cut -d' ' -f1)
+          HASH=$(find docs/notebooks -name '*.ipynb' -print0 | sort -z | xargs -0 cat | cat - <(find flixopt -name '*.py' -print0 | sort -z | xargs -0 cat) | sha256sum | cut -d' ' -f1)
           echo "hash=$HASH" >> $GITHUB_OUTPUT
```

This ensures filenames with spaces or special characters are handled correctly.
Also applies to: 122-127
74-80: Consider adding error handling for parallel notebook execution.

The parallel execution logic is correct, but failures in individual notebooks may not halt the workflow due to `xargs` default behavior.

🔎 Recommended: Add explicit error handling

Consider adding the `-n 1` flag to ensure each notebook is processed separately, and check exit codes:

```diff
       - name: Execute notebooks in parallel
         if: steps.notebook-cache.outputs.cache-hit != 'true'
         run: |
           # Execute all notebooks in parallel (4 at a time)
           # Run from notebooks directory so relative imports work
           cd docs/notebooks && find . -name '*.ipynb' -print0 | \
-            xargs -0 -P 4 -I {} jupyter execute --inplace {}
+            xargs -0 -P 4 -n 1 -I {} sh -c 'jupyter execute --inplace {} || exit 255'
```

The `-n 1` ensures one notebook per invocation, and `|| exit 255` ensures xargs sees failures (xargs only aborts if a command exits with 255).

Also applies to: 136-140
flixopt/clustering/intercluster_helpers.py (1)
64-131: Consider validation for capacity parameter edge cases.

The function correctly handles the three capacity types (fixed, InvestParameters, unbounded). However, it doesn't validate edge cases like negative capacity or zero fixed_size.

🔎 Optional: Add input validation

Consider adding validation at the start of the function:

```diff
 def extract_capacity_bounds(
     capacity_param: InvestParameters | int | float | None,
     boundary_coords: dict,
     boundary_dims: list[str],
 ) -> CapacityBounds:
     """Extract capacity bounds from storage parameters for SOC_boundary variable.
     ...
     """
+    # Validate capacity parameter
+    if isinstance(capacity_param, (int, float)) and capacity_param < 0:
+        raise ValueError(f"Capacity must be non-negative, got {capacity_param}")
+    if hasattr(capacity_param, 'maximum_size') and capacity_param.maximum_size is not None:
+        if capacity_param.maximum_size < 0:
+            raise ValueError(f"maximum_size must be non-negative, got {capacity_param.maximum_size}")
+    if hasattr(capacity_param, 'fixed_size') and capacity_param.fixed_size is not None:
+        if capacity_param.fixed_size < 0:
+            raise ValueError(f"fixed_size must be non-negative, got {capacity_param.fixed_size}")
+
     n_boundaries = len(boundary_coords['cluster_boundary'])
     lb_shape = [n_boundaries] + [len(boundary_coords[d]) for d in boundary_dims[1:]]
```

This ensures invalid capacity specifications are caught early with clear error messages.
flixopt/dataset_plot_accessor.py (2)
191-458: Consider extracting shared logic to reduce duplication.

The `bar`, `stacked_bar`, `line`, and `area` methods share ~90% identical code (slot assignment, DataFrame conversion, color processing, and kwargs building). A private helper could consolidate this:

🔎 Suggested refactor pattern

```python
def _prepare_plot_data(
    self, x, color, facet_col, facet_row, animation_frame, exclude_dims,
    colors, xlabel, ylabel, facet_cols, px_kwargs,
) -> tuple[pd.DataFrame, dict, dict]:
    """Shared preparation for bar/line/area plots."""
    slots = assign_slots(
        self._ds, x=x, color=color, facet_col=facet_col, facet_row=facet_row,
        animation_frame=animation_frame, exclude_dims=exclude_dims,
    )
    df = _dataset_to_long_df(self._ds)
    if df.empty:
        return df, {}, slots
    color_labels = (
        df[slots['color']].unique().tolist()
        if slots['color'] and slots['color'] in df.columns
        else []
    )
    color_map = process_colors(colors, color_labels, CONFIG.Plotting.default_qualitative_colorscale)
    labels = {
        **({slots['x']: xlabel} if xlabel and slots['x'] else {}),
        **({'value': ylabel} if ylabel else {}),
    }
    fig_kwargs = {
        'data_frame': df,
        'x': slots['x'],
        'y': 'value',
        'color_discrete_map': color_map,
        **({'labels': labels} if labels else {}),
        **_build_fig_kwargs(slots, dict(self._ds.sizes), px_kwargs, facet_cols),
    }
    return df, fig_kwargs, slots
```
916-997: Duplicated heatmap logic between DataArray and Dataset accessors.

The `DataArrayPlotAccessor.heatmap` method duplicates most of the logic from `DatasetPlotAccessor.heatmap` (slot resolution, imshow_args building, dimension squeezing). Consider delegating to the Dataset implementation:

🔎 Suggested simplification

```diff
     def heatmap(
         self,
         *,
         colors: str | list[str] | None = None,
         title: str = '',
         facet_col: str | Literal['auto'] | None = 'auto',
         animation_frame: str | Literal['auto'] | None = 'auto',
         facet_cols: int | None = None,
         **imshow_kwargs: Any,
     ) -> go.Figure:
-        da = self._da
-        # ... 80 lines of duplicated logic ...
+        name = self._da.name or 'value'
+        return self._da.to_dataset(name=name).fxplot.heatmap(
+            variable=name,
+            colors=colors,
+            title=title or name,
+            facet_col=facet_col,
+            animation_frame=animation_frame,
+            facet_cols=facet_cols,
+            **imshow_kwargs,
+        )
```

docs/notebooks/data/generate_realistic_profiles.py (1)
163-170: Silent forward-fill on misaligned timesteps may mask data issues.

The `reindex(timesteps, method='ffill')` silently forward-fills missing values when requested timesteps don't align with the generated profile. This could mask timezone or frequency mismatches. Consider logging a warning when significant filling occurs.
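A minimal sketch of such a guard, assuming `profile` is a pandas Series indexed by time and `timesteps` is the requested DatetimeIndex (names mirror the review text, not necessarily the module):

```python
import logging

logger = logging.getLogger(__name__)

def reindex_with_warning(profile, timesteps):
    """Reindex to `timesteps`, warning when forward-fill has to paper over gaps."""
    aligned = profile.reindex(timesteps)  # NaN where timesteps don't match
    n_missing = int(aligned.isna().sum())
    if n_missing:
        logger.warning(
            'Forward-filling %d of %d timesteps during reindex; '
            'check timezone/frequency alignment of the requested index.',
            n_missing, len(timesteps),
        )
    return aligned.ffill()
```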
tests/test_clustering/test_base.py (3)

112-121: Consider adding test coverage for the `expand_data()` method.

The `get_expansion_mapping()` test is good, but `expand_data()` is a core method of `ClusterResult` (per the relevant code snippets). Consider adding a test that verifies data expansion behavior, especially for the multi-dimensional case.

Would you like me to draft a test case for the `expand_data()` method?
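One possible draft; the `expand_data` call signature and the index-array return value of `get_expansion_mapping()` are assumptions to be checked against the real `ClusterResult` API:

```python
import numpy as np
import xarray as xr

def test_expand_data_follows_expansion_mapping(result):
    # Assumes get_expansion_mapping() yields, per original timestep, the index
    # of the reduced timestep that represents it (adjust if the API differs).
    mapping = np.asarray(result.get_expansion_mapping())
    reduced = xr.DataArray(np.arange(mapping.max() + 1, dtype=float), dims=['time'])
    expanded = result.expand_data(reduced)
    assert expanded.sizes['time'] == result.n_original_timesteps
    np.testing.assert_array_equal(expanded.values, reduced.values[mapping])
```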
127-139: Consider asserting cluster_order and cluster_occurrences values.

The test validates `timesteps_per_cluster` and `n_original_clusters`, but doesn't verify the computed `cluster_order` or `cluster_occurrences`. Given the mapping pattern (periods 0 and 2 → cluster 0, period 1 → cluster 1), you could add assertions like:

```python
assert structure.n_clusters == 2
np.testing.assert_array_equal(structure.cluster_order.values, [0, 1, 0])
```
142-158: Minimal test coverage for the `Clustering` class.

This test only verifies `backend_name`. The `Clustering` class exposes many convenience properties (`cluster_order`, `occurrences`, `n_clusters`, `timesteps_per_period`, `cluster_start_positions`, `original_timesteps`) per the relevant code snippets. Consider adding tests for these accessors, or confirm they're covered in integration tests.

docs/notebooks/data/generate_example_systems.py (2)
20-43: Import handling for multiple execution contexts.

The try/except pattern with sys.path modification handles imports correctly across different contexts (package import, direct run, mkdocs-jupyter). However, modifying `sys.path` at module level is a side effect that persists.

Consider using `importlib` for more explicit import handling, though the current approach is functional for the notebook context:

Alternative using importlib (optional)

```python
import importlib.util
from pathlib import Path

def _import_generate_realistic_profiles():
    """Import generate_realistic_profiles handling different contexts."""
    try:
        from . import generate_realistic_profiles
        return generate_realistic_profiles
    except ImportError:
        try:
            _data_dir = Path(__file__).parent
        except NameError:
            _data_dir = Path('docs/notebooks/data')
        spec = importlib.util.spec_from_file_location(
            "generate_realistic_profiles", _data_dir / "generate_realistic_profiles.py"
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

_profiles = _import_generate_realistic_profiles()
ElectricityLoadGenerator = _profiles.ElectricityLoadGenerator
# ... etc
```
56-59: Module-level data loading may cause import-time failures.

Loading data at module level means import will fail if data files are missing, even if the user only wants to import a subset of functions.

For robustness, consider lazy loading with caching:

Optional lazy loading pattern

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def _get_weather():
    return load_weather()

@lru_cache(maxsize=1)
def _get_elec_prices():
    prices = load_electricity_prices()
    prices.index = prices.index.tz_localize(None)
    return prices

# Then use _get_weather() and _get_elec_prices() in generators
```

CHANGELOG.md (1)
65-83: Markdownlint MD046: fenced vs indented code blocks

markdownlint is flagging the new fenced blocks in the v6.0.0 section (e.g. FlowSystem Comparison example, clustering example, fxplot example, migration snippet) as violating the configured "indented" code block style (MD046). If you want a clean lint run, consider switching these blocks to indented style or updating the lint rule to allow fenced blocks consistently with the rest of the changelog.
Also applies to: 92-103, 130-147, 156-170
docs/notebooks/03-investment-optimization.ipynb (1)
230-241: Guard ratio computation against zero solar size

`'Ratio [kWh/kW]': tank_size / solar_size` will produce inf/NaN if the optimal solar size is zero (which is a valid optimal outcome given the model). Consider guarding this to keep the tutorial output clean, e.g. returning `None` or a descriptive string when `solar_size == 0`.
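One way to write the guard; `tank_size` and `solar_size` follow the notebook's naming, and the string fallback is just one option:

```python
def storage_to_solar_ratio(tank_size: float, solar_size: float) -> float | None:
    """Return the kWh/kW ratio, or None when no solar capacity was built."""
    if solar_size > 0:
        return tank_size / solar_size
    return None  # avoids inf/NaN when the optimum builds zero solar

ratio = storage_to_solar_ratio(12.0, 0.0)  # example values
print(ratio if ratio is not None else 'n/a (no solar built)')
```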
docs/notebooks/05-multi-carrier-system.ipynb (1)

324-349: Comparison integration and result tables look good; consider guarding zero-baseline savings

The multi-carrier setup and subsequent `fx.Comparison([fs_no_chp, flow_system])` usage are consistent with the new `Comparison` API and should work well for side-by-side plots.

In the summary tables, expressions like `(no_chp_costs - total_costs) / no_chp_costs * 100` will produce inf/NaN if `no_chp_costs` (or `no_chp_co2`) is zero. It's unlikely in this example, but a simple conditional guard would keep the tutorial robust if someone tweaks input data.

Also applies to: 368-426, 434-458
flixopt/comparison.py (1)
70-124: Dimension validation only checks dim names, not coordinate compatibility
`_validate_matching_dimensions()` ensures all systems share the same core dim names (`time`, `period`, `scenario`), but it doesn't check that the coordinates (e.g. specific periods or scenarios) actually match. With `join='outer'` in `solution`, comparing systems that have different period/scenario labels will still work but may silently introduce NaNs and misaligned semantics.

If you want to make misuse more obvious, consider (optionally) asserting equality of coordinates for shared core dims (or at least logging a warning when labels differ), while still allowing outer joins for time.
Also applies to: 131-166
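A sketch of such an optional warning, assuming each system exposes its solution as an xarray Dataset (as the `join='outer'` note above implies); the helper name is hypothetical:

```python
import logging
import xarray as xr

logger = logging.getLogger(__name__)

def _warn_on_coordinate_mismatch(solutions: list[xr.Dataset], dims=('period', 'scenario')) -> None:
    """Log a warning when systems share a dim name but disagree on its labels."""
    for dim in dims:
        label_sets = {tuple(ds.indexes[dim]) for ds in solutions if dim in ds.indexes}
        if len(label_sets) > 1:
            logger.warning(
                "Comparison: systems disagree on %r coordinates; "
                "join='outer' will introduce NaNs for non-shared labels.",
                dim,
            )
```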
flixopt/components.py (1)
1026-1033: Unnecessary `* 1` multiplication.

The `* 1` on both sides of the constraint serves no purpose. This appears to be a remnant from a previous implementation or an attempt to convert types, but linopy variables don't require this.

🔎 Proposed simplification

```diff
     def _add_balanced_sizes_constraint(self):
         """Add constraint ensuring charging and discharging capacities are equal."""
         if self.element.balanced:
             self.add_constraints(
-                self.element.charging.submodel._investment.size * 1
-                == self.element.discharging.submodel._investment.size * 1,
+                self.element.charging.submodel._investment.size
+                == self.element.discharging.submodel._investment.size,
                 short_name='balanced_sizes',
             )
```

flixopt/clustering/base.py (1)
644-661: Potential edge case: empty data_vars could leave `values` undefined.

In the duration curve loop, `n = len(values)` relies on `values` from the last loop iteration. If `ds.data_vars` is empty after filtering, this would raise a `NameError`.

While upstream checks (`_resolve_variables` at line 630) should prevent this, adding a guard improves robustness.

🔎 Proposed fix

```diff
         # For duration curve: flatten and sort values
         if kind == 'duration_curve':
             sorted_vars = {}
+            n = 0  # Default in case of empty iteration
             for var in ds.data_vars:
                 for rep in ds.coords['representation'].values:
                     values = np.sort(ds[var].sel(representation=rep).values.flatten())[::-1]
                     sorted_vars[(var, rep)] = values
-                    n = len(values)
+                    n = len(values)
+            if n == 0:
+                raise ValueError('No data available for duration curve comparison')
             ds = xr.Dataset(
```

flixopt/flow_system.py (4)
47-50: Module-level side effect during import.

Calling `_register_clustering_classes()` at module load time could cause issues if the module is imported before all dependencies are ready, or if there are circular import edge cases. Consider using lazy registration (e.g., on first access to clustering features) or documenting that this is intentional.
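A sketch of the lazy-registration alternative; `_register_clustering_classes` exists per the code-graph snippets in this review, while the guard and accessor shown here are hypothetical:

```python
_clustering_registered = False

def _ensure_clustering_registered() -> None:
    """Register clustering IO classes on first use instead of at import time."""
    global _clustering_registered
    if not _clustering_registered:
        # Imported lazily to avoid circular-import issues at module load
        from flixopt.clustering.base import _register_clustering_classes
        _register_clustering_classes()
        _clustering_registered = True
```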
679-692: Potential redundancy in clustering serialization.

Both the old structure (lines 679-684, using attrs) and the new structure (lines 686-692, using the Clustering object) are serialized when `self.clustering` is present. This may cause:

- Redundant data storage
- Confusion during deserialization about which source is authoritative

Consider removing the old structure if the new `Clustering` object is the canonical source, or clearly document the backward compatibility intent.

Also, line 684 stores `timestep_duration` as a mean float, losing per-timestep granularity which may be needed for reconstruction.
1989-1998: Redundant fallback logic.

Line 1997 has redundant logic:

```python
cluster_weight = self.weights.get('cluster', self.cluster_weight if self.cluster_weight is not None else 1.0)
```

The `weights` property (line 2111) already returns `cluster_weight if cluster_weight is not None else self._unit_weight('cluster')`. However, `weights.get('cluster')` will only return a value if clusters is not None (lines 2110-2111).

For non-clustered systems, this falls back correctly to 1.0, but for clustered systems, the fallback `self.cluster_weight if self.cluster_weight is not None else 1.0` is never reached.

Consider simplifying:

🔎 Proposed simplification

```diff
     @property
     def temporal_weight(self) -> xr.DataArray:
         """Combined temporal weight (timestep_duration × cluster_weight).

         Use for converting rates to totals before summing.
         Note: cluster_weight is used even without a clusters dimension.
         """
-        # Use cluster_weight directly if set, otherwise check weights dict, fallback to 1.0
-        cluster_weight = self.weights.get('cluster', self.cluster_weight if self.cluster_weight is not None else 1.0)
+        cluster_weight = self.weights.get('cluster', 1.0)
         return self.weights['time'] * cluster_weight
```
2000-2013: Deprecated property lacks runtime warning.

The `coords` property is marked as deprecated in the docstring but does not emit a `DeprecationWarning`. Unlike methods like `sel()`, `isel()`, and `resample()` which emit warnings, this property silently redirects to `indexes`.

Consider adding a runtime deprecation warning for consistency:

🔎 Proposed fix

```diff
     @property
     def coords(self) -> dict[FlowSystemDimensions, pd.Index]:
         """Active coordinates for variable creation.

         .. deprecated::
             Use :attr:`indexes` instead.

         Returns a dict of dimension names to coordinate arrays.
         When clustered, includes 'cluster' dimension before 'time'.

         Returns:
             Dict mapping dimension names to coordinate arrays.
         """
+        warnings.warn(
+            f'\ncoords is deprecated and will be removed in {DEPRECATION_REMOVAL_VERSION}. '
+            'Use flow_system.indexes instead.',
+            DeprecationWarning,
+            stacklevel=2,
+        )
         return self.indexes
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `docs/notebooks/data/raw/electricity_prices_de_2020.csv` is excluded by `!**/*.csv`
- `docs/notebooks/data/raw/tmy_dresden.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (77)
- `.github/workflows/docs.yaml`
- `.pre-commit-config.yaml`
- `CHANGELOG.md`
- `docs/design/cluster_architecture.md`
- `docs/notebooks/02-heat-system.ipynb`
- `docs/notebooks/03-investment-optimization.ipynb`
- `docs/notebooks/04-operational-constraints.ipynb`
- `docs/notebooks/05-multi-carrier-system.ipynb`
- `docs/notebooks/06a-time-varying-parameters.ipynb`
- `docs/notebooks/07-scenarios-and-periods.ipynb`
- `docs/notebooks/08a-aggregation.ipynb`
- `docs/notebooks/08b-rolling-horizon.ipynb`
- `docs/notebooks/08c-clustering.ipynb`
- `docs/notebooks/08c2-clustering-storage-modes.ipynb`
- `docs/notebooks/08d-clustering-multiperiod.ipynb`
- `docs/notebooks/08e-clustering-internals.ipynb`
- `docs/notebooks/09-plotting-and-data-access.ipynb`
- `docs/notebooks/data/__init__.py`
- `docs/notebooks/data/generate_example_systems.py`
- `docs/notebooks/data/generate_realistic_profiles.py`
- `docs/notebooks/data/raw/README.md`
- `docs/notebooks/data/tutorial_data.py`
- `docs/notebooks/fxplot_accessor_demo.ipynb`
- `docs/user-guide/optimization/clustering.md`
- `docs/user-guide/optimization/index.md`
- `docs/user-guide/recipes/plotting-custom-data.md`
- `docs/user-guide/results-plotting.md`
- `docs/user-guide/results/index.md`
- `flixopt/__init__.py`
- `flixopt/clustering.py`
- `flixopt/clustering/__init__.py`
- `flixopt/clustering/base.py`
- `flixopt/clustering/intercluster_helpers.py`
- `flixopt/comparison.py`
- `flixopt/components.py`
- `flixopt/config.py`
- `flixopt/core.py`
- `flixopt/dataset_plot_accessor.py`
- `flixopt/effects.py`
- `flixopt/elements.py`
- `flixopt/features.py`
- `flixopt/flow_system.py`
- `flixopt/io.py`
- `flixopt/optimization.py`
- `flixopt/optimize_accessor.py`
- `flixopt/plot_result.py`
- `flixopt/results.py`
- `flixopt/statistics_accessor.py`
- `flixopt/structure.py`
- `flixopt/transform_accessor.py`
- `mkdocs.yml`
- `pyproject.toml`
- `tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py`
- `tests/deprecated/test_bus.py`
- `tests/deprecated/test_effect.py`
- `tests/deprecated/test_flow.py`
- `tests/deprecated/test_flow_system_resample.py`
- `tests/deprecated/test_integration.py`
- `tests/deprecated/test_linear_converter.py`
- `tests/deprecated/test_on_hours_computation.py`
- `tests/deprecated/test_scenarios.py`
- `tests/deprecated/test_storage.py`
- `tests/test_bus.py`
- `tests/test_cluster_reduce_expand.py`
- `tests/test_clustering/__init__.py`
- `tests/test_clustering/test_base.py`
- `tests/test_clustering/test_integration.py`
- `tests/test_clustering_io.py`
- `tests/test_effect.py`
- `tests/test_flow.py`
- `tests/test_flow_system_resample.py`
- `tests/test_io_conversion.py`
- `tests/test_linear_converter.py`
- `tests/test_on_hours_computation.py`
- `tests/test_scenarios.py`
- `tests/test_sel_isel_single_selection.py`
- `tests/test_storage.py`
💤 Files with no reviewable changes (2)
- flixopt/clustering.py
- flixopt/effects.py
🧰 Additional context used
🧬 Code graph analysis (23)
flixopt/core.py (1)

- flixopt/flow_system.py (1): `coords` (2001-2013)

tests/deprecated/test_storage.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

flixopt/io.py (2)

- flixopt/flow_system.py (1): `to_netcdf` (828-853)
- flixopt/structure.py (1): `to_netcdf` (937-964)

tests/test_bus.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

tests/test_sel_isel_single_selection.py (1)

- flixopt/flow_system.py (2): `FlowSystem` (55-2493), `transform` (1605-1623)

tests/test_effect.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

flixopt/optimize_accessor.py (1)

- flixopt/flow_system.py (2): `FlowSystem` (55-2493), `build_model` (1355-1391)

flixopt/elements.py (3)

- flixopt/flow_system.py (3): `sum_temporal` (2120-2137), `temporal_weight` (1990-1998), `temporal_dims` (1980-1987)
- flixopt/structure.py (5): `sum_temporal` (259-265), `timestep_duration` (220-222), `timestep_duration` (1749-1750), `temporal_weight` (255-257), `temporal_dims` (247-252)
- flixopt/effects.py (1): `add_share_to_effects` (647-667)

tests/deprecated/test_flow_system_resample.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

flixopt/clustering/__init__.py (1)

- flixopt/clustering/base.py (4): `Clustering` (962-1117), `ClusterResult` (258-558), `ClusterStructure` (34-254), `create_cluster_structure_from_mapping` (1120-1167)

tests/test_on_hours_computation.py (1)

- flixopt/modeling.py (2): `ModelingUtilities` (118-193), `compute_consecutive_hours_in_state` (120-142)

tests/deprecated/test_on_hours_computation.py (1)

- flixopt/modeling.py (2): `ModelingUtilities` (118-193), `compute_consecutive_hours_in_state` (120-142)

tests/deprecated/test_effect.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

tests/test_clustering/test_base.py (1)

- flixopt/clustering/base.py (12): `Clustering` (962-1117), `ClusterResult` (258-558), `ClusterStructure` (34-254), `create_cluster_structure_from_mapping` (1120-1167), `cluster_order` (1034-1038), `n_clusters` (1048-1053), `timesteps_per_cluster` (1071-1075), `get_cluster_weight_per_timestep` (172-198), `timestep_mapping` (1078-1080), `n_original_timesteps` (360-362), `validate` (498-558), `get_expansion_mapping` (364-373)

docs/notebooks/data/generate_example_systems.py (1)

- docs/notebooks/data/generate_realistic_profiles.py (8): `ElectricityLoadGenerator` (128-170), `GasPriceGenerator` (234-262), `ThermalLoadGenerator` (68-125), `load_electricity_prices` (53-62), `load_weather` (37-50), `generate` (87-125), `generate` (143-170), `generate` (237-262)

tests/test_linear_converter.py (3)

- flixopt/results.py (4): `variables` (335-339), `variables` (1260-1268), `constraints` (342-346), `constraints` (1271-1279)
- flixopt/structure.py (4): `variables` (1717-1723), `timestep_duration` (220-222), `timestep_duration` (1749-1750), `constraints` (1708-1714)
- tests/conftest.py (1): `assert_conequal` (755-771)

tests/deprecated/test_integration.py (4)

- tests/deprecated/conftest.py (1): `assert_almost_equal_numeric` (695-724)
- tests/conftest.py (1): `assert_almost_equal_numeric` (696-725)
- flixopt/flow_system.py (2): `solution` (1449-1466), `solution` (1469-1472)
- flixopt/structure.py (2): `solution` (169-217), `solution` (1152-1166)

flixopt/optimization.py (1)

- flixopt/flow_system.py (2): `FlowSystem` (55-2493), `create_model` (1333-1353)

flixopt/transform_accessor.py (4)

- flixopt/clustering/base.py (5): `n_clusters` (1048-1053), `Clustering` (962-1117), `ClusterResult` (258-558), `timestep_mapping` (1078-1080), `original_timesteps` (1104-1117)
- flixopt/flow_system.py (9): `weights` (2098-2118), `FlowSystem` (55-2493), `to_dataset` (630-697), `sel` (2282-2308), `from_dataset` (700-826), `solution` (1449-1466), `solution` (1469-1472), `_create_timesteps_with_extra` (319-327), `calculate_timestep_duration` (330-335)
- flixopt/structure.py (12): `weights` (239-244), `flow_system` (437-455), `timestep_duration` (220-222), `timestep_duration` (1749-1750), `to_dataset` (912-935), `add` (1282-1291), `add` (1817-1819), `get` (1656-1661), `get` (1821-1823), `from_dataset` (967-1000), `solution` (169-217), `solution` (1152-1166)
- flixopt/core.py (3): `TimeSeriesData` (34-124), `drop_constant_arrays` (566-596), `is_timeseries_data` (112-114)

flixopt/components.py (2)

- flixopt/interface.py (3): `InvestParameters` (956-1279), `minimum_or_fixed_size` (1242-1243), `maximum_or_fixed_size` (1246-1247)
- flixopt/clustering/intercluster_helpers.py (2): `build_boundary_coords` (134-178), `extract_capacity_bounds` (64-131)

tests/test_clustering/test_integration.py (3)

- flixopt/transform_accessor.py (1): `cluster` (578-1067)
- flixopt/core.py (1): `TimeSeriesData` (34-124)
- flixopt/clustering/base.py (4): `n_clusters` (1048-1053), `clusters` (861-958), `cluster_order` (1034-1038), `create_cluster_structure_from_mapping` (1120-1167)

flixopt/__init__.py (1)

- flixopt/comparison.py (1): `Comparison` (23-165)

flixopt/flow_system.py (2)

- flixopt/clustering/base.py (6): `Clustering` (962-1117), `_register_clustering_classes` (1170-1179), `clusters` (861-958), `_create_reference_structure` (101-124), `_create_reference_structure` (328-357), `_create_reference_structure` (990-1003)
- flixopt/structure.py (8): `timestep_duration` (220-222), `timestep_duration` (1749-1750), `dims` (229-231), `_create_reference_structure` (492-541), `_create_reference_structure` (1168-1183), `get` (1656-1661), `get` (1821-1823), `flow_system` (437-455)
🪛 LanguageTool
docs/design/cluster_architecture.md
[grammar] ~512-~512: Use a hyphen to join words.
Context: ...Goal: Produce true (cluster, time) shaped data. Tasks: 1. Update `clus...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
docs/user-guide/optimization/clustering.md
53-53: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
54-54: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
158-158: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/notebooks/data/raw/README.md
10-10: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
28-28: Bare URL used
(MD034, no-bare-urls)
docs/design/cluster_architecture.md
37-37: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
CHANGELOG.md
65-65: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
92-92: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
130-130: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
156-156: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.12)
- GitHub Check: test (3.14)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: Build documentation
docs/design/cluster_architecture.md
Outdated
````text
**Current Implementation:**
```
time: (n_clusters × timesteps_per_cluster,)  # Flat, e.g., (864,) for 9 clusters × 96 timesteps
```
````
Specify a language for the early fenced block
The code block showing the current flat `time` structure uses a bare ``` fence; add a language (e.g. `text`) to satisfy markdownlint and improve readability in rendered docs.
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
37-37: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In @docs/design/cluster_architecture.md around lines 36-39, The fenced code
block that documents the current flat `time` structure (the block containing the
line "time: (n_clusters × timesteps_per_cluster,) # Flat, e.g., (864,) for 9
clusters × 96 timesteps") is using a bare ``` fence; update that fence to
include a language identifier (for example, ```text) so markdownlint passes and
the snippet renders with explicit language highlighting/readability in the
`cluster_architecture.md` documentation.
docs/design/cluster_architecture.md
Outdated
```python
# FlowSystem property updates:
@property
def is_clustered(self) -> bool:
    return self.clustering is not None

@property
def cluster_weight(self) -> xr.DataArray:
    if not self.is_clustered:
        return xr.DataArray(1.0)
    # Shape: (cluster,) - one weight per cluster
    return xr.DataArray(
        self.clustering.cluster_occurrences,
        dims=['cluster'],
        coords={'cluster': range(self.clustering.n_clusters)}
    )

@property
def timestep_duration(self) -> xr.DataArray:
    if not self.is_clustered:
        return self._timestep_duration  # Shape: (time,) or scalar
    # Shape: (cluster, time) when clustered
    return self._timestep_duration  # Already 2D from clustering

@property
def aggregation_weight(self) -> xr.DataArray:
    return self.timestep_duration * self.cluster_weight  # Broadcasting handles shapes
```
Align terminology between aggregation_weight and temporal_weight
The example property here introduces `aggregation_weight`, while the surrounding implementation snippets and FlowSystem API use `temporal_weight` for the same concept. Consider either renaming the example to `temporal_weight` or explicitly stating that `aggregation_weight` is just a conceptual name for `temporal_weight` to avoid confusion.
🤖 Prompt for AI Agents
In @docs/design/cluster_architecture.md around lines 417-443, The docs introduce
a property named aggregation_weight that conflicts with the FlowSystem API's
temporal_weight; update the example to use the canonical name temporal_weight
(or add a clear note that aggregation_weight is an alias) so readers and code
samples match the API. Specifically, rename the example property
aggregation_weight to temporal_weight (or add an explicit comment stating
aggregation_weight == temporal_weight) and ensure references to is_clustered,
cluster_weight, timestep_duration, and FlowSystem all consistently use
temporal_weight terminology.
tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py (comment resolved)
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @docs/notebooks/08b-rolling-horizon.ipynb:
- Around line 66-73: The timestep-to-days calculation and resolution text are
wrong: update the print that uses flow_system.timesteps to divide len(timesteps)
by 96 (24*4 for 15-min intervals) instead of 24, and change the string "hourly
resolution" to "15-min resolution" so the line printing days and resolution
reflects the 15-minute data (see the sketch after this list).
In
@tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py:
- Around line 210-216: The ClusteredResult class sets durations to the hardcoded
{'total': 0}, which misleads the speed comparison; wrap the clustering and
subsequent optimize call in a timer (e.g., using time.time()), compute the
elapsed time, modify ClusteredResult.__init__ to accept a duration parameter and
set self.durations = {'total': duration}, and instantiate optimization =
ClusteredResult('Clustered', clustered_fs, clustering_duration); alternatively,
if you don't want to measure time here, replace the hardcoded durations with a
clear comment that the value is a placeholder so the plot isn't misleading (see the timing sketch after this list).
- Line 178: The variable optimizations is currently typed as
list[fx.Optimization | fx.SegmentedOptimization] but you append a
ClusteredResult instance; update the annotation to include ClusteredResult
(e.g., list[fx.Optimization | fx.SegmentedOptimization | ClusteredResult]) or
use a forward reference/Protocol after the ClusteredResult class definition so
the type checker accepts all three types; alternatively, if structural typing is
desired, replace the concrete union with a compatible Protocol or a more general
type like list[Any] and place the annotation where ClusteredResult is visible.
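A sketch of the corrected notebook line from the first item above (variable names follow the prompt; exact notebook wording may differ):

```python
# 15-minute data: 24 h × 4 steps/h = 96 steps per day (not 24)
steps_per_day = 24 * 4
n_days = len(flow_system.timesteps) / steps_per_day
print(f'{n_days:.0f} days at 15-min resolution')
```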
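And a sketch of the timing change from the second item, assuming the example script's `flow_system` is in scope; the `cluster(...)` arguments and the extended `ClusteredResult.__init__` are assumptions based on the prompt:

```python
import time

import flixopt as fx

t0 = time.perf_counter()
clustered_fs = flow_system.transform.cluster(n_clusters=12, cluster_duration='1D')
clustered_fs.optimize(fx.solvers.HighsSolver())
clustering_duration = time.perf_counter() - t0

# __init__ extended (hypothetically) to accept the measured duration and
# store it as self.durations = {'total': duration}
optimization = ClusteredResult('Clustered', clustered_fs, duration=clustering_duration)
```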
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
flixopt/flow_system.py (2)
1021-1059: Replace `self.coords` with `self.indexes` in `fit_to_model_coords` to avoid triggering deprecation warnings.

`coords` is deprecated and emits a DeprecationWarning, but `fit_to_model_coords()` calls `self.coords`, which causes the warning to fire internally. Since `coords` is just a thin wrapper around `indexes`, replace the call directly:

Minimal fix

```diff
     def fit_to_model_coords(
         self,
         name: str,
         data: NumericOrBool | None,
         dims: Collection[FlowSystemDimensions] | None = None,
     ) -> xr.DataArray | None:
         ...
-        coords = self.coords
+        coords = self.indexes
```

Note: Internal code in `structure.py` (lines 315, 320, 322) also uses `self.flow_system.coords` and would benefit from the same change.
737-769: Cluster weights are not restored for clustered FlowSystems loaded from NetCDF.

For clustered datasets (`cluster` dimension present), `from_dataset()` explicitly sets `cluster_weight_for_constructor = None` (line 742) and never assigns `flow_system.cluster_weight` afterwards. Since `weights['cluster']`, `temporal_weight`, and `sum_temporal()` all depend on `self.cluster_weight`, a clustered FlowSystem reloaded from disk will treat all clusters as weight 1.0, ignoring the original cluster weighting. This silently skews rate→total conversions and temporal statistics after any save/load roundtrip.

Consider restoring `cluster_weight` from the Clustering object's `representative_weights` after deserialization, e.g.:

```python
# After restoring Clustering object (line ~812)
if 'clustering' in reference_structure:
    # ... existing restoration code ...
    flow_system.clustering = clustering
    # Ensure cluster_weight is set from clustering if not already set
    if flow_system.cluster_weight is None and hasattr(clustering.result, 'representative_weights'):
        flow_system.cluster_weight = clustering.result.representative_weights
```
♻️ Duplicate comments (4)
docs/notebooks/08c2-clustering-storage-modes.ipynb (1)
174-181: Previous inconsistencies resolved (N_CLUSTERS comment and `cluster_mode` naming).

The `N_CLUSTERS` comment now matches the value (24), and the docs consistently use `cluster_mode`, aligning with the implementation.

Also applies to: 383-397
flixopt/dataset_plot_accessor.py (1)
17-112: Plot accessor API looks consistent; `exclude_dims` wiring is correct now.

`exclude_dims` is accepted and propagated through Dataset and DataArray `bar`, `stacked_bar`, `line`, and `area` methods, and `assign_slots` handles exclusion as expected. This resolves the earlier API mismatch and silent-parameter issues.

Also applies to: 191-258, 325-459, 799-921
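For readers skimming the resolution, the forwarding pattern now in place looks roughly like this (signatures heavily abbreviated; only the delegation through `_to_dataset()` is the point):

```python
from collections.abc import Sequence
from typing import Any

class DataArrayPlotAccessor:
    def line(self, *, exclude_dims: Sequence[str] | None = None, **kwargs: Any):
        # Convert the DataArray to a one-variable Dataset and delegate,
        # forwarding exclude_dims so it is honored rather than silently dropped.
        return self._to_dataset().fxplot.line(exclude_dims=exclude_dims, **kwargs)

    def area(self, *, exclude_dims: Sequence[str] | None = None, **kwargs: Any):
        return self._to_dataset().fxplot.area(exclude_dims=exclude_dims, **kwargs)
```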
flixopt/components.py (2)
1476-1483: Decay calculation uses timestep count instead of hours.

The decay factor `(1 - rel_loss) ** timesteps_per_cluster` assumes each timestep is 1 hour. Since `relative_loss_per_hour` is a per-hour rate, the exponent should be total hours (timesteps × timestep_duration), not just the count of timesteps.

🔎 Proposed fix

```diff
         # Apply self-discharge decay factor (1-loss)^N to soc_before per Eq. 5
         # Use mean over time (linking operates at period level, not timestep)
         # Keep as DataArray to respect per-period/scenario values
         rel_loss = self.element.relative_loss_per_hour.mean('time')
-        decay_n = (1 - rel_loss) ** timesteps_per_cluster
+        # Total hours = timesteps × duration per timestep
+        timestep_duration = float(self._model.timestep_duration.mean().values)
+        total_hours = timesteps_per_cluster * timestep_duration
+        decay_n = (1 - rel_loss) ** total_hours
```
1536-1541: Same decay bug in combined bound constraints.

The decay factor `(1 - rel_loss) ** offset` uses raw timestep index rather than hours. This should be multiplied by `timestep_duration` for correct decay calculation.

🔎 Proposed fix

```diff
+        # Get timestep duration for decay calculation
+        timestep_duration = float(self._model.timestep_duration.mean().values)
+
         sample_offsets = [0, timesteps_per_cluster // 2, timesteps_per_cluster - 1]
         for sample_name, offset in zip(['start', 'mid', 'end'], sample_offsets, strict=False):
             # ... existing code ...
             # Apply decay factor (1-loss)^t to SOC_boundary per Eq. 9
-            decay_t = (1 - rel_loss) ** offset
+            decay_t = (1 - rel_loss) ** (offset * timestep_duration)
             combined = soc_d * decay_t + cs_t
```
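For reference, the relation both fixes target (Eq. 9 in Blanke et al. 2022, as cited elsewhere in this review), with the per-hour loss and the timestep duration made explicit; the symbol names here are ours, not necessarily the paper's:

```latex
\mathrm{SOC}(t) = \mathrm{SOC}_{\text{boundary}}[d]\,(1-\ell)^{\,t\,\Delta t} + \Delta E(t)
```

The exponent must be in hours, so a raw timestep offset t has to be scaled by Δt; with hourly data (Δt = 1) the two versions coincide, which is why this bug is easy to miss.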
🧹 Nitpick comments (2)
CHANGELOG.md (1)
70-85: Minor doc polish: harmonize example label and consider markdownlint style.
- The clustering example uses `HeatDemand(Q)|fixed_relative_profile` while notebooks use `HeatDemand(Q_th)|fixed_relative_profile`; aligning these would reduce copy/paste confusion.
- markdownlint is flagging fenced code blocks in this new section (expects indented blocks for this doc style). If you want a clean lint run, either switch these to indented blocks or relax MD046 for the changelog.
Also applies to: 145-177
flixopt/transform_accessor.py (1)
868-871: Consider logging a warning for missing cluster occurrences.

The fallback `occurrences.get(c, 1)` silently defaults to 1 if a cluster is missing from the occurrences dict. While this is defensive, it could mask issues with the tsam output. Consider adding a debug log when this fallback is used.

🔎 Suggested improvement

```diff
     def _build_cluster_weight_for_key(key: tuple) -> xr.DataArray:
         occurrences = cluster_occurrences_all[key]
-        weights = np.array([occurrences.get(c, 1) for c in range(actual_n_clusters)])
+        weights = []
+        for c in range(actual_n_clusters):
+            if c not in occurrences:
+                logger.debug(f'Cluster {c} not found in occurrences for {key}, defaulting to weight=1')
+            weights.append(occurrences.get(c, 1))
+        weights = np.array(weights)
         return xr.DataArray(weights, dims=['cluster'], coords={'cluster': cluster_coords})
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- `CHANGELOG.md`
- `docs/notebooks/08b-rolling-horizon.ipynb`
- `docs/notebooks/08c2-clustering-storage-modes.ipynb`
- `docs/user-guide/optimization/index.md`
- `flixopt/components.py`
- `flixopt/dataset_plot_accessor.py`
- `flixopt/flow_system.py`
- `flixopt/optimization.py`
- `flixopt/transform_accessor.py`
- `tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py`
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/user-guide/optimization/index.md
🧰 Additional context used
🧬 Code graph analysis (3)
tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py (4)

- flixopt/clustering/base.py (1): `n_clusters` (1048-1053)
- flixopt/flow_system.py (2): `copy` (955-980), `transform` (1605-1623)
- flixopt/transform_accessor.py (1): `cluster` (578-1067)
- flixopt/solvers.py (1): `HighsSolver` (67-89)

flixopt/transform_accessor.py (5)

- flixopt/clustering/base.py (1): `n_clusters` (1048-1053)
- flixopt/flow_system.py (8): `weights` (2104-2124), `FlowSystem` (55-2499), `to_dataset` (630-697), `dims` (1936-1956), `sel` (2288-2314), `from_dataset` (700-826), `solution` (1449-1466), `solution` (1469-1472)
- flixopt/structure.py (13): `weights` (239-244), `flow_system` (437-455), `timestep_duration` (220-222), `timestep_duration` (1749-1750), `to_dataset` (912-935), `add` (1282-1291), `add` (1817-1819), `dims` (229-231), `get` (1656-1661), `get` (1821-1823), `from_dataset` (967-1000), `solution` (169-217), `solution` (1152-1166)
- flixopt/core.py (4): `TimeSeriesData` (34-124), `drop_constant_arrays` (566-596), `is_timeseries_data` (112-114), `from_dataarray` (97-109)
- flixopt/statistics_accessor.py (3): `sizes` (440-444), `sizes` (1011-1054), `sizes` (1661-1716)

flixopt/components.py (2)

- flixopt/interface.py (1): `InvestParameters` (956-1279)
- flixopt/clustering/intercluster_helpers.py (2): `build_boundary_coords` (134-178), `extract_capacity_bounds` (64-131)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
74-74: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
121-121: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
149-149: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
189-189: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build documentation
- GitHub Check: test (3.14)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
🔇 Additional comments (17)
tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py (1)
191-217: Migration to new clustering API looks good.

The refactoring successfully replaces the old `ClusteredOptimization` approach with the new `transform.cluster()` API:

- Clustering parameters (`n_clusters`, `cluster_duration`) are correctly passed to `cluster()`.
- Peak selection lists (`time_series_for_high_peaks`, `time_series_for_low_peaks`) use proper variable labels matching the dataset.
- The `ClusteredResult` wrapper provides the minimal interface needed for compatibility with the existing plotting code.

Note: The previous review comment regarding the unused `include_storage` variable has been resolved; this variable no longer appears in the code.

docs/notebooks/08b-rolling-horizon.ipynb (1)
184-186: LGTM! Markdown artifact fixed.

The markdown cell now correctly contains only the header and description text without the `<cell_type>markdown</cell_type>` artifact that was flagged in the previous review.

flixopt/transform_accessor.py (6)
578-672: Well-documented cluster() method with comprehensive docstring.

The docstring clearly explains the purpose, parameters, and usage patterns for the clustering workflow. The two-stage sizing optimization example is particularly helpful for users.

686-738: Comprehensive input validation.

Good defensive programming with early validation of timestep uniformity, cluster duration divisibility, and reserved tsam parameter conflicts. The `predef_cluster_order` dimension validation ensures consistency with the FlowSystem's dimensionality. A standalone sketch of such checks follows.
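For readers who want a feel for what these checks look like, here is a minimal sketch; the function name and signature are illustrative, not the flixopt API:

```python
import numpy as np
import pandas as pd


def validate_clustering_inputs(timesteps: pd.DatetimeIndex, cluster_duration_h: float) -> int:
    """Illustrative checks: uniform timesteps and a divisible cluster duration."""
    deltas = np.diff(timesteps.values).astype('timedelta64[s]').astype(float) / 3600.0
    if not np.allclose(deltas, deltas[0]):
        raise ValueError('Clustering requires uniform timesteps.')
    steps = cluster_duration_h / deltas[0]
    if not np.isclose(steps, round(steps)):
        raise ValueError('cluster_duration must be an integer multiple of the timestep.')
    return int(round(steps))  # timesteps per cluster


# 96 steps of 15 minutes, one-day clusters -> 96 timesteps per cluster
idx = pd.date_range('2024-01-01', periods=96, freq='15min')
print(validate_clustering_inputs(idx, cluster_duration_h=24.0))  # 96
```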
789-801: Good resilience with warning suppression and error handling.

Suppressing the tsam informational warning and gracefully handling `accuracyIndicators()` failures prevents non-critical issues from blocking the clustering workflow.
1372-1440: Correct implementation of SOC expansion with decay and boundary combination.

The logic correctly implements Eq. 9 from Blanke et al. (2022): combining `SOC_boundary` with the decayed `charge_state`. The `clip(min=0)` at line 1439 appropriately handles small negative values from constraint approximations while maintaining physical validity.
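As a side illustration, a toy NumPy rendering of the combination described above (all numbers and names invented, not flixopt variables):

```python
import numpy as np

soc_boundary = 80.0        # SOC at the period's first boundary
rel_loss = 0.001           # relative self-discharge per hour
dt = 0.25                  # timestep duration in hours (15-min resolution)
charge_state = np.array([0.0, 2.5, 4.0, 3.0])  # within-cluster ΔE profile

offsets = np.arange(charge_state.size)          # timestep index inside the cluster
decay = (1.0 - rel_loss) ** (offsets * dt)      # exponent in hours, not raw steps
actual_soc = np.clip(soc_boundary * decay + charge_state, 0.0, None)
print(actual_soc)
```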
1069-1210: Helper methods follow consistent pattern for multi-dimensional combination.

The three `_combine_slices_to_dataarray*` methods share similar logic but serve distinct purposes with different dimension handling. While there's some code duplication, keeping them separate improves clarity and maintainability for the different use cases.

1318-1329: No issues found. The multi-dimensional indexing in the charge_state expansion logic is correct. When `cluster_order.ndim > 1`, `da.isel(cluster=last_clusters, time=-1)` correctly uses xarray's advanced indexing with broadcasting. The subsequent `drop_vars` and `expand_dims` operations properly maintain dimension alignment for concatenation, as validated by comprehensive test coverage including `test_expand_solution_maps_scenarios_independently`.
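For readers less familiar with xarray's vectorized indexing, a standalone toy example (not the flixopt arrays) of how an `isel` call with a DataArray indexer broadcasts per scenario:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(24).reshape(3, 4, 2),
    dims=['cluster', 'time', 'scenario'],
)
# One "last cluster" per scenario, given as a DataArray indexer
last_clusters = xr.DataArray([0, 2], dims=['scenario'])

# Vectorized indexing: cluster 0 is picked for scenario 0 and cluster 2 for
# scenario 1, then the final timestep of each; the result keeps 'scenario'.
final_states = da.isel(cluster=last_clusters, time=-1)
print(final_states.dims)
```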
flixopt/optimization.py (3)

4-9: Clear module-level guidance directing users to the new clustering approach.

The docstring update appropriately redirects users from the removed `ClusteredOptimization` to `FlowSystem.transform.cluster()`.
80-111: Deprecated parameter handling with backwards compatibility.

The `normalize_weights` parameter is deprecated but still accepted for backwards compatibility. Setting `obj.normalize_weights = True` unconditionally ensures consistent behavior while the deprecation warning (issued in `FlowSystem.create_model()` per the relevant snippet) informs users.
186-194: Simplified do_modeling call without normalize_weights.

The `create_model()` call no longer passes `normalize_weights`, aligning with the deprecation of this parameter. The relevant snippet shows `FlowSystem.create_model()` handles the deprecation warning internally.

flixopt/components.py (6)
274-300: Comprehensive cluster_mode documentation in docstring.

The updated docstring clearly explains the four cluster modes (`independent`, `cyclic`, `intercluster`, `intercluster_cyclic`) with their respective behaviors and use cases. This helps users choose the appropriate mode for their storage modeling needs.
449-476: Clean branching logic for storage model selection.

The `create_model` method correctly branches to `InterclusterStorageModel` only when both clustering is active and the storage is configured for intercluster modes. This ensures the specialized model is only used when appropriate.
924-934: Template method pattern enables clean extensibility.

The refactored `_do_modeling` method orchestrates the modeling sequence through well-named hook methods. This allows `InterclusterStorageModel` to override specific behaviors (like `_add_cluster_cyclic_constraint`) without duplicating code.
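To illustrate the pattern in the abstract (a generic sketch with invented class and method names, not the flixopt code):

```python
class StorageModelSketch:
    """Base class fixes the modeling sequence; subclasses override hooks."""

    def do_modeling(self):
        self.add_charge_state()
        self.add_cyclic_constraint()  # hook: subclasses may override

    def add_charge_state(self):
        print('base: charge state variables')

    def add_cyclic_constraint(self):
        print('base: cyclic constraint per cluster')


class InterclusterSketch(StorageModelSketch):
    def add_cyclic_constraint(self):
        print('intercluster: SOC_boundary linking instead')


InterclusterSketch().do_modeling()
```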
1035-1064: Correctly formulated energy balance equation.

The `_build_energy_balance_lhs` method properly implements the energy balance with self-discharge decay using `timestep_duration`, ensuring correct behavior regardless of timestep resolution.
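A worked single step of such a balance, under assumed conventions (charging efficiency applied on the way in, discharging on the way out; all numbers illustrative):

```python
dt = 0.25            # timestep duration in hours
rel_loss = 0.001     # self-discharge per hour
eta_in, eta_out = 0.95, 0.95
soc = 50.0           # charge state at t
charge, discharge = 10.0, 0.0  # flow rates in kW

# Decay over dt hours, plus net energy moved during the step
soc_next = soc * (1.0 - rel_loss) ** dt + (charge * eta_in - discharge / eta_out) * dt
print(soc_next)
```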
1243-1265: Correct symmetric bounds for ΔE (charge_state) in intercluster mode.

The `_absolute_charge_state_bounds` override correctly provides symmetric bounds `[-capacity, +capacity]` for the relative charge state variable. The `+ 0.0` workaround for the linopy LP writer `-0.0` bug is a reasonable pragmatic fix.
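The `-0.0` artifact is easy to reproduce in plain NumPy, which is why the `+ 0.0` normalization works (IEEE 754 defines `-0.0 + 0.0` as `+0.0`):

```python
import numpy as np

capacity = np.array([0.0, 100.0])
lower = -capacity     # negating zero capacity yields -0.0
print(lower)          # [  -0. -100.]
print(lower + 0.0)    # [   0. -100.]  -> '+ 0.0' normalizes the sign of zero
```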
1399-1416: Clean cluster start constraint with 2D structure.

With the 2D (cluster, time) structure, constraining `charge_state.isel(time=0) == 0` elegantly broadcasts across all clusters. The docstring clearly explains this simplification.
```diff
  # Optimizations
- optimizations: list[fx.Optimization | fx.ClusteredOptimization | fx.SegmentedOptimization] = []
+ optimizations: list[fx.Optimization | fx.SegmentedOptimization] = []
```
Update type annotation to include ClusteredResult.

The type hint specifies `list[fx.Optimization | fx.SegmentedOptimization]`, but line 217 appends a `ClusteredResult` instance that doesn't match either type. Type checkers will flag this as an error.

🔎 Proposed fix

Since `ClusteredResult` is defined locally and duck-types the needed interface, consider broadening the type hint to acknowledge all three types:

```diff
- optimizations: list[fx.Optimization | fx.SegmentedOptimization] = []
+ optimizations: list[fx.Optimization | fx.SegmentedOptimization | ClusteredResult] = []
```

Note: This requires moving the type annotation after the `ClusteredResult` class definition, or using a forward reference. Alternatively, use a protocol or structural type if appropriate.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
@tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py
around line 178, The variable optimizations is currently typed as
list[fx.Optimization | fx.SegmentedOptimization] but you append a
ClusteredResult instance; update the annotation to include ClusteredResult
(e.g., list[fx.Optimization | fx.SegmentedOptimization | ClusteredResult]) or
use a forward reference/Protocol after the ClusteredResult class definition so
the type checker accepts all three types; alternatively, if structural typing is
desired, replace the concrete union with a compatible Protocol or a more general
type like list[Any] and place the annotation where ClusteredResult is visible.
```python
class ClusteredResult:
    def __init__(self, name, fs):
        self.name = name
        self.flow_system = fs
        self.durations = {'total': 0}  # Placeholder


optimization = ClusteredResult('Clustered', clustered_fs)
```
The placeholder `durations` attribute produces a misleading speed comparison.

The `durations` attribute is hardcoded to `{'total': 0}`, which will show the Clustered optimization taking 0 seconds in the speed comparison plot (lines 252-258). This misrepresents the actual clustering and optimization time.

💡 Suggested improvement

If timing information is important for comparison, consider tracking the actual duration:

```python
import time

# Before clustering
start_time = time.time()
clustered_fs = flow_system.copy().transform.cluster(
    n_clusters=n_clusters,
    cluster_duration=cluster_duration,
    time_series_for_high_peaks=time_series_for_high_peaks,
    time_series_for_low_peaks=time_series_for_low_peaks,
)
clustered_fs.optimize(fx.solvers.HighsSolver(0.01 / 100, 60))
clustering_duration = time.time() - start_time


class ClusteredResult:
    def __init__(self, name, fs, duration):
        self.name = name
        self.flow_system = fs
        self.durations = {'total': duration}


optimization = ClusteredResult('Clustered', clustered_fs, clustering_duration)
```

Alternatively, if timing is not critical for this deprecated example, consider adding a comment noting that the duration is a placeholder.
🤖 Prompt for AI Agents
In
@tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py
around lines 210-216, The ClusteredResult class sets durations to the hardcoded
{'total': 0}, which misleads the speed comparison; wrap the clustering and
subsequent optimize call in a timer (e.g., using time.time()), compute the
elapsed time, modify ClusteredResult.__init__ to accept a duration parameter and
set self.durations = {'total': duration}, and instantiate optimization =
ClusteredResult('Clustered', clustered_fs, clustering_duration); alternatively,
if you don't want to measure time here, replace the hardcoded durations with a
clear comment that the value is a placeholder so the plot isn't misleading.
Actionable comments posted: 0
🧹 Nitpick comments (1)
flixopt/flow_system.py (1)
679-692: Consider consolidating clustering serialization.

Both the old (attrs-based, lines 679-685) and new (object-based, lines 686-692) clustering serialization structures coexist. If the old structure is only for backward compatibility, consider adding a comment explaining when it can be removed, or verify whether both are actually needed.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `flixopt/flow_system.py`
- `flixopt/structure.py`
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/flow_system.py (1)

- flixopt/clustering/base.py (4): `Clustering` (962-1117), `_register_clustering_classes` (1170-1179), `clusters` (861-958), `timesteps_per_cluster` (1071-1075)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.14)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: Build documentation
🔇 Additional comments (15)
flixopt/structure.py (7)
171-179: Good practice: Targeted warning suppression.

The warning suppression is appropriately scoped to only the `super().solution` call and targets a specific UserWarning message. This prevents masking unrelated warnings while handling the expected coordinate mismatch for storage charge states.
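The general pattern looks like the following runnable sketch; the warning message filtered here is hypothetical:

```python
import warnings


def noisy_solution():
    warnings.warn('charge_state coordinate mismatch', UserWarning)  # hypothetical
    return 42


# Suppress only the one known UserWarning around a single call;
# all unrelated warnings stay visible.
with warnings.catch_warnings():
    warnings.filterwarnings('ignore', message='.*coordinate mismatch.*', category=UserWarning)
    result = noisy_solution()
```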
214-217: Good optimization: Conditional reindexing.

The additional check `not solution.indexes['time'].equals(self.flow_system.timesteps_extra)` prevents unnecessary reindexing when the time index is already correct, improving performance.
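In isolation, the guard-before-reindex idiom is simply (toy data, not the flixopt objects):

```python
import pandas as pd
import xarray as xr

target = pd.date_range('2020-01-01', periods=4, freq='h')
ds = xr.Dataset({'x': ('time', [1, 2, 3, 4])}, coords={'time': target})

# Reindex only when the index actually differs, avoiding a needless copy:
if not ds.indexes['time'].equals(target):
    ds = ds.reindex(time=target)
```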
220-265: LGTM: Clean public API delegation.

The new properties (`timestep_duration`, `dims`, `indexes`, `weights`, `temporal_dims`, `temporal_weight`) and the `sum_temporal` method provide a clean public API by delegating to the underlying `FlowSystem`. This improves consistency and supports the new clustering functionality.
268-290: LGTM: Simplified weighting logic.

The scenario weighting logic is now cleaner with normalization handled in `FlowSystem`. The `objective_weights` calculation is straightforward multiplication, and the docstrings clearly document the normalization behavior.
292-327: Good enhancement: Cluster-aware coordinate handling.

The automatic pairing of `'time'` and `'cluster'` dimensions (lines 317-321) ensures correct coordinate behavior for clustered FlowSystems. The clear comment and updated docstring make the behavior explicit.
1749-1750: LGTM: Consistent delegation pattern.

The `timestep_duration` property on `Submodel` follows the established pattern of delegating to the underlying model, providing convenient access for submodel implementations.
95-95: No backward compatibility issue: `normalize_weights` removal is internal refactoring with proper public API deprecation.

The `normalize_weights` parameter was removed from `FlowSystemModel.__init__()` as an internal implementation change. The public API methods (`FlowSystem.build_model()`, `FlowSystem.optimize()`, etc.) still accept this parameter as deprecated and issue warnings. No code in the repository directly calls `FlowSystemModel` with this parameter, and existing tests verify that the deprecated public API methods continue to work correctly.

flixopt/flow_system.py (8)
179-215: LGTM: Clean cluster parameter handling.

The `clusters` and `cluster_weight` parameters are added consistently with existing dimension parameters. The `cluster_weight` is properly transformed via `fit_to_model_coords` with appropriate None handling.
512-583: Good edge case handling: Dimension cleanup.

Both `_update_period_metadata` and `_update_scenario_metadata` now properly handle the case where dimensions are dropped (single value selected). The cleanup of scalar coordinates, weight DataArrays, and attributes prevents stale metadata.
737-821: LGTM: Proper clustering deserialization.

The clustering deserialization correctly handles:

- Conditional `cluster_weight` for clustered datasets (lines 740-748)
- Scenario weights only when the scenario dimension exists (lines 750-753)
- Clustering object reconstruction from prefixed arrays (lines 810-821)

The separation between constructor parameters and post-construction attributes is well structured.
1333-1391: Good deprecation handling for `normalize_weights`.

Both `create_model` and `build_model` accept the deprecated `normalize_weights` parameter and emit clear deprecation warnings. This provides a smooth migration path for existing code while centralizing normalization in FlowSystem.
1936-2160: Excellent new public API for clustering and weighting.

The new properties provide a comprehensive and well-documented API:

- `dims`/`indexes` cleanly replace the deprecated `coords` (with proper deprecation warning)
- `temporal_weight` and `sum_temporal` provide correct weighted aggregation for clustered systems
- `weights` gives unified access to all dimension weights
- `is_clustered` provides clear state inspection
- All include docstrings with examples
2063-2091: LGTM: Robust scenario weight normalization.

The setter now handles normalization centrally with proper error checking:

- Uses `np.isclose` for robust zero-sum detection
- Clear error message when normalization fails
- Normalizes across the scenario dimension
- Aligns with the removal of the `normalize_weights` parameter
- Aligns with the removal of
normalize_weightsparameter
196-203: No backward compatibility concerns. Thehours_per_stepvariable was never exposed as a public attribute—it only exists as local variables within methods and function parameters in internal utilities. External code accesses the duration viaflow_system.model.timestep_duration, which is the new, consistent terminology throughout the codebase.
41-50: Module-level registration pattern correctly handles circular imports.The circular import between
flow_systemandclustering/intercluster_helpers(which importsFlowSystem) is properly managed by deferring the_register_clustering_classes()call to the end of the module, after primary imports. This is the correct approach for resolving such dependencies in Python. TheClusteringtype hint is isolated in theTYPE_CHECKINGblock, and the clustering registration import follows all other module-level code, ensuring no premature initialization of clustering modules occurs beforeflow_systemis ready.
Actionable comments posted: 3
Fix all issues with AI Agents 🤖
In @docs/notebooks/data/generate_example_systems.py:
- Around line 74-79: The timestamp alignment using _weather.loc[timesteps,
'temperature_C'] is fragile and can KeyError on missing or misaligned
timestamps; replace the .loc access with a reindex-based lookup such as
retrieving temperature via _weather.reindex(timesteps)['temperature_C'].values
(or _weather.reindex(timesteps, method=...) if forward/backfill is desired) to
safely align timesteps to _weather, and propagate this same reindex pattern to
other places that currently use .loc[timesteps, ...] (e.g., where temp or other
columns are derived) to avoid intermittent KeyError issues.
- Around line 56-59: Module-level calls to load_weather() and
load_electricity_prices() can raise errors at import time and the current
.tz_localize(None) hack strips timezone info; wrap the load calls in a
try/except that catches exceptions from load_weather and
load_electricity_prices, re-raise a clear RuntimeError/ValueError with context
mentioning _weather and _elec_prices and the failing loader names, or convert
these to a lazy initializer function that loads on demand; also stop stripping
tz with tz_localize(None)—instead normalize the electricity prices to a
consistent timezone (e.g., use _elec_prices.tz_convert("UTC") or ensure other
data is timezone-aware) so comparisons remain correct.
- Around line 406-411: The 15-minute timesteps variable is being used to index
hourly _weather, causing KeyError; fix by upsampling or reindexing _weather to
match timesteps before reading temperature: create an upsampled temperature
series from _weather (e.g., reindex or resample to timesteps with forward-fill)
and then set temp = upsampled.loc[timesteps, 'temperature_C'].values so
thermal_gen.generate receives 15‑minute resolution temperature data.
🧹 Nitpick comments (2)
docs/notebooks/data/generate_example_systems.py (2)
727-733: Add error handling for system generation.

The main loop doesn't handle errors from `create_func()` or `fs.to_netcdf()`. If one system fails to generate or save, the entire script stops without completing the remaining systems.

🔎 Proposed improvement

```diff
  for name, create_func in systems:
      print(f'Creating {name}...')
-     fs = create_func()
-
-     output_path = OUTPUT_DIR / f'{name}.nc4'
-     print(f'  Saving to {output_path}...')
-     fs.to_netcdf(output_path, overwrite=True)
+     try:
+         fs = create_func()
+         output_path = OUTPUT_DIR / f'{name}.nc4'
+         print(f'  Saving to {output_path}...')
+         fs.to_netcdf(output_path, overwrite=True)
+         print(f'  ✓ {name} completed')
+     except Exception as e:
+         print(f'  ✗ {name} failed: {e}')
+         continue

  print('All systems generated successfully!')
```
62-713: Consider helper functions to reduce duplication.

Several patterns are repeated across the generator functions:

- Temperature data loading and alignment
- BDEW profile generation (thermal and electrical)
- Price data preparation (electricity and gas)

Extracting these into helper functions could improve maintainability and consistency, especially if the data loading logic needs to be updated later.

Example helper:

```python
def get_temperature_profile(timesteps: pd.DatetimeIndex) -> np.ndarray:
    """Get temperature profile aligned to timesteps."""
    return _weather['temperature_C'].reindex(timesteps, method='ffill').values


def get_electricity_price_profile(timesteps: pd.DatetimeIndex) -> np.ndarray:
    """Get electricity price profile aligned to timesteps."""
    return _elec_prices.reindex(timesteps, method='ffill').values / 1000  # EUR/kWh
```

However, for tutorial/example code, keeping operations explicit in each function may aid understanding.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `docs/notebooks/08b-rolling-horizon.ipynb`
- `docs/notebooks/data/generate_example_systems.py`
- `flixopt/components.py`
- `flixopt/flow_system.py`
🧰 Additional context used
🧬 Code graph analysis (1)
docs/notebooks/data/generate_example_systems.py (4)

- docs/notebooks/data/generate_realistic_profiles.py (8): `ElectricityLoadGenerator` (128-170), `GasPriceGenerator` (234-262), `ThermalLoadGenerator` (68-125), `load_electricity_prices` (53-62), `load_weather` (37-50), `generate` (87-125), `generate` (143-170), `generate` (237-262)
- flixopt/structure.py (2): `values` (1527-1532), `values` (1814-1815)
- flixopt/effects.py (1): `Effect` (34-305)
- flixopt/carrier.py (1): `Carrier` (14-125)
🔇 Additional comments (15)
flixopt/components.py (6)
449-476: LGTM! Clean separation of storage model selection based on clustering.

The `create_model` method now cleanly delegates to `InterclusterStorageModel` for intercluster modes when clustering is active, with clear documentation explaining the model selection logic.
1476-1485: Excellent fix! Decay calculation now correctly uses hours.

The linking constraint decay calculation has been corrected from the previous review. Line 1481 now properly converts timesteps to hours:

```python
hours_per_cluster = timesteps_per_cluster * self._model.timestep_duration.mean('time')
```

This ensures the decay exponent is in hours, making the calculation correct for sub-hourly timesteps (e.g., 15-minute resolution).
1540-1544: Excellent fix! Combined bounds decay now correctly uses hours.

The combined bounds constraint decay has been corrected from the previous review. Line 1543 now properly converts the timestep offset to hours:

```python
hours_offset = offset * mean_timestep_duration
```

This matches the fix in the linking constraints and ensures consistent decay calculations throughout the intercluster storage model.
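A quick numeric check of why the unit matters, with toy numbers (96 steps of 15 minutes equal 24 hours):

```python
rel_loss = 0.01                # 1 % self-discharge per hour
timesteps_per_cluster = 96
timestep_duration = 0.25       # hours per step

hours_per_cluster = timesteps_per_cluster * timestep_duration  # 24.0
print((1 - rel_loss) ** hours_per_cluster)      # ~0.786: decay over one day (correct)
print((1 - rel_loss) ** timesteps_per_cluster)  # ~0.381: exponent in raw steps (wrong)
```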
924-1033: LGTM! Well-structured template method pattern.

The StorageModel refactor introduces clean template methods that enable subclass customization without code duplication. The orchestration in `_do_modeling()` is clear, and the new `_add_cluster_cyclic_constraint()` and `_add_initial_final_constraints()` methods properly handle clustered storage modes.
1147-1305: LGTM! Well-documented and mathematically sound intercluster storage model.

The InterclusterStorageModel implementation is excellent:
- Clear docstring explaining the S-N linking model with academic references
- Symmetric bounds for ΔE (relative charge state) are mathematically correct
- Proper use of template method pattern to customize behavior
- Clean separation of concerns with dedicated helper methods
The implementation aligns well with the Blanke et al. (2022) approach for seasonal storage in clustered optimizations.
1306-1556: LGTM! Complete and correct implementation of intercluster linking.

The helper methods implement the S-N linking model correctly:
- Cluster start constraints (ΔE=0) establish the reference point
- Delta SOC computation tracks net charge/discharge per cluster
- Linking constraints connect consecutive boundaries with decay
- Combined bounds ensure physical SOC limits are respected
The warning suppression at lines 1536-1538 is appropriate for a known xarray behavior when renaming dimensions.
flixopt/flow_system.py (4)
810-827: LGTM! Correct restoration of cluster weights from clustering metadata.

The clustering restoration logic correctly reconstructs the Clustering object from serialized data and critically restores `cluster_weight` from `clustering.result.representative_weights` at line 826. This ensures that temporal weights are properly restored when loading clustered FlowSystems from NetCDF, which is essential for correct statistics calculations.
1338-1391: LGTM! Clean deprecation of normalize_weights parameter.

The deprecation warnings in both `create_model()` and `build_model()` clearly communicate that `normalize_weights` is no longer needed since scenario weights are now always normalized when set on the FlowSystem. This simplifies the API while maintaining backward compatibility.
2068-2096: LGTM! Scenario weights are now automatically normalized.

The scenario_weights setter correctly:

- Converts weights to a DataArray with proper coordinates (line 2090)
- Validates that weights don't sum to zero (line 2094)
- Normalizes to sum to 1.0 (line 2096)

This automatic normalization simplifies the API and eliminates the need for the deprecated `normalize_weights` parameter.
1941-2165: LGTM! Comprehensive clustering support through new properties.

The new properties and methods provide clean access to clustering metadata:

- `dims`, `indexes`: Consistent dimension/coordinate access
- `temporal_dims`, `temporal_weight`: Proper handling of time + cluster dimensions
- `weights`: Unified access to all dimension weights with sensible defaults
- `sum_temporal()`: Convenient helper for temporal aggregation
- `is_clustered`: Clear indicator of clustering state

These additions create a cohesive API for working with clustered FlowSystems.
docs/notebooks/08b-rolling-horizon.ipynb (3)
66-73: Excellent fix! Timestep-to-days calculation is now correct.

The notebook now correctly calculates days from 15-minute timesteps:
- Division by 96 (24 hours × 4 intervals/hour) is accurate for 15-min data
- Resolution description updated to "15-min resolution"
For 1344 timesteps, this now correctly reports 14 days instead of the previous incorrect 56 days.
54-98: LGTM! Improved workflow using FlowSystem-centric approach.

The notebook now uses a cleaner workflow:

- `create_operational_system()` replaces manual CSV loading
- `flow_system.connect_and_transform()` aligns all data as xarray
- FlowSystem names (`fs_full.name`, `fs_rolling.name`) enable better tracking

This aligns with the modern flixopt API and makes the example more maintainable.
183-198: LGTM! Clean use of Comparison class for visualization.

The notebook demonstrates the new `Comparison` class for side-by-side visualization:

```python
comp = fx.Comparison([fs_full, fs_rolling])
comp.statistics.plot.balance('Heat')
```

This is a great example of multi-system comparison capabilities and provides clear visual feedback on rolling horizon vs. full optimization.
docs/notebooks/data/generate_example_systems.py (2)
515-535: LGTM! Correct handling of full-year seasonal data.

This function correctly handles the full-year data by:

- Using 2019 (a non-leap year) to match the 8760-hour TMY weather data
- Taking all weather values directly without timestamp alignment issues
- Explicitly passing `year=2019` to ThermalLoadGenerator

The seasonal storage configuration with `initial_charge_state='equals_final'` and a very low loss rate (0.01%/hour) appropriately demonstrates seasonal pit storage.
139-143: The `annual_demand_kwh=2_000` value is intentional and appropriate for this context. The parameter is a scaling factor for the BDEW profile generator over a 3-day demonstration scenario, not an actual annual demand value. The comment in the code explicitly states "BDEW demand profiles (scaled to fit component sizes)", indicating these values are chosen to create an appropriately-scaled example system, not to represent realistic annual consumption.

Likely an incorrect or invalid review comment.
Force-pushed from `dc05864` to `348c0c4` (Compare)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/test_on_hours_computation.py (1)
12-44: Align test parameter name with function signature.

The test uses `timestep_duration` as the parameter name, but `ModelingUtilities.compute_consecutive_hours_in_state()` defines the parameter as `hours_per_timestep` (flixopt/modeling.py:122). While the positional argument passing works correctly, rename the test parameter to match the function signature for consistency and clarity.

tests/deprecated/test_on_hours_computation.py (1)
12-44: Parameter name inconsistency between test and function signature.

The test parameter `timestep_duration` doesn't match the function parameter `hours_per_timestep` in `ModelingUtilities.compute_consecutive_hours_in_state()` (flixopt/modeling.py). However, since the argument is passed positionally, the tests will not fail. Consider renaming the test parameter to `hours_per_timestep` for consistency and clarity.
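Why positional calls mask the mismatch, in a schematic example (the function below is a stand-in, not the real signature):

```python
def compute(values, hours_per_timestep):  # library-side parameter name
    return [v * hours_per_timestep for v in values]


timestep_duration = 0.25  # test-side local name

print(compute([1, 2], timestep_duration))  # positional call: works regardless of name
# compute([1, 2], timestep_duration=0.25)  # keyword call would raise a TypeError
```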
🤖 Fix all issues with AI Agents
In @docs/notebooks/08a-aggregation.ipynb:
- Around line 50-52: The notebook currently treats the data as hourly; update
all day/week calculations to 15‑minute resolution by using 96 timesteps per day
and 672 timesteps per week (7*24*4). Specifically, replace any "/ 24" day-size
divisions with "/ 96", change any "hourly resolution" text to "15‑min
resolution", and update any slices like "data[:168]" or similar first-week
indices to use 672 (or a computed variable timesteps_per_week = 7*24*4); apply
these fixes in the cells referenced (around the days print and the “first week”
plot and cells covering lines 62-69 and 79-91).
In @docs/notebooks/data/generate_example_systems.py:
- Around line 74-79: The code is using _weather.loc[timesteps, 'temperature_C']
which will KeyError if timestamps don't match exactly; change this to reindex
the weather series against timesteps (e.g., get the 'temperature_C' column via
_weather.reindex(timesteps)['temperature_C'] or
_weather['temperature_C'].reindex(timesteps)) before taking .values so missing
timestamps are handled robustly, and apply the same reindex pattern wherever
.loc[timesteps, ...] is used (including around temp, timesteps, and the call to
ThermalLoadGenerator.generate).
In @docs/user-guide/optimization/clustering.md:
- Around line 158-160: The fenced code block showing the formula for
actual_SOC(t) lacks a language specifier; edit the block containing
"actual_SOC(t) = SOC_boundary[period] × (1 - loss)^t + ΔE(t)" and add an
appropriate language tag (e.g., ```text or ```math) after the opening backticks
so markdownlint passes and rendering is improved; do this for that fenced block
only where actual_SOC(t), SOC_boundary[period], and ΔE(t) appear.
- Around line 53-54: The table's inline code spans contain unescaped pipe
characters which break the Markdown table; update the two entries to escape the
pipe inside the code spans by changing 'HeatDemand(Q)|fixed_relative_profile' to
'HeatDemand(Q)\|fixed_relative_profile' and 'SolarGen(P)|fixed_relative_profile'
to 'SolarGen(P)\|fixed_relative_profile' so the backticked values remain single
cells.
In @flixopt/clustering/base.py:
- Around line 861-959: clusters() wrongly assumes aggregated_data[var].values is
a flat 1-D array and unconditionally reshapes it; instead detect the array shape
and handle cases: if the DataArray already has dims ('cluster','time') use it
as-is (preserving coords), if it's 1-D verify its length equals
n_clusters*timesteps_per_cluster then reshape into (n_clusters,
timesteps_per_cluster), and if it is multi-dimensional (e.g., contains extra
dims like 'period' or 'scenario') either select/stack into the required 2-D
layout or raise a clear ValueError; update the loop in clusters() (referencing
aggregated_data, resolved_variables, n_clusters, timesteps_per_cluster and
data_vars) to branch on arr.ndim/arr.shape and preserve xarray coords when
building the xr.DataArray instead of always using arr.values.reshape(...).
In @flixopt/clustering/intercluster_helpers.py:
- Around line 64-131: The function extract_capacity_bounds incorrectly uses
hasattr checks to detect InvestParameters and sets has_investment wrongly;
replace all hasattr(capacity_param, '...') checks with
isinstance(capacity_param, InvestParameters), set has_investment =
isinstance(capacity_param, InvestParameters), then pick cap_value as: if
isinstance(capacity_param, InvestParameters) and capacity_param.fixed_size is
not None use fixed_size, elif isinstance(capacity_param, InvestParameters) and
capacity_param.maximum_size is not None use maximum_size, elif
isinstance(capacity_param, (int, float)) use the numeric value, else use a safer
unbounded fallback (reduce 1e9 to something in the 1e4–1e6 range, e.g. 1e6);
keep the existing xr.DataArray handling for cap_value and return signature
unchanged.
In @flixopt/config.py:
- Around line 166-168: The dim_priority default tuple in CONFIG.Plotting
includes 'duration' and 'cluster', which are not actual dataset dimension names;
remove those entries from the default dim_priority (update the tuple assigned to
CONFIG.Plotting.dim_priority) so it only lists real dimension names, or if they
are intended for future use, add a comment documenting why they remain; also
verify the filtering code that builds priority_dims (the list comprehension for
CONFIG.Plotting.dim_priority against available) remains intact to guard against
any other mismatches.
🧹 Nitpick comments (11)
.github/workflows/docs.yaml (3)
60-66: Consider making the cache key generation more robust.

The hash generation command chains multiple `find` commands with `xargs cat`. While functional, this approach could be fragile:

- If `find` encounters a file it cannot read, the entire pipeline could fail silently
- The `cat` of all files could consume significant memory for large repositories
- No explicit error handling if commands fail

🔎 More robust cache key generation

```diff
-      - name: Get notebook cache key
-        id: notebook-cache-key
-        run: |
-          # Hash notebooks + flixopt source code (sorted for stable cache keys)
-          HASH=$(find docs/notebooks -name '*.ipynb' | sort | xargs cat | cat - <(find flixopt -name '*.py' | sort | xargs cat) | sha256sum | cut -d' ' -f1)
-          echo "hash=$HASH" >> $GITHUB_OUTPUT
+      - name: Get notebook cache key
+        id: notebook-cache-key
+        run: |
+          # Hash notebooks + flixopt source code (sorted for stable cache keys)
+          HASH=$(find docs/notebooks -name '*.ipynb' -type f -print0 | sort -z | xargs -0 sha256sum | \
+            cat - <(find flixopt -name '*.py' -type f -print0 | sort -z | xargs -0 sha256sum) | \
+            sha256sum | cut -d' ' -f1)
+          echo "hash=$HASH" >> $GITHUB_OUTPUT
```

This alternative:

- Uses `-type f` to ensure only regular files are processed
- Uses null-terminated strings (`-print0`/`-0`) to handle filenames with spaces
- Hashes files individually first, then combines hashes (more memory-efficient)
74-80: Consider adding error handling for notebook execution.

The parallel notebook execution doesn't have explicit error handling. If a notebook fails to execute, the workflow should fail rather than proceeding with partially executed notebooks.

🔎 Add error handling to notebook execution

```diff
       - name: Execute notebooks in parallel
         if: steps.notebook-cache.outputs.cache-hit != 'true'
         run: |
-          # Execute all notebooks in parallel (4 at a time)
-          # Run from notebooks directory so relative imports work
-          cd docs/notebooks && find . -name '*.ipynb' -print0 | \
-            xargs -0 -P 4 -I {} jupyter execute --inplace {}
+          # Execute all notebooks in parallel (4 at a time)
+          # Run from notebooks directory so relative imports work
+          set -e  # Exit on first error
+          cd docs/notebooks
+          FAILED=0
+          find . -name '*.ipynb' -print0 | \
+            xargs -0 -P 4 -I {} sh -c 'jupyter execute --inplace "{}" || exit 255' || FAILED=$?
+          if [ $FAILED -ne 0 ]; then
+            echo "::error::One or more notebooks failed to execute"
+            exit 1
+          fi
```
122-141: Duplicate cache logic could be refactored into a composite action.

The same three steps (get cache key, cache notebooks, execute notebooks) are duplicated between the `build` and `deploy` jobs. This duplication increases maintenance burden.

Consider extracting these steps into a reusable composite action (e.g., `.github/actions/setup-notebooks/action.yaml`) to reduce duplication and ensure consistency. However, this can be deferred if the current duplication is acceptable for your workflow maintenance approach.

docs/notebooks/data/generate_example_systems.py (2)
.loc[timesteps, ...]pattern, while Line 146 correctly uses.reindex(timesteps, method='ffill'). For consistency and robustness, use.reindex()throughout.🔎 Suggested fix for consistency
- temp = _weather.loc[timesteps, 'temperature_C'].values + temp = _weather['temperature_C'].reindex(timesteps, method='ffill').valuesdocs/user-guide/results/index.md (1)
280-428: Excellent documentation for the new Comparison feature!The documentation clearly explains:
- When and why to use Comparison
- Requirements for matching dimensions
- Available properties and methods
- How to compute differences
The examples are well-structured and cover common use cases. The explanation of core vs. auxiliary dimensions (lines 315-332) is particularly helpful.
Minor suggestion for clarity:
Consider adding a note about what happens if systems have different components. For example:
!!! note "Component Differences" Systems can have different components. The Comparison will align data where possible, and variables unique to specific systems will be NaN for others.flixopt/clustering/intercluster_helpers.py (1)
108-118: Consider validating the unbounded default value.When
capacity_paramisNoneor doesn't have sizing attributes, the function defaults tocap_value = 1e9. This is a large value, but for certain problem formulations, it might still be limiting or cause numerical issues.Consider:
- Adding a module-level constant for this default with documentation
- Logging a warning when the unbounded case is used
- Documenting in the function docstring that users should provide explicit bounds
Optional: Add constant and validation
+# Default upper bound for unbounded storage (1 TWh equivalent at 1 hour resolution) +UNBOUNDED_STORAGE_DEFAULT = 1e9 + def extract_capacity_bounds( capacity_param: InvestParameters | int | float | None, boundary_coords: dict, boundary_dims: list[str], ) -> CapacityBounds: """Extract capacity bounds from storage parameters for SOC_boundary variable. ... + + Warning: + If capacity_param is None, a default upper bound of 1e9 is used. + For unbounded storage, consider providing an explicit maximum_size. """ n_boundaries = len(boundary_coords['cluster_boundary']) lb_shape = [n_boundaries] + [len(boundary_coords[d]) for d in boundary_dims[1:]] lb = xr.DataArray(np.zeros(lb_shape), coords=boundary_coords, dims=boundary_dims) # Determine has_investment and cap_value has_investment = hasattr(capacity_param, 'maximum_size') if hasattr(capacity_param, 'fixed_size') and capacity_param.fixed_size is not None: cap_value = capacity_param.fixed_size elif hasattr(capacity_param, 'maximum_size') and capacity_param.maximum_size is not None: cap_value = capacity_param.maximum_size elif isinstance(capacity_param, (int, float)): cap_value = capacity_param else: - cap_value = 1e9 # Large default for unbounded case + cap_value = UNBOUNDED_STORAGE_DEFAULT + import logging + logging.getLogger('flixopt').warning( + f'Using default upper bound {UNBOUNDED_STORAGE_DEFAULT} for unbounded storage. ' + 'Consider providing explicit maximum_size for better numerical stability.' + )docs/notebooks/09-plotting-and-data-access.ipynb (1)
62-78: Consider adding error handling for system generation failures.The notebook creates and optimizes three systems in a single cell. If any generation or optimization step fails, the entire cell fails, and subsequent cells won't execute. Consider adding basic error handling or splitting into separate cells for better debugging during notebook execution.
Optional: Add error handling or split cells
Split the generation into individual cells or add error handling:
# Create and optimize the example systems solver = fx.solvers.HighsSolver(mip_gap=0.01, log_to_console=False) try: simple = create_simple_system() simple.optimize(solver) complex_sys = create_complex_system() complex_sys.optimize(solver) multiperiod = create_multiperiod_system() multiperiod.optimize(solver) print('Created systems:') print(f' simple: {len(simple.components)} components, {len(simple.buses)} buses') print(f' complex_sys: {len(complex_sys.components)} components, {len(complex_sys.buses)} buses') print(f' multiperiod: {len(multiperiod.components)} components, dims={dict(multiperiod.solution.sizes)}') except Exception as e: print(f'Error creating systems: {e}') raisedocs/notebooks/08d-clustering-multiperiod.ipynb (1)
323-335: Clarify the rationale for the 10% safety margin.

The notebook applies a 10% safety margin to sizes from clustered optimization. While the text mentions "multi-period uncertainty," it would be helpful to explicitly state that this margin compensates for:
- Potential underestimation of peak demands in clustered data
- Loss of temporal detail in representative periods
- Scenario averaging effects
This helps users understand when to adjust the margin.
Optional: Enhance safety margin documentation

```python
# Stage 1 already done - apply safety margin
# The safety margin compensates for:
# - Peak demands that may be smoothed by clustering
# - Loss of inter-day temporal patterns
# - Scenario probability weighting effects
SAFETY_MARGIN = 1.10  # 10% buffer for multi-period uncertainty
sizes_with_margin = {name: size.max().item() * SAFETY_MARGIN for name, size in fs_clustered.statistics.sizes.items()}
```

flixopt/results.py (1)
101-103: timestep_duration integration into Results and flow_hours looks correct.

Hooking `Results.timestep_duration` up via `FlowSystem.calculate_timestep_duration(self.timesteps_extra)` and then using it consistently in `flow_hours()` (deprecated helper), `node_balance` (`unit_type='flow_hours'`), `plot_node_balance_pie` (by scaling inputs/outputs), and `FlowResults.flow_hours` brings Results in line with the new timing semantics while remaining backward compatible. Broadcasting over extra dims (period/scenario/cluster) via the shared `'time'` coordinate is appropriate. I don't see correctness issues in these updates.

Also applies to: 287-291, 616-679, 1577-1588, 1717-1719, 2018-2019
flixopt/flow_system.py (1)
2085-2091: Consider edge case handling for zero-sum validation.

The normalization logic correctly ensures scenario weights sum to 1. However, the zero-sum check at line 2089 uses `np.isclose(norm, 0.0).any()`, which will trigger if any coordinate along other dimensions (like period) has zero sum.

Verify this behavior is correct for multi-dimensional scenario_weights. If scenario_weights can have period dimensions, the error message "scenario_weights sum to 0; cannot normalize" might be ambiguous about which period has the issue.

🔎 Consider more descriptive error message for multi-dimensional weights

```diff
 # Normalize to sum to 1
 norm = weights.sum('scenario')
-if np.isclose(norm, 0.0).any():
-    raise ValueError('scenario_weights sum to 0; cannot normalize.')
+if np.isclose(norm, 0.0).any():
+    zero_coords = norm.where(np.isclose(norm, 0.0), drop=True)
+    raise ValueError(f'scenario_weights sum to 0 at coordinates: {zero_coords.coords}; cannot normalize.')
 self._scenario_weights = weights / norm
```
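The multi-dimensional pitfall is easy to reproduce with toy weights (shapes invented for illustration):

```python
import numpy as np
import xarray as xr

# One period has a zero-sum scenario slice:
weights = xr.DataArray(
    [[0.5, 0.5], [0.0, 0.0]],
    dims=['period', 'scenario'],
)
norm = weights.sum('scenario')      # per-period sums: [1.0, 0.0]
print(np.isclose(norm, 0.0).any())  # True -> the setter would raise here
```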
flixopt/comparison.py (1)

23-42: Dimension validation relies on dim names only; consider also checking coordinate compatibility.

`_validate_matching_dimensions` currently enforces that all compared `FlowSystem.solution` objects share the same core dimension names (time/period/scenario), but not that their coordinate sets or lengths actually match. xarray will align on coordinates during concat and arithmetic, but if users accidentally compare systems with misaligned timelines or periods, they may silently get lots of NaNs instead of a clear error.

If you want stricter behavior, you could optionally validate that core dims have identical sizes and coordinates (or at least log/raise when they differ), while keeping the current behavior as a "relaxed" mode.

Also applies to: 70-124
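A sketch of the stricter variant suggested above (not the flixopt implementation; dimension names assumed from the comment):

```python
import xarray as xr

CORE_DIMS = ('time', 'period', 'scenario')


def validate_matching_dimensions(solutions: list[xr.Dataset]) -> None:
    """Stricter check: require identical core dims AND coordinates."""
    reference = solutions[0]
    for ds in solutions[1:]:
        for dim in CORE_DIMS:
            if (dim in reference.dims) != (dim in ds.dims):
                raise ValueError(f'Core dimension {dim!r} missing in one system.')
            if dim in reference.dims and not reference.indexes[dim].equals(ds.indexes[dim]):
                raise ValueError(f'Coordinates differ along {dim!r}.')
```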
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `docs/notebooks/data/raw/electricity_prices_de_2020.csv` is excluded by `!**/*.csv`
- `docs/notebooks/data/raw/tmy_dresden.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (76)
`.github/workflows/docs.yaml`, `.pre-commit-config.yaml`, `CHANGELOG.md`, `docs/notebooks/02-heat-system.ipynb`, `docs/notebooks/03-investment-optimization.ipynb`, `docs/notebooks/04-operational-constraints.ipynb`, `docs/notebooks/05-multi-carrier-system.ipynb`, `docs/notebooks/06a-time-varying-parameters.ipynb`, `docs/notebooks/07-scenarios-and-periods.ipynb`, `docs/notebooks/08a-aggregation.ipynb`, `docs/notebooks/08b-rolling-horizon.ipynb`, `docs/notebooks/08c-clustering.ipynb`, `docs/notebooks/08c2-clustering-storage-modes.ipynb`, `docs/notebooks/08d-clustering-multiperiod.ipynb`, `docs/notebooks/08e-clustering-internals.ipynb`, `docs/notebooks/09-plotting-and-data-access.ipynb`, `docs/notebooks/data/__init__.py`, `docs/notebooks/data/generate_example_systems.py`, `docs/notebooks/data/generate_realistic_profiles.py`, `docs/notebooks/data/raw/README.md`, `docs/notebooks/data/tutorial_data.py`, `docs/notebooks/fxplot_accessor_demo.ipynb`, `docs/user-guide/optimization/clustering.md`, `docs/user-guide/optimization/index.md`, `docs/user-guide/recipes/plotting-custom-data.md`, `docs/user-guide/results-plotting.md`, `docs/user-guide/results/index.md`, `flixopt/__init__.py`, `flixopt/clustering.py`, `flixopt/clustering/__init__.py`, `flixopt/clustering/base.py`, `flixopt/clustering/intercluster_helpers.py`, `flixopt/comparison.py`, `flixopt/components.py`, `flixopt/config.py`, `flixopt/core.py`, `flixopt/dataset_plot_accessor.py`, `flixopt/effects.py`, `flixopt/elements.py`, `flixopt/features.py`, `flixopt/flow_system.py`, `flixopt/io.py`, `flixopt/optimization.py`, `flixopt/optimize_accessor.py`, `flixopt/plot_result.py`, `flixopt/results.py`, `flixopt/statistics_accessor.py`, `flixopt/structure.py`, `flixopt/transform_accessor.py`, `mkdocs.yml`, `pyproject.toml`, `tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py`, `tests/deprecated/test_bus.py`, `tests/deprecated/test_effect.py`, `tests/deprecated/test_flow.py`, `tests/deprecated/test_flow_system_resample.py`, `tests/deprecated/test_integration.py`, `tests/deprecated/test_linear_converter.py`, `tests/deprecated/test_on_hours_computation.py`, `tests/deprecated/test_scenarios.py`, `tests/deprecated/test_storage.py`, `tests/test_bus.py`, `tests/test_cluster_reduce_expand.py`, `tests/test_clustering/__init__.py`, `tests/test_clustering/test_base.py`, `tests/test_clustering/test_integration.py`, `tests/test_clustering_io.py`, `tests/test_effect.py`, `tests/test_flow.py`, `tests/test_flow_system_resample.py`, `tests/test_io_conversion.py`, `tests/test_linear_converter.py`, `tests/test_on_hours_computation.py`, `tests/test_scenarios.py`, `tests/test_sel_isel_single_selection.py`, `tests/test_storage.py`
💤 Files with no reviewable changes (2)
- flixopt/clustering.py
- flixopt/effects.py
✅ Files skipped from review due to trivial changes (1)
- flixopt/plot_result.py
🚧 Files skipped from review as they are similar to previous changes (22)
- mkdocs.yml
- tests/deprecated/test_linear_converter.py
- tests/test_scenarios.py
- docs/notebooks/fxplot_accessor_demo.ipynb
- tests/test_clustering/test_base.py
- flixopt/clustering/init.py
- pyproject.toml
- tests/test_linear_converter.py
- docs/notebooks/data/init.py
- .pre-commit-config.yaml
- tests/deprecated/test_flow_system_resample.py
- tests/test_effect.py
- docs/user-guide/results-plotting.md
- flixopt/io.py
- tests/test_flow.py
- tests/deprecated/test_flow.py
- tests/test_flow_system_resample.py
- flixopt/core.py
- docs/notebooks/08e-clustering-internals.ipynb
- docs/notebooks/08c2-clustering-storage-modes.ipynb
- flixopt/features.py
- tests/test_clustering/init.py
🧰 Additional context used
🧬 Code graph analysis (19)
flixopt/clustering/intercluster_helpers.py (2)

- flixopt/flow_system.py (3): `FlowSystem` (55-2499), `coords` (2001-2019), `dims` (1936-1956)
- flixopt/interface.py (1): `InvestParameters` (956-1279)

tests/test_on_hours_computation.py (2)

- tests/deprecated/test_on_hours_computation.py (2): `test_compute_duration` (29-32), `test_compute_duration_raises_error` (41-44)
- flixopt/modeling.py (2): `ModelingUtilities` (118-193), `compute_consecutive_hours_in_state` (120-142)

tests/deprecated/test_scenarios.py (2)

- flixopt/flow_system.py (2): `scenario_weights` (2053-2060), `scenario_weights` (2063-2091)
- flixopt/structure.py (2): `scenario_weights` (268-280), `flow_system` (437-455)

tests/test_storage.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

tests/test_sel_isel_single_selection.py (1)

- flixopt/flow_system.py (3): `FlowSystem` (55-2499), `add_elements` (1168-1216), `transform` (1605-1623)

flixopt/optimize_accessor.py (1)

- flixopt/flow_system.py (2): `FlowSystem` (55-2499), `build_model` (1355-1391)

docs/notebooks/data/generate_example_systems.py (1)

- docs/notebooks/data/generate_realistic_profiles.py (8): `ElectricityLoadGenerator` (128-170), `GasPriceGenerator` (234-262), `ThermalLoadGenerator` (68-125), `load_electricity_prices` (53-62), `load_weather` (37-50), `generate` (87-125), `generate` (143-170), `generate` (237-262)

flixopt/results.py (4)

- flixopt/structure.py (4): `timestep_duration` (220-222), `timestep_duration` (1749-1750), `solution` (169-217), `solution` (1152-1166)
- flixopt/flow_system.py (3): `calculate_timestep_duration` (330-335), `solution` (1449-1466), `solution` (1469-1472)
- flixopt/statistics_accessor.py (1): `flow_rates` (365-387)
- flixopt/elements.py (1): `flow_rate` (806-808)

tests/deprecated/test_effect.py (1)

- flixopt/structure.py (2): `timestep_duration` (220-222), `timestep_duration` (1749-1750)

flixopt/statistics_accessor.py (2)

- flixopt/flow_system.py (4): `dims` (1936-1956), `is_clustered` (2146-2160), `weights` (2104-2124), `temporal_dims` (1980-1987)
- flixopt/plot_result.py (2): `show` (55-58), `PlotResult` (20-143)

flixopt/elements.py (2)

- flixopt/flow_system.py (3): `sum_temporal` (2126-2143), `temporal_weight` (1990-1998), `temporal_dims` (1980-1987)
- flixopt/structure.py (5): `sum_temporal` (259-265), `timestep_duration` (220-222), `timestep_duration` (1749-1750), `temporal_weight` (255-257), `temporal_dims` (247-252)

flixopt/optimization.py (1)

- flixopt/flow_system.py (2): `FlowSystem` (55-2499), `create_model` (1333-1353)

flixopt/comparison.py (1)

- flixopt/plot_result.py (2): `PlotResult` (20-143), `update_traces` (75-88)

flixopt/dataset_plot_accessor.py (1)

- flixopt/config.py (1): `Plotting` (550-592)

flixopt/clustering/base.py (2)

- flixopt/plot_result.py (1): `PlotResult` (20-143)
- flixopt/config.py (1): `Plotting` (550-592)

flixopt/components.py (1)

- flixopt/clustering/intercluster_helpers.py (2): `build_boundary_coords` (134-178), `extract_capacity_bounds` (64-131)

flixopt/structure.py (1)

- flixopt/flow_system.py (11): `FlowSystem` (55-2499), `coords` (2001-2019), `indexes` (1959-1977), `dims` (1936-1956), `weights` (2104-2124), `temporal_dims` (1980-1987), `temporal_weight` (1990-1998), `sum_temporal` (2126-2143), `scenario_weights` (2053-2060), `scenario_weights` (2063-2091), `_unit_weight` (2093-2101)

flixopt/flow_system.py (2)

- flixopt/clustering/base.py (2): `Clustering` (962-1117), `_register_clustering_classes` (1170-1179)
- flixopt/structure.py (7): `timestep_duration` (220-222), `timestep_duration` (1749-1750), `dims` (229-231), `flow_system` (437-455), `temporal_dims` (247-252), `weights` (239-244), `sum_temporal` (259-265)

tests/test_cluster_reduce_expand.py (3)

- flixopt/effects.py (1): `Effect` (34-305)
- flixopt/transform_accessor.py (2): `cluster` (578-1067), `expand_solution` (1357-1505)
- flixopt/clustering/base.py (5): `n_clusters` (1048-1053), `clusters` (861-958), `cluster_order` (1034-1038), `timesteps_per_cluster` (1071-1075), `get_cluster_order_for_slice` (136-151)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
74-74: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
121-121: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
149-149: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
189-189: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
docs/notebooks/data/raw/README.md
10-10: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
28-28: Bare URL used
(MD034, no-bare-urls)
docs/user-guide/optimization/clustering.md
53-53: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
54-54: Table column count
Expected: 3; Actual: 4; Too many cells, extra data will be missing
(MD056, table-column-count)
158-158: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Build documentation
docs/notebooks/08a-aggregation.ipynb
Outdated
| "## Create the FlowSystem\n", | ||
| "\n", | ||
| "We use real-world district heating data at 15-minute resolution (one month):" | ||
| "We use a district heating system with real-world time series data (one month at 15-min resolution):" |
Fix day/week calculations to match 15‑min resolution

The data is described as "one month at 15‑min resolution", but:

- The days print assumes hourly data (`/ 24` and "at hourly resolution").
- The "first week" plot uses 168 timesteps, which is only 1.75 days at 15‑min resolution.

This will confuse readers and misreport problem size. I suggest explicitly using 96 timesteps per day and 7·24·4 timesteps per week:
Proposed notebook diff

```diff
-print(f'Loaded FlowSystem: {len(timesteps)} timesteps ({len(timesteps) / 24:.0f} days at hourly resolution)')
+print(
+    f'Loaded FlowSystem: {len(timesteps)} timesteps '
+    f'({len(timesteps) / 96:.0f} days at 15-min resolution)'
+)

-# Visualize first week of data
-heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
-electricity_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']
-
-fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)
-
-fig.add_trace(go.Scatter(x=timesteps[:168], y=heat_demand.values[:168], name='Heat Demand'), row=1, col=1)
-fig.add_trace(
-    go.Scatter(x=timesteps[:168], y=electricity_price.values[:168], name='Electricity Price'), row=2, col=1
-)
+# Visualize first week of data (7 days × 24 h × 4 steps per hour)
+week_steps = 7 * 24 * 4
+heat_demand = flow_system.components['HeatDemand'].inputs[0].fixed_relative_profile
+electricity_price = flow_system.components['GridBuy'].outputs[0].effects_per_flow_hour['costs']
+
+fig = make_subplots(rows=2, cols=1, shared_xaxes=True, vertical_spacing=0.1)
+
+fig.add_trace(
+    go.Scatter(x=timesteps[:week_steps], y=heat_demand.values[:week_steps], name='Heat Demand'),
+    row=1,
+    col=1,
+)
+fig.add_trace(
+    go.Scatter(x=timesteps[:week_steps], y=electricity_price.values[:week_steps], name='Electricity Price'),
+    row=2,
+    col=1,
+)
```

Also applies to: 62-69, 79-91
🤖 Prompt for AI Agents
In @docs/notebooks/08a-aggregation.ipynb around lines 50 - 52, The notebook
currently treats the data as hourly; update all day/week calculations to
15‑minute resolution by using 96 timesteps per day and 672 timesteps per week
(7*24*4). Specifically, replace any "/ 24" day-size divisions with "/ 96",
change any "hourly resolution" text to "15‑min resolution", and update any
slices like "data[:168]" or similar first-week indices to use 672 (or a computed
variable timesteps_per_week = 7*24*4); apply these fixes in the cells referenced
(around the days print and the “first week” plot and cells covering lines 62-69
and 79-91).
```python
'default_line_shape': 'hv',
'dim_priority': ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario'),
'slot_priority': ('x', 'color', 'facet_col', 'facet_row', 'animation_frame'),
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Verify usage of new plotting config attributes in codebase
echo "=== Searching for default_line_shape usage ==="
rg -n -C3 'default_line_shape|line_shape' --type=py -g '!test_*' -g '!**/tests/**'
echo -e "\n=== Searching for dim_priority usage ==="
rg -n -C3 'dim_priority' --type=py -g '!test_*' -g '!**/tests/**'
echo -e "\n=== Searching for slot_priority usage ==="
rg -n -C3 'slot_priority' --type=py -g '!test_*' -g '!**/tests/**'
echo -e "\n=== Verify dimensions in dim_priority tuple exist ==="
echo "Checking for dimension names: time, duration, duration_pct, variable, cluster, period, scenario"
rg -n "dims?=.*['\"](?:time|duration|duration_pct|variable|cluster|period|scenario)['\"]" --type=py | head -20
```

Repository: flixOpt/flixopt
Length of output: 23039
🏁 Script executed:
```bash
#!/bin/bash
# Search for actual dimension usage in datasets
echo "=== Searching for dimension 'duration' ==="
rg -n "duration" --type=py | grep -i "dim\|dimension" | head -10
echo -e "\n=== Searching for dimension 'duration_pct' ==="
rg -n "duration_pct" --type=py | head -10
echo -e "\n=== Searching for dimension 'cluster' ==="
rg -n "cluster" --type=py | head -10
echo -e "\n=== Searching for dimension 'period' ==="
rg -n "period" --type=py | head -10
echo -e "\n=== Checking slot assignment logic in dataset_plot_accessor.py ==="
sed -n '19,90p' flixopt/dataset_plot_accessor.py
echo -e "\n=== Verifying line_shape value 'hv' in Plotly context ==="
rg -n "line_shape.*hv" --type=py
```

Repository: flixOpt/flixopt
Length of output: 7030
Review the `dim_priority` tuple: it includes dimensions that don't appear to be used as actual dataset dimensions.

The defaults mostly work correctly:

- `default_line_shape: 'hv'` is a valid Plotly line shape value and is properly consumed
- `slot_priority: ('x', 'color', 'facet_col', 'facet_row', 'animation_frame')` correctly matches the slot assignment logic

However, the `dim_priority` tuple includes `'duration'` and `'cluster'`, which don't appear to be actual dimensions in datasets:

- `'duration'` appears only as a parameter name (`duration_dim`, `duration_per_step`), never as a dataset dimension
- `'cluster'` is used in clustering operations but doesn't exist as a dimension name

The code handles this gracefully by filtering `dim_priority` to only available dimensions (line 60: `priority_dims = [d for d in CONFIG.Plotting.dim_priority if d in available]`), so this isn't a bug. However, including non-existent defaults is misleading. Consider removing `'duration'` and `'cluster'` from the default `dim_priority`, or verify they should be included for potential future use cases.
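To make the filtering concrete, here is a tiny self-contained sketch of the behavior described above; the values are illustrative, not the actual runtime defaults:

```python
# Minimal sketch of the dim_priority filtering discussed above (hypothetical values).
dim_priority = ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')
available = {'time', 'variable', 'scenario'}  # dims actually present in a dataset

# Entries that never occur as dataset dimensions are silently skipped.
priority_dims = [d for d in dim_priority if d in available]
print(priority_dims)  # ['time', 'variable', 'scenario']
```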
🤖 Prompt for AI Agents
In @flixopt/config.py around lines 166 - 168, The dim_priority default tuple in
CONFIG.Plotting includes 'duration' and 'cluster', which are not actual dataset
dimension names; remove those entries from the default dim_priority (update the
tuple assigned to CONFIG.Plotting.dim_priority) so it only lists real dimension
names, or if they are intended for future use, add a comment documenting why
they remain; also verify the filtering code that builds priority_dims (the list
comprehension for CONFIG.Plotting.dim_priority against available) remains intact
to guard against any other mismatches.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/notebooks/08b-rolling-horizon.ipynb (1)
113-129: Fix the diagram and timestep counts to match the 1h resampled data.

The diagram and comments describe the rolling horizon using 15-min resolution timestep counts, but the code resamples to 1h resolution in line 68. This creates major inconsistencies:

- Line 118 shows "1344 timesteps (14 days)". This is correct for 15-min data, but after resampling to 1h you have ~336 timesteps (14 days × 24 hours).
- Lines 120-123 show segments as "192 (2 days)", but with `horizon=96` (line 144) at 1h resolution, each segment is actually 4 days, not 2 days.

The entire diagram needs to be updated to reflect 1h resolution timestep counts, or the code should be changed to not resample (remove `.transform.resample('1h')` from line 68).

🔎 Proposed fix to align diagram with 1h resolution
"```\n", -"Full horizon: |---------- 1344 timesteps (14 days) ----------|\n", +"Full horizon: |---------- 336 timesteps (14 days) ----------|\n", " \n", -"Segment 1: |==== 192 (2 days) ====|-- overlap --|\n", -"Segment 2: |==== 192 (2 days) ====|-- overlap --|\n", -"Segment 3: |==== 192 (2 days) ====|-- overlap --|\n", +"Segment 1: |==== 96 (4 days) ====|-- overlap --|\n", +"Segment 2: |==== 96 (4 days) ====|-- overlap --|\n", +"Segment 3: |==== 96 (4 days) ====|-- overlap --|\n", "... \n", "```\n",
🤖 Fix all issues with AI Agents
In @docs/notebooks/08b-rolling-horizon.ipynb:
- Line 144: Update the inline comment next to the horizon parameter used in the
notebook cell where "horizon=96" is set: change the text to reflect that the
data is at 1‑hour resolution and that horizon=96 corresponds to 4 days (96
timesteps at 1h resolution) — e.g., replace the current "2-day segments (192
timesteps at 15-min resolution)" comment with "4-day segments (96 timesteps at
1h resolution)" next to the horizon=96 setting.
- Around line 54-57: Update the narrative in the "Create the FlowSystem" cell to
state that although the source data is two weeks at 15‑min resolution, the
dataset is immediately resampled to 1‑hour resolution before building the
FlowSystem; locate the cell that performs the resampling (look for a DataFrame
.resample("1H") or .resample('H') call in the same cell or the cell where the
FlowSystem is constructed) and add a short clarifying sentence referencing that
resample step so readers know the notebook uses hourly data after resampling.
🧹 Nitpick comments (1)
docs/notebooks/08b-rolling-horizon.ipynb (1)
329-346: Consider aligning API reference examples with notebook values for consistency.

The API reference shows example values (`horizon=192`, `overlap=48`) at 15-min resolution, while the notebook uses `horizon=96`, `overlap=24` at 1h resolution. While API references typically show generic examples, aligning them with the actual notebook values would reduce potential confusion, especially given the resolution-related issues elsewhere in the notebook.

🔎 Proposed alignment with notebook values
"```python\n", "segments = flow_system.optimize.rolling_horizon(\n", " solver, # Solver instance\n", -" horizon=192, # Timesteps per segment (e.g., 2 days at 15-min resolution)\n", -" overlap=48, # Additional lookahead timesteps (e.g., 12 hours)\n", +" horizon=96, # Timesteps per segment (e.g., 4 days at 1h resolution)\n", +" overlap=24, # Additional lookahead timesteps (e.g., 1 day)\n", " nr_of_previous_values=1, # Flow history for uptime/downtime tracking\n", ")\n",
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/notebooks/08b-rolling-horizon.ipynb
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: Build documentation
Actionable comments posted: 8
🤖 Fix all issues with AI Agents
In @.github/workflows/docs.yaml:
- Around line 124-130: The notebook-cache-key step assigns HASH using unquoted
command substitutions which allows word splitting and glob expansion; update the
step (id: notebook-cache-key) to use quoted command substitutions and safe
handling of filenames (e.g., build lists with null-separated output and xargs -0
or quote "$( ... )") and ensure any variable expansions like $HASH are quoted;
change the HASH assignment and the tar/find pipeline to use a safe approach
(null-separated find/xargs or properly quoted subshells) to prevent
word-splitting/globbing vulnerabilities.
- Around line 132-137: The cache step named "Cache executed notebooks" (step id
notebook-cache) in the deploy job uses the path pattern docs/notebooks/*.ipynb
which misses notebooks in subdirectories; change the path to a recursive glob
(e.g., docs/notebooks/**/*.ipynb) to match files in nested folders, mirroring
the same fix applied to the build job so the cache correctly includes all
notebook files.
- Around line 68-73: The cache step "Cache executed notebooks" (id:
notebook-cache) uses a non-recursive path pattern docs/notebooks/*.ipynb, so
notebooks in subdirectories aren’t cached; update the with.path value to a
recursive glob such as docs/notebooks/**/*.ipynb (or '**/*.ipynb' if you want to
cache all repo notebooks) while keeping the same step name/id and key
(notebooks-${{ steps.notebook-cache-key.outputs.hash }}).
- Around line 60-66: The step "Get notebook cache key" (id notebook-cache-key)
uses unquoted command substitution that breaks on filenames with spaces; replace
the $(find ...) usage with a null-delimited pipeline and have tar read
NUL-separated paths (e.g., combine the two find commands with -print0 and pipe
into tar using --null -T -), then pipe to sha256sum and set HASH as before so
the HASH variable assignment remains unchanged.
In @docs/notebooks/data/generate_example_systems.py:
- Around line 56-59: Module-level calls to load_weather() and
load_electricity_prices() (_weather and _elec_prices) can raise exceptions
during import and cause obscure failures; wrap these loads in explicit error
handling (try/except) around the calls to load_weather and
load_electricity_prices, catch exceptions, and either log a clear descriptive
error (including exception details) and re-raise a more informative exception or
fallback to a safe default, and ensure _elec_prices.index.tz_localize(None) is
only run when _elec_prices was successfully loaded; alternatively move the loads
into an initializer function (e.g., init_shared_data) that callers invoke and
which handles errors consistently.
In @flixopt/clustering/base.py:
- Around line 88-99: The __repr__ is converting cluster_occurrences with
int(self.cluster_occurrences.sel(cluster=c).values) which breaks when
cluster_occurrences has extra dimensions; change the extraction to yield a
scalar before converting to int (e.g., use .item() or .squeeze() on the result
of self.cluster_occurrences.sel(cluster=c) or select a specific period/scenario)
so that int(...) receives a Python scalar; update the occ comprehension in
ClusterStructure.__repr__ to call .item() (or otherwise reduce dimensions) on
the selection to avoid array-to-int conversion errors.
In @flixopt/comparison.py:
- Line 380: The current assignment `title = result.figure.layout.title.text or
title` can raise AttributeError if `result.figure.layout.title` is None; update
it to safely access `.text` by checking `result.figure.layout.title` (or using
`getattr`) before reading `.text` and fall back to the existing `title` when
either the title object or its `.text` is missing/None; modify the code around
`result.figure.layout.title` to use a conditional or
`getattr(result.figure.layout.title, "text", None)` so it never dereferences a
None.
- Around line 183-187: When `reference` is an int, validate it against the
available cases before calling self.solution.isel: check that 0 <= reference <
len(self._names) (or the equivalent length of cases) and if not raise a clear
IndexError (or ValueError) with a descriptive message; only then set ref_idx =
reference and call self.solution.isel(case=ref_idx). This ensures negative or
out-of-range integers are caught and reported instead of letting
self.solution.isel raise an unhelpful IndexError.
🧹 Nitpick comments (8)
.github/workflows/docs.yaml (1)
60-144: Consider extracting duplicated logic to a composite action.

The notebook caching and execution steps (lines 60-82 and 124-144) are duplicated between the build and deploy jobs. While duplication in GitHub Actions workflows is common for job isolation, consider extracting this to a composite action or reusable workflow to improve maintainability.
This refactor can be deferred, as the duplication ensures each job remains self-contained and doesn't rely on shared state.
flixopt/comparison.py (2)
18-18: `__all__` is incomplete.

The module exports only `Comparison`, but `ComparisonStatistics` and `ComparisonStatisticsPlot` are also public classes accessed via `Comparison.statistics` and `Comparison.statistics.plot`. While users typically access them through the `Comparison` object, they should be documented in `__all__` for completeness and IDE discoverability.

🔎 Proposed fix

```diff
-__all__ = ['Comparison']
+__all__ = ['Comparison', 'ComparisonStatistics', 'ComparisonStatisticsPlot']
```
377-382: Silent exception swallowing may hide legitimate errors.

The broad `except (KeyError, ValueError): continue` silently skips FlowSystems when errors occur. This could mask legitimate issues (e.g., mismatched dimensions, missing data) that users should be aware of.

Consider logging a warning when a system is skipped:

🔎 Proposed fix

```diff
 try:
     result = getattr(fs.statistics.plot, method_name)(*args, **kwargs)
     datasets.append(result.data.expand_dims(case=[case_name]))
     title = result.figure.layout.title.text or title
 except (KeyError, ValueError):
+    import logging
+    logging.getLogger('flixopt').debug(
+        f"Skipping case '{case_name}' for plot method '{method_name}': data not available"
+    )
     continue
```

flixopt/clustering/base.py (2)
512-513: Consider moving `warnings` import to module level.

The `warnings` module is imported inside conditional blocks. While this works, moving it to module level is more conventional and slightly more efficient.

Also applies to: 523-524
1136-1147: Consider using `np.bincount` or `Counter` for efficiency.

The occurrence counting loop can be simplified:

🔎 Proposed fix

```diff
 # Count occurrences of each cluster
-unique_clusters = np.unique(cluster_order)
-occurrences = {}
-for c in unique_clusters:
-    occurrences[int(c)] = sum(1 for x in cluster_order if x == c)
-
-n_clusters = len(unique_clusters)
+n_clusters = int(np.max(cluster_order)) + 1
+occurrences = np.bincount(cluster_order, minlength=n_clusters)
 cluster_occurrences_da = xr.DataArray(
-    [occurrences.get(c, 0) for c in range(n_clusters)],
+    occurrences,
     dims=['cluster'],
     name='cluster_occurrences',
 )
```

docs/notebooks/08d-clustering-multiperiod.ipynb (1)
295-298: Consider using `strict=True` for consistency.

The `unique` and `counts` arrays from `np.unique(..., return_counts=True)` are guaranteed to have the same length, so `strict=True` would be more appropriate and consistent with other `zip()` usages in this PR.

🔎 Proposed fix

```diff
-"for cluster_id, count in zip(unique, counts, strict=False):\n",
+"for cluster_id, count in zip(unique, counts, strict=True):\n",
```
804-822: Consider using a constant for the clustering prefix.The clustering array prefix
'clustering|'and its length11are used in multiple places (lines 684, 809, 813). While the comment on line 811-812 explains the magic number, using a module-level constant would improve maintainability and prevent errors if the prefix ever changes.🔎 Proposed refactor
At the module level (after imports):
_CLUSTERING_PREFIX = 'clustering|'Then replace usages:
if self.clustering is not None: clustering_ref, clustering_arrays = self.clustering._create_reference_structure() # Add clustering arrays with prefix for name, arr in clustering_arrays.items(): - ds[f'clustering|{name}'] = arr + ds[f'{_CLUSTERING_PREFIX}{name}'] = arr ds.attrs['clustering'] = json.dumps(clustering_ref)And in from_dataset:
if 'clustering' in reference_structure: clustering_structure = json.loads(reference_structure['clustering']) # Collect clustering arrays (prefixed with 'clustering|') clustering_arrays = {} for name, arr in ds.data_vars.items(): - if name.startswith('clustering|'): - # Remove 'clustering|' prefix (11 chars) from both key and DataArray name + if name.startswith(_CLUSTERING_PREFIX): + # Remove prefix from both key and DataArray name # This ensures that if the FlowSystem is serialized again, the arrays # won't get double-prefixed (clustering|clustering|...) - arr_name = name[11:] + arr_name = name[len(_CLUSTERING_PREFIX):] clustering_arrays[arr_name] = arr.rename(arr_name)
1936-2040: LGTM: New dimension properties support clustering feature well.

The new `dims`, `indexes`, `temporal_dims`, and `temporal_weight` properties provide a clean API for working with clustered FlowSystems. The deprecation of `coords` → `indexes` includes a clear migration path.

Minor: Line 1997's fallback logic in `temporal_weight` appears defensive since `weights['cluster']` should always exist when needed (lines 2127-2128). Consider simplifying:

```python
@property
def temporal_weight(self) -> xr.DataArray:
    """Combined temporal weight (timestep_duration × cluster_weight)."""
    return self.weights['time'] * self.weights.get('cluster', 1.0)
```

This leverages the `weights` property's existing fallback to `_unit_weight('cluster')`.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .github/workflows/docs.yaml
- docs/notebooks/08a-aggregation.ipynb
- docs/notebooks/08b-rolling-horizon.ipynb
- docs/notebooks/08d-clustering-multiperiod.ipynb
- docs/notebooks/data/generate_example_systems.py
- docs/user-guide/optimization/clustering.md
- docs/user-guide/results/index.md
- flixopt/clustering/base.py
- flixopt/clustering/intercluster_helpers.py
- flixopt/comparison.py
- flixopt/flow_system.py
- tests/test_clustering/test_base.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/test_clustering/test_base.py
- flixopt/clustering/intercluster_helpers.py
🧰 Additional context used
🧬 Code graph analysis (2)
docs/notebooks/data/generate_example_systems.py (1)
docs/notebooks/data/generate_realistic_profiles.py (8)
`ElectricityLoadGenerator` (128-170), `GasPriceGenerator` (234-262), `ThermalLoadGenerator` (68-125), `load_electricity_prices` (53-62), `load_weather` (37-50), `generate` (87-125), `generate` (143-170), `generate` (237-262)
flixopt/comparison.py (2)
flixopt/config.py (2)
`CONFIG` (184-912), `Plotting` (550-592)

flixopt/plot_result.py (2)

`PlotResult` (20-143), `update_traces` (75-88)
🪛 LanguageTool
docs/user-guide/results/index.md
[style] ~333-~333: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 4730 characters long)
Context: ...Works! cluster_boundary is ignored ``` !!! note "Component Differences" System...
(EN_EXCESSIVE_EXCLAMATION)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.14)
🔇 Additional comments (19)
.github/workflows/docs.yaml (3)
75-82: Well-implemented parallel notebook execution.

The notebook execution logic correctly:
- Uses null-delimited output (
-print0/-0) for safe handling of filenames with special characters- Changes to the notebooks directory so relative imports work correctly
- Runs notebooks in parallel with
-P 4for improved CI performance- Uses
exit 255to ensure failures propagate properly throughxargs
139-144: Well-implemented parallel notebook execution (deploy job).This duplicate implementation also correctly handles notebook execution with proper safety measures.
84-87: Correct approach to disable notebook execution during mkdocs build.Setting
MKDOCS_JUPYTER_EXECUTE="false"is the right approach since notebooks are pre-executed in the previous steps. This prevents redundant execution and improves build performance.Also applies to: 151-154
flixopt/comparison.py (1)
26-71: LGTM!The
Comparisonclass design is well-structured with comprehensive validation, clear docstrings with examples, and appropriate error messages guiding users to align dimensions.flixopt/clustering/base.py (1)
364-468: LGTM!The
expand_datamethod correctly handles both simple and multi-dimensional cases with proper comments explaining the use ofcluster_structure.timesteps_per_clusterinstead ofaggregated.sizes['time'].docs/user-guide/results/index.md (2)
280-339: LGTM!The new "Comparing Multiple Systems" section provides comprehensive documentation with clear examples, requirements, and practical use cases. The explanation of core dimension matching and the use of
.transform.sel()for alignment is particularly helpful.
362-365: No changes needed. Themode='area'parameter is supported through**kwargsand is correctly used in the example. Testing confirms that mode parameters including 'area' work as expected with the balance() method.docs/user-guide/optimization/clustering.md (2)
49-59: LGTM! Past review concerns addressed.The pipe characters in the table are now properly escaped (
\|) and the code block at line 158 now has a language specifier (text).
140-148: LGTM!The storage modes table clearly explains the different clustering modes and their use cases, which is essential for users working with seasonal storage optimization.
docs/notebooks/08d-clustering-multiperiod.ipynb (1)
1-40: LGTM!The notebook provides a comprehensive tutorial on multi-period clustering with clear learning objectives, practical examples, and well-documented code cells.
docs/notebooks/data/generate_example_systems.py (2)
73-76: LGTM! Past review concern addressed.The code now correctly uses
.reindex(timesteps, method='ffill')for robust timestamp alignment, addressing the previous review concerns about potentialKeyErrorwith.loc[].
498-622: LGTM!The
create_seasonal_storage_system()function is well-documented with clear comments explaining the seasonal storage use case, solar thermal integration, and realistic data sources.docs/notebooks/08a-aggregation.ipynb (3)
62-69: LGTM! Resolution is now consistent.The notebook now uses
create_district_heating_system()which generates hourly data (line 280 of generate_example_systems.py:freq='h'), so the "hourly resolution" description and 168-timestep week slicing are correct.
288-291: LGTM!Good use of the new
fx.Comparisonclass to demonstrate side-by-side visualization of different optimization approaches.
113-114: Comment mentions 15-min resolution but data is hourly.The comment says "Resample from 15-min to 4h resolution" but
create_district_heating_system()generates hourly data. Consider updating to match actual resolution.🔎 Proposed fix
- "# Resample from 15-min to 4h resolution\n", + "# Resample from hourly to 4h resolution\n",Likely an incorrect or invalid review comment.
flixopt/flow_system.py (4)
527-534: LGTM: Clean dimension reduction logic.The cleanup when a dimension is dropped (removing scalar coordinates, data variables, and attributes) ensures the dataset stays consistent after selection operations. The parallel logic for scenario dimension (lines 577-584) follows the same pattern.
1333-1392: LGTM: Well-handled deprecation of normalize_weights parameter.The deprecation path is clean:
- Parameter still accepted to avoid breaking existing code
- Clear warnings with removal version specified
- Consistent with the new behavior where scenario weights are always normalized in the setter (lines 2087-2102)
- Documentation updated to reflect the change
2062-2103: LGTM: Robust normalization with excellent error messages.The scenario_weights setter now always normalizes to sum to 1, with comprehensive error handling:
- Clear validation that scenarios must be defined
- Detailed error messages for zero-sum cases, including coordinate information for multi-dimensional weights
- Consistent with the deprecation of the
normalize_weightsparameterThe detailed error message at lines 2093-2100 is particularly helpful for debugging multi-dimensional weight issues.
2114-2154: LGTM: Clean weight management and temporal summation.

The
`weights` property provides a unified interface for accessing dimension weights, with sensible defaults (unit weights of 1.0) when explicit weights aren't set. The `sum_temporal` method correctly applies full temporal weighting (timestep_duration × cluster_weight) before summing, which is essential for converting rates to totals in clustered systems.
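To illustrate the rate-to-total conversion mentioned here, a minimal xarray sketch; the array shapes and weight values are assumptions for the example, not flixopt internals:

```python
import numpy as np
import xarray as xr

# Hypothetical clustered rate variable: 3 clusters x 4 timesteps, in MW.
rate = xr.DataArray(np.ones((3, 4)), dims=('cluster', 'time'))
timestep_duration = xr.DataArray(np.full(4, 0.25), dims='time')   # 15-min steps, in hours
cluster_weight = xr.DataArray([10.0, 20.0, 6.0], dims='cluster')  # occurrences per cluster

# Rate -> total: weight each timestep by its duration and by how often its cluster occurs.
total = (rate * timestep_duration * cluster_weight).sum(dim=('cluster', 'time'))
print(float(total))  # 36.0 MWh for this toy example
```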
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @flixopt/comparison.py:
- Around line 183-187: When reference is an int, validate it against self._names
before using it: check that isinstance(reference, int) then ensure 0 <=
reference < len(self._names) (or normalize/explicitly disallow negative indices)
and if out of range raise a clear IndexError (e.g. "reference index X out of
range [0, N)"), then assign ref_idx and call self.solution.isel(case=ref_idx) as
before so the error is explicit and informative.
- Around line 350-367: The code in _combine_data reads
result.figure.layout.title.text directly which can raise AttributeError if title
is None; update the title assignment to safely handle a missing title (e.g.,
check whether result.figure.layout.title is truthy or use getattr) and only set
title to its .text when present, falling back to the current title otherwise.
🧹 Nitpick comments (1)
flixopt/comparison.py (1)
361-362: Silent exception swallowing may hide real errors.

Catching `KeyError` and `ValueError` and silently continuing could mask legitimate issues. Consider logging a debug message when a system is skipped so users can diagnose unexpected empty results.

🔎 Proposed improvement

```diff
 except (KeyError, ValueError):
+    logger.debug(f"Skipping case '{case_name}' for method '{method_name}': data not available")
     continue
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
flixopt/comparison.py
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/comparison.py (1)
flixopt/plot_result.py (2)
`PlotResult` (20-143), `update_traces` (75-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
🔇 Additional comments (9)
flixopt/comparison.py (9)
1-24: LGTM!

Module setup is clean with proper type-checking guards and well-organized imports. The `__all__` export of only `Comparison` is appropriate since `ComparisonStatistics` and `ComparisonStatisticsPlot` are accessed via properties.
73-98: LGTM!

Constructor validation is comprehensive: checks minimum systems count, names-to-systems length match, name uniqueness, solution existence, and dimension compatibility. Lazy caching pattern for solution and statistics is appropriate.
104-145: LGTM!

Dimension validation logic is sound. The separation of core dimensions (`time`, `period`, `scenario`) from auxiliary dimensions like `cluster_boundary` is a sensible design. Warnings for coordinate misalignment help users understand potential NaN introduction during xarray alignment.
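A self-contained sketch of that core-dimension check (a hypothetical helper, not the actual flixopt implementation):

```python
import xarray as xr

CORE_DIMS = {'time', 'period', 'scenario'}

def check_core_dims(solutions: list[xr.Dataset]) -> None:
    """Raise if the cases disagree on which core dimensions they carry."""
    reference = CORE_DIMS & set(solutions[0].dims)
    for i, ds in enumerate(solutions[1:], start=1):
        dims = CORE_DIMS & set(ds.dims)
        if dims != reference:
            raise ValueError(f'Case {i} has core dims {dims}, expected {reference}')
```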
190-337: LGTM!

`ComparisonStatistics` cleanly mirrors `StatisticsAccessor` properties with consistent lazy caching. The `_concat_property` and `_merge_dict_property` helpers provide clean abstractions for combining data across cases.
369-377: LGTM!

The `_finalize` helper correctly handles the show logic with config fallback and gracefully returns an empty `go.Figure()` when `fig` is `None`.
379-406: LGTM!

The `balance` method correctly delegates data combination and plotting, with proper handling of empty datasets.
439-466: LGTM!

The `flows` method correctly passes all relevant parameters to `_combine_data` and uses `fxplot.line` for visualization.
582-617: LGTM!

The `effects` method handles the dynamic `x_col` and `color_col` determination correctly based on the `by` parameter and number of data variables.
619-642: LGTM!

The `heatmap` method correctly extracts the first data variable for plotting with `da.fxplot.heatmap`.
```python
else:
    ref_idx = reference

ref_data = self.solution.isel(case=ref_idx)
return self.solution - ref_data
```
Missing bounds check for integer reference index.
When `reference` is an integer, there's no validation that it's within bounds of `self._names`. A negative index or out-of-range index would cause an unclear `IndexError` at line 186.
🔎 Proposed fix
```diff
 else:
-    ref_idx = reference
+    ref_idx = reference
+    if ref_idx < 0 or ref_idx >= len(self._names):
+        raise ValueError(f"Reference index {reference} out of range. Valid: 0-{len(self._names)-1}")
 ref_data = self.solution.isel(case=ref_idx)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:

```python
else:
    ref_idx = reference
ref_data = self.solution.isel(case=ref_idx)
return self.solution - ref_data
```

After:

```python
else:
    ref_idx = reference
    if ref_idx < 0 or ref_idx >= len(self._names):
        raise ValueError(f"Reference index {reference} out of range. Valid: 0-{len(self._names)-1}")
ref_data = self.solution.isel(case=ref_idx)
return self.solution - ref_data
```
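For reference, a bounds check that also supports Python-style negative indexing can be written as a small standalone helper (a generic sketch, not the committed code):

```python
def resolve_case_index(reference: int, n_cases: int) -> int:
    """Map a possibly negative case index to 0..n_cases-1, rejecting out-of-range values."""
    if not -n_cases <= reference < n_cases:
        raise ValueError(f'Reference index {reference} out of range for {n_cases} cases')
    return reference % n_cases  # -1 -> n_cases - 1

assert resolve_case_index(-1, 3) == 2
```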
🤖 Prompt for AI Agents
In @flixopt/comparison.py around lines 183 - 187, When reference is an int,
validate it against self._names before using it: check that
isinstance(reference, int) then ensure 0 <= reference < len(self._names) (or
normalize/explicitly disallow negative indices) and if out of range raise a
clear IndexError (e.g. "reference index X out of range [0, N)"), then assign
ref_idx and call self.solution.isel(case=ref_idx) as before so the error is
explicit and informative.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/notebooks/08b-rolling-horizon.ipynb (1)
114-120: Fix the arithmetic in the rolling horizon explanation.

The comment block contains inconsistent calculations:
- Line 114: "1344 timesteps (14 days)" — At 1h resolution, 1344 timesteps = 56 days, not 14 days
- Lines 116-118: "192 (2 days)" for segments — At 1h resolution, 2 days = 48 timesteps, not 192. (192 timesteps = 8 days)
Given that line 68 resamples to 1h and the data is described as "two weeks" (14 days), the expected timestep count should be 336 (14 days × 24 hours).
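The arithmetic is easy to sanity-check in a few lines (plain Python, values taken from this comment):

```python
days = 14
steps_per_day_1h = 24
total_steps = days * steps_per_day_1h            # 336 timesteps after resampling to 1h
horizon = 96                                     # timesteps per segment
print(total_steps, horizon // steps_per_day_1h)  # 336, 4 -> each segment covers 4 days
```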
🔎 Proposed fix
Assuming the original data truly is two weeks and is resampled to 1h:
- "Full horizon: |---------- 1344 timesteps (14 days) ----------|\n", + "Full horizon: |---------- 336 timesteps (14 days) ----------|\n", " \n", - "Segment 1: |==== 192 (2 days) ====|-- overlap --|\n", - "Segment 2: |==== 192 (2 days) ====|-- overlap --|\n", - "Segment 3: |==== 192 (2 days) ====|-- overlap --|\n", + "Segment 1: |==== 96 (4 days) ====|-- overlap --|\n", + "Segment 2: |==== 96 (4 days) ====|-- overlap --|\n", + "Segment 3: |==== 96 (4 days) ====|-- overlap --|\n",Note: This assumes
horizon=96(line 140) represents 4 days at 1h resolution.
🧹 Nitpick comments (2)
flixopt/dataset_plot_accessor.py (1)
756-756: Consider moving `numpy` import to module level.
`numpy` is imported inside the `to_duration_curve` method but is commonly used throughout the codebase. Moving it to the module-level imports would be more consistent with Python conventions.

🔎 Proposed fix
Add at top of file with other imports:
`import numpy as np`

Then remove the local import at line 756.
flixopt/statistics_accessor.py (1)
2103-2131: Complex secondary axis logic - verify with edge cases.The secondary y-axis creation logic (lines 2107-2131) uses
`+100` offset for axis numbering to avoid collisions. While this approach works, it's somewhat fragile if Plotly ever uses high axis numbers. The logic correctly matches primary to secondary axes.
+100offset strategy for maintainability.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- docs/notebooks/08b-rolling-horizon.ipynb
- flixopt/comparison.py
- flixopt/dataset_plot_accessor.py
- flixopt/statistics_accessor.py
🚧 Files skipped from review as they are similar to previous changes (1)
- flixopt/comparison.py
🧰 Additional context used
🧬 Code graph analysis (2)
flixopt/dataset_plot_accessor.py (2)
flixopt/color_processing.py (1)
process_colors(165-233)flixopt/config.py (2)
CONFIG(184-912)Plotting(550-592)
flixopt/statistics_accessor.py (4)
flixopt/results.py (6)
sizes(682-734)colors(2220-2221)colors(2224-2228)variables(335-339)variables(1260-1268)size(2022-2030)flixopt/dataset_plot_accessor.py (8)
stacked_bar(269-336)stacked_bar(842-870)heatmap(473-569)heatmap(936-1017)line(338-405)line(872-902)bar(204-267)bar(812-840)flixopt/plot_result.py (2)
show(55-58)PlotResult(20-143)flixopt/clustering/base.py (1)
heatmap(703-831)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: test (3.12)
- GitHub Check: Build documentation
🔇 Additional comments (19)
docs/notebooks/08b-rolling-horizon.ipynb (2)
66-73: LGTM!The FlowSystem loading, resampling to 1h, and timestep calculation are correct. The division by 24 is appropriate for hourly resolution data.
93-93: LGTM!The solver configuration and
fx.ComparisonAPI usage are appropriate and align with the PR's new comparison module feature.Also applies to: 196-197
flixopt/dataset_plot_accessor.py (6)
17-125: LGTM!The
assign_slotsfunction has clean logic with a well-structured two-pass approach: first resolving explicit dimensions, then auto-assigning remaining slots. The priority-based configuration delegation toCONFIG.Plottingis appropriate and the warnings provide helpful guidance when dimensions can't be assigned.
128-153: LGTM!The helper function correctly builds Plotly Express kwargs while respecting user overrides and smartly handling
facet_col_wrapbased on dimension size.
156-166: LGTM!The conversion function correctly handles edge cases (empty datasets, scalar variables) and properly uses
ds.dimsfor identifying index columns during melting.
269-336: Minor inconsistency:barmodeinbarvsstacked_bar.In
bar,barmode='group'is set infig_kwargs(line 262), but instacked_bar, it's applied viafig.update_layout(barmode='relative', ...)(line 334). While both work, this is a stylistic inconsistency. The current approach is fine sinceupdate_layoutensures the barmode takes precedence even if the user passes a conflicting value inpx_kwargs.
473-569: LGTM!The heatmap implementation correctly handles variable selection, dimension resolution, and the squeeze logic to prepare data for
px.imshow. Thebinary_string=Falsedefault is a good choice for handling non-numeric coordinates.
784-1017: LGTM!The DataArrayPlotAccessor correctly delegates to the Dataset implementation for bar/stacked_bar/line/area methods while having its own heatmap implementation suited for DataArray. The
exclude_dimsparameter is now properly forwarded in all delegating methods (addressing the past review comments).flixopt/statistics_accessor.py (11)
129-170: LGTM!The
_prepare_for_heatmapfunction has clear logic with well-defined helper functions. The handling of clustered data (using natural(time, cluster)shape) vs non-clustered data (with optional reshape) is appropriate. The boolean arithmetic for counting available slots is a clever and valid approach.
398-411: LGTM!The use of
self._fs.timestep_durationaligns with the PR's renaming of timestep primitives. The flow_hours calculation logic remains correct.
1372-1379: LGTM!The migration to
ds.fxplot.stacked_bar()is clean. The facet and animation parameters are correctly forwarded, and color handling is preserved through thecolorsparameter.
1552-1567: LGTM!The heatmap method refactoring is clean. The use of
_prepare_for_heatmapcentralizes the reshape/transpose logic, and delegation toda.fxplot.heatmap()simplifies the implementation. The return data structure is consistent with other methods.
1922-1934: LGTM!The color override logic (
plotly_kwargs.pop('color', color_col)) correctly allows users to override the default color grouping while maintaining backward compatibility. The migration tods.fxplot.bar()is clean.
1696-1709: LGTM!The sizes method correctly uses
ds.fxplot.bar()withx='variable'andcolor='variable'to create a bar chart where each investment size is a distinct bar. The facet/animation parameters are properly forwarded.
1808-1815: LGTM!The duration curve method correctly migrates to
result_ds.fxplot.line()while preserving the existing sorting and coordinate assignment logic.
1981-1988: LGTM!The charge_states method is cleanly migrated to
ds.fxplot.line().
1645-1652: LGTM!The flows method correctly migrates to
ds.fxplot.line()while preserving the existing flow filtering logic.
1493-1500: LGTM!The carrier_balance method is correctly migrated to
ds.fxplot.stacked_bar()with consistent parameter forwarding.
712-717: The code is safe. TheFlowSystem.weightsproperty (line 2115 in flow_system.py) is guaranteed to always return adict[str, xr.DataArray], never None. It initializes with{'time': ...}and conditionally adds other dimension keys. The.get('cluster', 1.0)call cannot raiseAttributeError.Likely an incorrect or invalid review comment.
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In @docs/notebooks/data/generate_example_systems.py:
- Around line 66-70: The FileNotFoundError handler after calling load_weather()
in generate_example_systems.py references the wrong filename; update the error
message to mention the actual file load_weather() expects (tmy_dresden.csv) and
keep DATA_DIR and the original error included (e.g., "Ensure tmy_dresden.csv
exists in {DATA_DIR}. Original error: {e}") so the exception message matches the
real source and aids debugging.
In @flixopt/clustering/base.py:
- Around line 1142-1150: The code builds cluster_occurrences_da by assuming
cluster IDs are contiguous 0..n_clusters-1 which breaks when cluster_order
contains gaps; change the logic to iterate over the actual unique cluster IDs
rather than range(n_clusters): compute unique_clusters =
np.unique(cluster_order) (or ensure it is sorted), build occurrences for each
unique ID (using occurrences.get or Counter), then construct
cluster_occurrences_da from [occurrences[c] for c in unique_clusters] (and set
the cluster coordinate labels to unique_clusters) so cluster_occurrences_da
aligns with the true cluster IDs.
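A compact runnable version of what this prompt asks for, keeping the actual unique IDs as coordinates (illustrative only):

```python
import numpy as np
import xarray as xr

cluster_order = np.array([0, 1, 3, 3, 1, 0, 0])  # note the gap: no cluster 2
unique_clusters, counts = np.unique(cluster_order, return_counts=True)

cluster_occurrences_da = xr.DataArray(
    counts,
    dims=['cluster'],
    coords={'cluster': unique_clusters},  # labels follow the true IDs [0, 1, 3]
    name='cluster_occurrences',
)
print(dict(zip(unique_clusters.tolist(), counts.tolist(), strict=True)))  # {0: 3, 1: 2, 3: 2}
```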
- Around line 895-902: The extra-dimension validation currently only inspects
resolved_variables[0], but aggregated_data and the reshaping loop in clusters()
iterate all variables; update the check to iterate over all variable keys (e.g.,
for var in resolved_variables or for name in aggregated_data) and collect any
variables whose .dims contain dimensions other than 'time' (track names and
their extra_dims), then raise a ValueError listing each offending variable and
its extra dimensions with guidance (e.g., "Use select={'<dim>': <value>} for
variable '<name>'") so reshaping won't fail later.
🧹 Nitpick comments (5)
flixopt/comparison.py (3)
19-19: Consider exporting `ComparisonStatistics` and `ComparisonStatisticsPlot` in `__all__`.

Based on the AI summary, both `ComparisonStatistics` and `ComparisonStatisticsPlot` are part of the public API surface. While they're primarily accessed via `Comparison.statistics` and `Comparison.statistics.plot`, users may need to type-hint them. Consider adding them to `__all__` for completeness.

🔎 Proposed enhancement

```diff
-__all__ = ['Comparison']
+__all__ = ['Comparison', 'ComparisonStatistics', 'ComparisonStatisticsPlot']
```
79-79: Consider validating that names are non-empty.

The current validation ensures names are unique and match the count of FlowSystems, but it doesn't prevent empty strings. While `fs.name or f'System {i}'` provides a fallback when `fs.name` is falsy, if a user explicitly passes `names=['', 'other']`, the empty string would be accepted, potentially causing confusion in plots and data access.

🔎 Proposed enhancement

```diff
 if len(set(self._names)) != len(self._names):
     raise ValueError(f'Case names must be unique, got: {self._names}')
+if any(not name.strip() for name in self._names):
+    raise ValueError('Case names must be non-empty')
```
382-382: Consider moving the plotly import to module level.

The `plotly.graph_objects` import is currently inside `_finalize`, which is called by every plotting method. Moving it to the module level would be more idiomatic and avoid repeated import overhead, though the performance impact is likely negligible due to Python's import caching.

🔎 Proposed refactor

At module level (after line 9):

```diff
 import xarray as xr
+import plotly.graph_objects as go

 from .config import CONFIG
```

Then in `_finalize`:

```diff
 def _finalize(self, ds: xr.Dataset, fig, show: bool | None) -> PlotResult:
     """Handle show and return PlotResult."""
-    import plotly.graph_objects as go
-
     if show is None:
```

docs/notebooks/data/generate_example_systems.py (1)
541-545: Inconsistent data access pattern; consider using `.reindex()` for robustness.

Lines 544-545 use direct `.values` access while all other generator functions use `.reindex(timesteps, method='ffill').values`. Although this works because both have 8760 elements, it relies on implicit positional alignment rather than explicit index matching, which is less robust if the weather data format ever changes. Additionally, the comment on line 543 is slightly confusing: it says "Map to 2020 weather data" but the `timesteps` are created for 2019.

🔎 Suggested alignment with other functions

```diff
 # Full year, hourly (use non-leap year to match TMY data which has 8760 hours)
 timesteps = pd.date_range('2019-01-01', periods=8760, freq='h')

-# Map to 2020 weather data (TMY has 8760 hours, no Feb 29)
-temp = _get_weather()['temperature_C'].values
-ghi = _get_weather()['ghi_W_m2'].values
+# TMY data has 8760 hours matching our non-leap year timesteps
+weather = _get_weather()
+# Reset weather index to match our 2019 timesteps for consistent alignment
+weather_aligned = weather.set_index(timesteps)
+temp = weather_aligned['temperature_C'].values
+ghi = weather_aligned['ghi_W_m2'].values
```

Or keep it simple with a clarifying comment:

```diff
 # Full year, hourly (use non-leap year to match TMY data which has 8760 hours)
 timesteps = pd.date_range('2019-01-01', periods=8760, freq='h')
-# Map to 2020 weather data (TMY has 8760 hours, no Feb 29)
+# TMY data has exactly 8760 hours - use positional alignment
 temp = _get_weather()['temperature_C'].values
 ghi = _get_weather()['ghi_W_m2'].values
```

flixopt/clustering/base.py (1)
data.size == n_clusters * timesteps_per_cluster. If this doesn't hold (due to backend inconsistency or data corruption), numpy will raise a generic reshape error. An explicit check would provide a clearer, domain-specific error message.🔎 Optional: Add explicit size validation
data_vars = {} for var in resolved_variables: data = aggregated_data[var].values + expected_size = n_clusters * timesteps_per_cluster + if data.size != expected_size: + raise ValueError( + f'Variable {var!r} has {data.size} timesteps, expected {expected_size} ' + f'(= {n_clusters} clusters × {timesteps_per_cluster} timesteps/cluster)' + ) data_by_cluster = data.reshape(n_clusters, timesteps_per_cluster)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/docs.yamldocs/notebooks/08d-clustering-multiperiod.ipynbdocs/notebooks/data/generate_example_systems.pyflixopt/clustering/base.pyflixopt/comparison.py
🧰 Additional context used
🧬 Code graph analysis (2)
docs/notebooks/data/generate_example_systems.py (1)
docs/notebooks/data/generate_realistic_profiles.py (8)
ElectricityLoadGenerator(128-170)GasPriceGenerator(234-262)ThermalLoadGenerator(68-125)load_electricity_prices(53-62)load_weather(37-50)generate(87-125)generate(143-170)generate(237-262)
flixopt/comparison.py (1)
flixopt/plot_result.py (2)
PlotResult(20-143)update_traces(75-88)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: Build documentation
🔇 Additional comments (18)
docs/notebooks/08d-clustering-multiperiod.ipynb (3)
30-38: LGTM! Clean imports and setup.The imports are well-organized and appropriate for the tutorial. The
fx.CONFIG.notebook()call properly configures the environment for Jupyter notebook usage.
295-298: Good defensive programming withstrict=True.The change from
strict=Falsetostrict=Truein thezip()call is a solid improvement. Sincenp.unique(..., return_counts=True)guarantees thatuniqueandcountshave the same length, usingstrict=Trueadds a safety check without risk. This catches any unexpected edge cases and aligns with Python 3.10+ best practices.
500-582: Excellent educational structure and documentation.The "Key Considerations" and "Summary" sections provide valuable context for users. The comparison table showing computational benefits at different scales (lines 520-525) and the API reference (lines 555-581) are particularly helpful for understanding when and how to apply multi-period clustering.
flixopt/comparison.py (4)
186-188: LGTM - Bounds check correctly handles negative indices.The bounds validation properly handles both positive and negative indices using the standard Python range check
(-n_cases <= ref_idx < n_cases), which allows negative indexing while preventing out-of-range errors. This addresses the past review comment.
194-340: LGTM - Clean lazy-loading pattern with consistent implementation.The
ComparisonStatisticsclass follows a well-structured lazy-loading pattern with dedicated helpers (_concat_property,_merge_dict_property) that ensure consistency across all properties. The use ofstrict=Truein zips andjoin='outer'for alignment is appropriate for comparison scenarios.
365-367: LGTM - AttributeError risk properly mitigated.The code now safely accesses
result.figure.layout.titleby checking forNonebefore attempting to access.textviagetattr. This addresses the past review comment and preventsAttributeErrorwhen the title isNone.
390-651: LGTM - Plotting methods consistently delegate to fxplot infrastructure.All plotting methods follow a consistent pattern:
- Call
_combine_datato aggregate per-system data- Check for empty datasets and return early if needed
- Delegate to
fxplotfor visualization with appropriate parameters- Call
_finalizeto handle display and returnPlotResultThis design properly separates concerns and leverages the existing plotting infrastructure.
docs/notebooks/data/generate_example_systems.py (6)
88-140: LGTM!The function correctly uses
.reindex(timesteps, method='ffill')for robust timestamp alignment, addressing previous review feedback. The BDEW profile integration and system configuration look appropriate.
143-288: LGTM!The complex system implementation properly uses
.reindex()for weather and electricity price data alignment. The multi-carrier system with piecewise CHP efficiency and investment parameters is well-structured.
291-413: LGTM!The district heating system uses
.reindex()for robust data alignment and provides a comprehensive MW-scale configuration with investment optimization. The docstring clearly documents usage by specific notebooks.
416-521: LGTM!The 15-minute resolution data is correctly upsampled using
.reindex(timesteps, method='ffill'), addressing the previous review concern about timestamp resolution mismatch. The operational system configuration with startup costs and fixed capacities is appropriate for rolling horizon optimization.
552-648: LGTM!The seasonal storage configuration with solar thermal, gas backup, and large pit storage (50,000 MWh) is well-designed for demonstrating inter-cluster storage value. The
initial_charge_state='equals_final'ensures yearly cyclic operation.
651-765: LGTM!The multiperiod system correctly uses
.reindex()for data alignment and provides a comprehensive multi-period, multi-scenario configuration. The main function properly integrates all six system generators..github/workflows/docs.yaml (4)
60-66: Word-splitting vulnerability resolved.The null-delimited pipeline correctly addresses the previous critical shell safety issue. Using
-print0,sort -z, andxargs -0prevents word splitting and glob expansion on filenames with spaces or special characters.
68-73: Cache path now matches subdirectories.The recursive glob pattern
docs/notebooks/**/*.ipynbcorrectly resolves the previous issue. Notebooks in subdirectories will now be cached and restored properly.
75-82: Parallel notebook execution is well-implemented.The implementation correctly:
- Uses null-delimited find/xargs for filename safety
- Runs 4 notebooks in parallel (
-P 4) for efficiency- Changes to
docs/notebooksfor proper relative imports- Uses the
sh -c '$1'pattern to safely pass filenames- Propagates failures with
exit 255to stop xargs on error
84-87: Notebook execution disabled during build/deploy is correct.Setting
MKDOCS_JUPYTER_EXECUTE: "false"appropriately prevents duplicate execution since notebooks are pre-executed in the previous step. The deploy job correctly replicates the notebook caching and execution steps from the build job, which is necessary since these jobs can run independently.Also applies to: 124-153
flixopt/clustering/base.py (1)
1-1171: Nice implementation of the clustering API!The clustering module provides a comprehensive and well-structured API for time-series aggregation. The handling of multi-dimensional data with period/scenario dimensions is thorough, and the plotting capabilities via
ClusteringPlotAccessoradd excellent visualization support. The dataclass-based design is clean and the separation of concerns betweenClusterStructure,ClusterResult, andClusteringis well thought out.
```python
unique_clusters = np.unique(cluster_order)
occurrences = {}
for c in unique_clusters:
    occurrences[int(c)] = sum(1 for x in cluster_order if x == c)

n_clusters = len(unique_clusters)
cluster_occurrences_da = xr.DataArray(
    [occurrences.get(c, 0) for c in range(n_clusters)],
    dims=['cluster'],
```
Fix cluster occurrence mapping to handle non-contiguous cluster IDs.
If `cluster_order` contains non-contiguous cluster IDs (e.g., `[0, 1, 3]` instead of `[0, 1, 2]`), the current logic will produce incorrect results:

- `unique_clusters = [0, 1, 3]`
- `n_clusters = len(unique_clusters) = 3`
- `occurrences = {0: count_0, 1: count_1, 3: count_3}`
- Loop over `range(3) = [0, 1, 2]` attempts to access `occurrences[2]`, which doesn't exist, returning `0`
- Result: `cluster_occurrences_da = [count_0, count_1, 0]` instead of `[count_0, count_1, count_3]`
🔎 Proposed fix to iterate over actual cluster IDs
# Count occurrences of each cluster
unique_clusters = np.unique(cluster_order)
+ n_clusters = len(unique_clusters)
+
+ # Ensure cluster IDs are contiguous 0..n-1
+ if not np.array_equal(unique_clusters, np.arange(n_clusters)):
+ raise ValueError(
+ f'Cluster IDs must be contiguous starting from 0. Found: {unique_clusters.tolist()}'
+ )
+
occurrences = {}
for c in unique_clusters:
occurrences[int(c)] = sum(1 for x in cluster_order if x == c)
- n_clusters = len(unique_clusters)
cluster_occurrences_da = xr.DataArray(
[occurrences.get(c, 0) for c in range(n_clusters)],
dims=['cluster'],
name='cluster_occurrences',
)Or, to support non-contiguous IDs:
# Count occurrences of each cluster
unique_clusters = np.unique(cluster_order)
- occurrences = {}
- for c in unique_clusters:
- occurrences[int(c)] = sum(1 for x in cluster_order if x == c)
-
n_clusters = len(unique_clusters)
+
+ # Use actual cluster IDs as coordinates
+ occurrences_list = [sum(1 for x in cluster_order if x == c) for c in unique_clusters]
cluster_occurrences_da = xr.DataArray(
- [occurrences.get(c, 0) for c in range(n_clusters)],
- dims=['cluster'],
+ occurrences_list,
+ dims=['cluster'],
+ coords={'cluster': unique_clusters},
name='cluster_occurrences',
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| unique_clusters = np.unique(cluster_order) | |
| occurrences = {} | |
| for c in unique_clusters: | |
| occurrences[int(c)] = sum(1 for x in cluster_order if x == c) | |
| n_clusters = len(unique_clusters) | |
| cluster_occurrences_da = xr.DataArray( | |
| [occurrences.get(c, 0) for c in range(n_clusters)], | |
| dims=['cluster'], | |
| unique_clusters = np.unique(cluster_order) | |
| n_clusters = len(unique_clusters) | |
| # Ensure cluster IDs are contiguous 0..n-1 | |
| if not np.array_equal(unique_clusters, np.arange(n_clusters)): | |
| raise ValueError( | |
| f'Cluster IDs must be contiguous starting from 0. Found: {unique_clusters.tolist()}' | |
| ) | |
| occurrences = {} | |
| for c in unique_clusters: | |
| occurrences[int(c)] = sum(1 for x in cluster_order if x == c) | |
| cluster_occurrences_da = xr.DataArray( | |
| [occurrences.get(c, 0) for c in range(n_clusters)], | |
| dims=['cluster'], | |
| name='cluster_occurrences', | |
| ) |
| unique_clusters = np.unique(cluster_order) | |
| occurrences = {} | |
| for c in unique_clusters: | |
| occurrences[int(c)] = sum(1 for x in cluster_order if x == c) | |
| n_clusters = len(unique_clusters) | |
| cluster_occurrences_da = xr.DataArray( | |
| [occurrences.get(c, 0) for c in range(n_clusters)], | |
| dims=['cluster'], | |
| unique_clusters = np.unique(cluster_order) | |
| n_clusters = len(unique_clusters) | |
| # Use actual cluster IDs as coordinates | |
| occurrences_list = [sum(1 for x in cluster_order if x == c) for c in unique_clusters] | |
| cluster_occurrences_da = xr.DataArray( | |
| occurrences_list, | |
| dims=['cluster'], | |
| coords={'cluster': unique_clusters}, | |
| name='cluster_occurrences', | |
| ) |
🤖 Prompt for AI Agents
In @flixopt/clustering/base.py around lines 1142 - 1150, The code builds
cluster_occurrences_da by assuming cluster IDs are contiguous 0..n_clusters-1
which breaks when cluster_order contains gaps; change the logic to iterate over
the actual unique cluster IDs rather than range(n_clusters): compute
unique_clusters = np.unique(cluster_order) (or ensure it is sorted), build
occurrences for each unique ID (using occurrences.get or Counter), then
construct cluster_occurrences_da from [occurrences[c] for c in unique_clusters]
(and set the cluster coordinate labels to unique_clusters) so
cluster_occurrences_da aligns with the true cluster IDs.
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @flixopt/dataset_plot_accessor.py:
- Around line 757-780: The loop over data variables in the duration-curve logic
can crash when a variable lacks a 'time' dimension because da.dims.index('time')
raises ValueError; update the loop in the duration curve method (where sorted_ds
is created and each var is processed and assigned back) to skip non-time
variables (e.g., check if 'time' in da.dims and continue for others) so scalars
or differently-dimensioned vars are left untouched, and ensure you only
replace/assign sorted values for variables that contain the 'time' dimension to
preserve other data_vars and coordinates.
🧹 Nitpick comments (3)
docs/notebooks/08b-rolling-horizon.ipynb (1)
54-56: Consider clarifying the resampling in the description.

The description mentions "two weeks at 15-min resolution," but Line 68 immediately resamples the data to 1h resolution. While technically accurate about the source data, readers might be confused about what resolution the notebook actually uses.

🔎 Optional clarification
🔎 Optional clarification
- "We use an operational district heating system with real-world data (two weeks at 15-min resolution):" + "We use an operational district heating system with real-world data (two weeks, resampled to 1h resolution):"flixopt/dataset_plot_accessor.py (2)
811-869: DataArray methods missing `color` parameter present in Dataset methods.

The Dataset `bar` and `stacked_bar` methods have a `color` parameter (lines 209, 274) for dimension-based color grouping, but the DataArray versions don't expose this parameter. Users would need to convert to Dataset manually to access this feature.

🔎 Proposed fix for `bar` method (apply similar pattern to others)
def bar( self, *, x: str | Literal['auto'] | None = 'auto', + color: str | Literal['auto'] | None = 'auto', colors: ColorType | None = None, title: str = '', xlabel: str = '', ylabel: str = '', facet_col: str | Literal['auto'] | None = 'auto', facet_row: str | Literal['auto'] | None = 'auto', animation_frame: str | Literal['auto'] | None = 'auto', facet_cols: int | None = None, exclude_dims: set[str] | None = None, **px_kwargs: Any, ) -> go.Figure: """Create a grouped bar chart. See DatasetPlotAccessor.bar for details.""" return self._to_dataset().fxplot.bar( x=x, + color=color, colors=colors, ... )
935-1016: Consider extracting shared heatmap logic to reduce duplication. The DataArray `heatmap` implementation (lines 958-1015) closely mirrors the Dataset version (lines 515-570). While this works correctly, extracting the shared logic into a private helper function would reduce maintenance burden. This is a minor code organization improvement that could be addressed in a future cleanup.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- docs/notebooks/08b-rolling-horizon.ipynb
- flixopt/dataset_plot_accessor.py
- flixopt/statistics_accessor.py
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/statistics_accessor.py (3)
flixopt/flow_system.py (5)
dims (1936-1956), weights (2115-2135), solution (1449-1466), solution (1469-1472), to_dataset (630-690)
flixopt/dataset_plot_accessor.py (8)
stacked_bar (270-337), stacked_bar (841-869), heatmap (474-570), heatmap (935-1016), line (339-406), line (871-901), bar (205-268), bar (811-839)
flixopt/config.py (2)
CONFIG (184-912), Plotting (550-592)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.14)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: Build documentation
🔇 Additional comments (27)
docs/notebooks/08b-rolling-horizon.ipynb (6)
66-73: LGTM! The FlowSystem creation and data resampling logic is correct. The calculation accurately reflects 1h resolution (dividing by 24), and the description properly states "at 1h resolution."
93-102: LGTM! The full optimization setup is correct with appropriate solver parameters for a tutorial notebook.
114-126: LGTM! The rolling horizon explanation and ASCII diagram accurately represent the segmentation strategy with correct calculations (336 timesteps = 14 days, 96 timesteps = 4 days at 1h resolution).
140-141: LGTM! The rolling horizon parameters and inline comments are correct. The horizon=96 accurately represents 4-day segments at 1h resolution, and overlap=24 represents 1-day lookahead.
183-198: LGTM! The visualization section is clean and correctly uses the new `Comparison` API to compare both optimization methods side-by-side. The markdown is properly formatted.
217-367: LGTM! The remaining sections (storage continuity visualization, segment inspection, API reference, and summary) are well-structured and correct. The code follows best practices and the documentation is clear.
flixopt/dataset_plot_accessor.py (12)
1-16: LGTM! Imports are well-organized with clear separation between standard library, third-party, and local imports. The use of `from __future__ import annotations` enables modern type annotation syntax.
18-126: LGTM! The slot assignment logic is well-structured with clear two-pass resolution (a toy sketch follows the list below):
- First pass resolves explicit dimension assignments and marks them as used
- Second pass fills 'auto' slots from priority-ordered available dimensions
Good defensive programming with warnings for invalid dimensions and unassigned dimensions.
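A toy sketch of the two-pass idea (a hypothetical helper with assumed names, not flixopt's actual `assign_slots`):

```python
# Hypothetical illustration of two-pass slot resolution; not the real implementation.
def assign_slots(available_dims: list[str], requests: dict[str, str | None]) -> dict[str, str | None]:
    # Pass 1: explicit assignments claim their dimensions
    used = {v for v in requests.values() if v not in (None, 'auto')}
    # Pass 2: 'auto' slots take the first unclaimed dimension, in priority order
    for slot, want in requests.items():
        if want == 'auto':
            pick = next((d for d in available_dims if d not in used), None)
            requests[slot] = pick
            if pick is not None:
                used.add(pick)
    return requests

print(assign_slots(['period', 'scenario'], {'facet_col': 'auto', 'animation_frame': 'period'}))
# -> {'facet_col': 'scenario', 'animation_frame': 'period'}
```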
129-154: LGTM! The function correctly builds Plotly Express kwargs and handles `facet_col_wrap` appropriately based on dimension size.
157-167: LGTM! The conversion handles edge cases well:
- Empty datasets return empty DataFrame
- Fully scalar datasets are handled specially to avoid melt issues
- Multi-dimensional data is correctly melted with dimension columns as id_vars
205-268: LGTM! The `bar` method correctly:
- Uses `assign_slots` for dimension resolution
- Handles empty DataFrames gracefully
- Builds color mapping with fallback to default colorscale
- Merges user-provided `px_kwargs` with generated kwargs
270-337: LGTM! The `stacked_bar` method correctly implements relative stacking with `barmode='relative'` and removes gaps with `bargap=0, bargroupgap=0` for proper visual stacking.
339-472: LGTM! The `line` and `area` methods follow the established pattern consistently and properly handle the `line_shape` parameter with CONFIG fallback.
474-570: LGTM! The `heatmap` method correctly:
- Enforces single variable selection for multi-variable datasets
- Excludes heatmap axes (first 2 dims) from auto-assignment to facet/animation
- Squeezes unused singleton dimensions to satisfy `px.imshow` requirements
- Uses `binary_string=False` to handle string coordinate labels
572-641: LGTM! The `scatter` method correctly:
- Validates that both x and y variables exist in the dataset
- Uses wide-format DataFrame (not melted)
- Properly guards against using 'variable' slot since it doesn't exist in wide format
643-723: LGTM! The `pie` method handles both scalar and multi-dimensional cases appropriately and correctly documents that `px.pie` doesn't support `animation_frame`.
806-809: LGTM! Good fallback handling for unnamed DataArrays when converting to Dataset.
871-933: LGTM! The `line` and `area` methods correctly delegate all parameters including `line_shape` and `exclude_dims` to the Dataset accessor.
flixopt/statistics_accessor.py (9)
129-170: LGTM! Well-structured heatmap preparation logic. The function correctly handles multiple cases:
- Clustered data using natural (time, cluster) dimensions
- Explicit reshape with validation
- Auto-reshape with intelligent slot availability checking to prevent data loss
- Proper fallback for edge cases
The nested helpers and clear branching make the logic easy to follow.
399-399: Correct usage of renamed `timestep_duration` property. This change aligns with the PR's renaming of per-timestep primitives. The multiplication of flow_rates by timestep_duration correctly produces flow_hours (energy).
712-717: Proper cluster weighting for temporal effect aggregation. The logic correctly applies cluster weights when aggregating temporal effects to total effects. The use of `.get('cluster', 1.0)` provides a safe fallback for non-clustered data, and the temporal dimension identification properly excludes period and scenario dimensions.
1305-1307: Well-implemented auto-dimension support for balance plotting. The signature changes and corresponding docstring updates correctly enable auto-dimension resolution for facets and animation. The parameters are properly typed and documented.
1397-1399: Consistent auto-dimension support for carrier_balance. The changes mirror the balance method and are correctly implemented.
1509-1567: Excellent refactoring of heatmap method. The refactoring delivers several improvements:
- Cleaner separation of concerns by delegating to `_prepare_for_heatmap` and `fxplot.heatmap`
- Improved clustered data handling with natural (time, cluster) dimensions
- Auto-reshape logic that intelligently prevents data loss
- More maintainable code with reduced complexity
The docstring clearly explains the new behavior for clustered data.
1696-1709: Good defensive check for empty dataset. The addition of the empty dataset check (line 1696) prevents errors when no data variables remain after filtering. The call to `fxplot.bar` is correctly structured.
1869-1943: Well-executed refactoring of effects plotting. The refactoring improves code quality through:
- Cleaner dictionary-based aspect selection (lines 1869-1873)
- Default groupby component behavior that makes sense for most use cases (lines 1886-1887)
- Clearer dimension aggregation logic organized by the 'by' parameter (lines 1892-1916)
- User override capability for color assignment (line 1923)
- Simplified plotting delegation to fxplot.bar
The complex aggregation logic is now more maintainable and easier to understand.
2071-2132: Impressive refactoring of storage dual-axis plotting. The refactoring successfully migrates to fxplot while maintaining complex dual-axis functionality:
Key improvements:
- Clean separation of flow data from charge state (lines 2071-2074)
- Delegation to fxplot for both bar and line charts
- Well-documented +100 offset strategy for secondary axes (line 2107) that prevents conflicts with plotly's auto-generated axis numbers
- Correct trace-to-axis mapping with proper handling of facets (lines 2124-2125)
- Proper legend configuration to avoid duplicate entries (lines 2130-2131)
The complexity is well-managed and the result is maintainable.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/notebooks/08b-rolling-horizon.ipynb (1)
328-333: Consider aligning API Reference example with notebook's resolution. The API Reference uses 15-min resolution examples (`horizon=192` for 2 days, `overlap=48` for 12 hours), while the notebook uses 1h resolution throughout. For consistency, consider updating the API Reference to match:
```diff
- horizon=192,  # Timesteps per segment (e.g., 2 days at 15-min resolution)
- overlap=48,   # Additional lookahead timesteps (e.g., 12 hours)
+ horizon=96,   # Timesteps per segment (e.g., 4 days at 1h resolution)
+ overlap=24,   # Additional lookahead timesteps (e.g., 1 day)
```
Alternatively, keep both examples but add a note that the values depend on the data's temporal resolution.
flixopt/clustering/base.py (1)
895-904: Validate dimensions for all variables, not just the first. The dimension validation checks only `resolved_variables[0]` (line 896), but the reshape loop at lines 917-930 iterates over all variables. If variables have inconsistent dimensions (e.g., first variable has only `time` dimension but second has `time, period`), the check will pass but the reshape could fail on subsequent variables with a confusing error.
🔎 Proposed fix to validate all variables
```diff
-    # Check dimensions of aggregated data
-    sample_var = aggregated_data[resolved_variables[0]]
-    has_cluster_dim = 'cluster' in sample_var.dims
-    extra_dims = [d for d in sample_var.dims if d not in ('time', 'cluster')]
-    if extra_dims:
+    # Check dimensions of aggregated data - verify all variables have consistent dims
+    sample_var = aggregated_data[resolved_variables[0]]
+    has_cluster_dim = 'cluster' in sample_var.dims
+    sample_dims = set(sample_var.dims)
+
+    # Verify all variables have the same dimensions
+    for var in resolved_variables[1:]:
+        var_dims = set(aggregated_data[var].dims)
+        if var_dims != sample_dims:
+            raise ValueError(
+                f'All variables must have the same dimensions. '
+                f'{resolved_variables[0]!r} has {sorted(sample_dims)}, '
+                f'but {var!r} has {sorted(var_dims)}.'
+            )
+
+    # Check for unsupported extra dimensions
+    extra_dims = [d for d in sample_dims if d not in ('time', 'cluster')]
+    if extra_dims:
         raise ValueError(
             f'clusters() requires data with only time (or cluster, time) dimensions. '
             f'Found extra dimensions: {extra_dims}. '
             f'Use select={{{extra_dims[0]!r}: <value>}} to select a specific {extra_dims[0]}.'
         )
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/notebooks/08b-rolling-horizon.ipynb
- flixopt/clustering/base.py
🧰 Additional context used
🧬 Code graph analysis (1)
flixopt/clustering/base.py (3)
flixopt/plot_result.py (1)
PlotResult (20-143)
flixopt/flow_system.py (2)
_create_reference_structure (590-628), to_dataset (630-690)
flixopt/structure.py (3)
_create_reference_structure (492-541), _create_reference_structure (1168-1183), to_dataset (912-935)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: test (3.14)
- GitHub Check: test (3.11)
- GitHub Check: Build documentation
🔇 Additional comments (12)
docs/notebooks/08b-rolling-horizon.ipynb (6)
66-73: LGTM! The FlowSystem creation logic is correct. The source data is loaded at 15-min resolution and properly resampled to 1h. The timestep calculation (`len(timesteps) / 24`) correctly computes days for hourly resolution data.
93-101: LGTM! The full optimization baseline is well-structured: appropriate solver parameters, proper use of `.copy()` to preserve the original FlowSystem, and clear timing/cost reporting.
135-145: LGTM! The rolling horizon configuration is well-documented with accurate comments. The 4-day horizon (96 timesteps) and 1-day overlap (24 timesteps) are correctly calculated for 1h resolution data.
163-176: LGTM! Good defensive coding with the division-by-zero safeguard on line 165. When `cost_full == 0`, returning `0.0` for the percentage gap is a reasonable fallback that prevents crashes while remaining semantically meaningful.
196-197: LGTM! The `fx.Comparison` class is used appropriately to compare FlowSystem instances side-by-side, providing a clean visualization of cost contributions across methods.
217-232: LGTM! The storage state visualization correctly handles the charge_state array dimensions. The `[:-1]` slicing appropriately aligns the n+1 charge states with the n timesteps for consistent plotting.
flixopt/clustering/base.py (6)
1-32: LGTM! Clear module documentation and appropriate imports. The module docstring effectively explains the clustering terminology and scope, and the imports are well-organized with proper TYPE_CHECKING guards for type hints.
34-236: LGTM! Well-structured ClusterStructure class with proper multi-dimensional support. The class properly handles both simple and multi-dimensional (period/scenario) cases, with appropriate safeguards in `__repr__` and helper methods for slicing.
238-536: LGTM! Comprehensive ClusterResult implementation with robust validation. The `expand_data` method correctly handles both simple and multi-dimensional cases, and the `validate` method provides thorough consistency checks with appropriate warnings.
917-930: Reshape logic properly handles both flat and cluster-structured data. The implementation correctly branches on `has_cluster_dim` to handle both `(time,)` and `(cluster, time)` layouts. Note that this relies on the dimension validation at lines 895-904, which currently only checks the first variable.
960-1117: LGTM! Well-designed Clustering container with convenient accessors. The class provides a clean API with proper delegation to nested objects, appropriate error handling, and helpful docstrings. The convenience properties make the clustering data easily accessible.
1169-1178: LGTM! Simple and effective registration function. The function cleanly registers the clustering classes in the global registry to support serialization while avoiding circular imports.
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Force-pushed the branch from f5724cc to a3453ef (Compare)
Actionable comments posted: 6
🤖 Fix all issues with AI Agents
In @docs/notebooks/03-investment-optimization.ipynb:
- Around line 233-240: The ratio calculation can divide by zero when solar_size
is 0; update the DataFrame construction to compute Ratio safely (e.g., compute
ratio = tank_size / solar_size if solar_size != 0 else float('inf') or np.nan or
None) and use that variable in the pd.DataFrame instead of directly using
tank_size / solar_size so the notebook won't raise ZeroDivisionError when
solar_size is zero.
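A standalone sketch of the guarded ratio (the notebook's `tank_size`/`solar_size` names are assumptions taken from the prompt above):

```python
import math

solar_size, tank_size = 0.0, 12.0  # illustrative values
ratio = tank_size / solar_size if solar_size != 0 else math.nan
print(ratio)  # nan instead of ZeroDivisionError
```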
In @flixopt/components.py:
- Around line 1256-1265: The code is still applying a linopy LP writer
workaround by adding "+ 0.0" and related comments when computing capacity
bounds; remove the workaround and comments: in the branch handling
isinstance(self.element.capacity_in_flow_hours, InvestParameters) replace the
returned values that currently use "-cap_max + 0.0, cap_max + 0.0" with
"-cap_max, cap_max" and in the else branch replace "-cap + 0.0, cap + 0.0" with
"-cap, cap", and delete the adjacent comments about the linopy bug; keep all
other logic (use of relative_upper_bound, cap_max computation, and
InvestParameters handling) unchanged.
In @flixopt/dataset_plot_accessor.py:
- Around line 742-780: The to_duration_curve method iterates all data_vars and
assumes each has a 'time' dimension, causing a crash for variables that lack
'time'; update to skip any DataArray without 'time' (e.g., in to_duration_curve,
check if 'time' in da.dims and continue for others) so only variables with a
time axis are sorted/replaced, and ensure subsequent size lookup n_timesteps and
assign_coords/rename operate on the preserved time-dimension dataset created
from those variables.
In @flixopt/features.py:
- Around line 234-236: The startup count constraint incorrectly sums
self.startup without applying cluster weights; update the constraint that
creates 'startup_count' (using startup_temporal_dims and the existing
add_constraints call) to multiply or weight startups by cluster_weight before
summing (e.g., use the model's sum_temporal helper or sum over dims on
self.startup * self.cluster_weight) so clustered periods contribute
startup_count scaled by cluster_weight consistent with active_hours and
ShareAllocationModel behavior.
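A pure-numpy illustration of the weighted sum the prompt asks for (not the linopy constraint itself):

```python
import numpy as np

startups = np.array([1, 0, 1, 1])        # startup indicator per clustered timestep
cluster_weight = np.array([3, 3, 2, 2])  # original periods represented by each cluster
startup_count = int((startups * cluster_weight).sum())
print(startup_count)  # 7: each startup counts once per represented period
```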
🧹 Nitpick comments (18)
flixopt/features.py (1)
246-246: Consider using consistent property access pattern. Lines 246 and 259 use `self.timestep_duration`, while Lines 272, 334, and 345 use `self._model.timestep_duration`. While both patterns likely work (Submodel may proxy these properties), consistency would improve code clarity.
Optional: Standardize to `self._model.timestep_duration`
```diff
 ModelingPrimitives.consecutive_duration_tracking(
     self,
     state=self.status,
     short_name='uptime',
     minimum_duration=self.parameters.min_uptime,
     maximum_duration=self.parameters.max_uptime,
-    duration_per_step=self.timestep_duration,
+    duration_per_step=self._model.timestep_duration,
     duration_dim='time',
     previous_duration=self._get_previous_uptime(),
 )
 ModelingPrimitives.consecutive_duration_tracking(
     self,
     state=self.inactive,
     short_name='downtime',
     minimum_duration=self.parameters.min_downtime,
     maximum_duration=self.parameters.max_downtime,
-    duration_per_step=self.timestep_duration,
+    duration_per_step=self._model.timestep_duration,
     duration_dim='time',
     previous_duration=self._get_previous_downtime(),
 )
```
Also applies to: 259-259
docs/notebooks/data/generate_example_systems.py (2)
56-85: Lazy data loading is a solid improvement; tighten type hints and timezone handling.
- The lazy-loading pattern for `_weather`/`_elec_prices` with explicit `FileNotFoundError` messages is a good fix over the previous module-level loading.
- Minor: `_elec_prices` is annotated as `pd.DataFrame | None` but `load_electricity_prices()` returns a `pd.Series`. Consider updating the annotation and return type of `_get_elec_prices()` to `pd.Series` to reflect actual usage.
- The call `_elec_prices.index = _elec_prices.index.tz_localize(None)` assumes a tz-aware index; if the CSV ever ships without timezone info this will raise a `TypeError`. If you expect both, you could guard via `if _elec_prices.index.tz is not None:` before stripping the tz.
291-413: Consider adding explicit carriers to the new district heating example for consistency.
`create_district_heating_system()` defines `Bus('Electricity')`, `Bus('Heat')`, `Bus('Gas')`, `Bus('Coal')` without carriers, whereas the simpler examples explicitly add carriers (with units and colors). This works as long as downstream code doesn't rely on `carrier` metadata, but it makes plotting/statistics less self-describing than the other systems.
Consider adding `fs.add_carriers(...)` with MW-scale units (as in `create_seasonal_storage_system`) and assigning the corresponding `carrier` on the buses to keep all example systems consistent.
flixopt/statistics_accessor.py (2)
166-166: Consider explicit int conversion for clarity. The expression `slots = (facet_col == 'auto') + (animation_frame == 'auto')` relies on implicit boolean-to-int conversion. While this works correctly in Python, explicitly converting to int or adding a comment might improve readability for future maintainers.
🔎 Suggested refactor for clarity
```diff
-    slots = (facet_col == 'auto') + (animation_frame == 'auto')
+    # Count how many facet/animation slots are available for auto-assignment
+    slots = int(facet_col == 'auto') + int(animation_frame == 'auto')
```
2103-2132: Consider adding comments to explain secondary axis strategy. The secondary y-axis creation logic (lines 2108-2120) is complex, using a `+100` offset to avoid conflicts with plotly's auto-generated axis numbers. While functional, this approach would benefit from inline comments explaining:
- Why the +100 offset is necessary
- How the axis numbering maps between primary and secondary axes
- Any assumptions about plotly's axis naming conventions
This would help future maintainers understand and modify this logic if plotly's behavior changes.
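For reference, a minimal plotly sketch of the offset idea (axis names are illustrative; flixopt's real code builds one such axis per facet):

```python
import plotly.graph_objects as go

fig = go.Figure()
fig.add_bar(y=[1, 2, 3], name='flows')                        # primary axis 'y'
fig.add_scatter(y=[10, 20, 15], name='charge', yaxis='y101')  # offset secondary axis
fig.update_layout(
    # 'yaxis101' cannot collide with the 'yaxis2', 'yaxis3', ... that plotly
    # express generates for facets, which is the point of the +100 offset
    yaxis101=dict(overlaying='y', side='right', title=dict(text='Charge state')),
)
```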
tests/test_clustering/test_base.py (3)
15-46: Consider adding test for `get_cluster_weight_per_timestep`. The `TestClusterStructure` class tests basic creation and numpy array handling but doesn't test the `get_cluster_weight_per_timestep` behavior mentioned in the AI summary. Consider adding a test to verify this method's correctness.
Example test structure
```python
def test_get_cluster_weight_per_timestep(self):
    """Test get_cluster_weight_per_timestep calculation."""
    cluster_order = xr.DataArray([0, 1, 0], dims=['original_cluster'])
    cluster_occurrences = xr.DataArray([2, 1], dims=['cluster'])
    structure = ClusterStructure(
        cluster_order=cluster_order,
        cluster_occurrences=cluster_occurrences,
        n_clusters=2,
        timesteps_per_cluster=24,
    )
    weights = structure.get_cluster_weight_per_timestep()
    # Add assertions based on expected behavior
    assert weights is not None
```
107-123: Consider adding edge case tests for cluster structure creation. While the basic test is correct, consider adding tests for edge cases:
- Invalid `timesteps_per_cluster` (zero, negative)
- Non-contiguous mapping values
- Empty or single-element mappings
These would improve robustness and catch potential validation issues.
125-141: Minimal test coverage for Clustering dataclass. The test only verifies basic creation and `backend_name` access. Consider adding tests for other properties mentioned in the AI summary:
- `plot` accessor (`ClusteringPlotAccessor`)
- Aggregated properties: `cluster_order`, `occurrences`, `n_clusters`, etc.
- Mapping properties: `timestep_mapping`, `cluster_start_positions`
These would ensure the complete API surface is tested.
docs/user-guide/recipes/plotting-custom-data.md (1)
7-19: Consider adding a brief comment explaining why `flixopt` is imported. The import `import flixopt as fx` is required to register the `.fxplot` accessor on xarray objects, but this isn't immediately obvious since `fx` isn't used in the snippet. A brief comment would help readers understand why the import is necessary.
🔎 Suggested clarification
```diff
-import flixopt as fx
+import flixopt as fx  # Registers .fxplot accessor on xarray objects
 import xarray as xr
```
docs/notebooks/08d-clustering-multiperiod.ipynb (1)
585-595: Notebook metadata includes kernel specification. Unlike other notebooks in this PR that have empty metadata (`"metadata": {}`), this notebook includes a kernel specification. Consider clearing this for consistency to prevent execution environment issues across different setups.
flixopt/optimization.py (1)
164-184: Consider aligning the `normalize_weights` default with the deprecation pattern. The `__init__` signature still has `normalize_weights: bool = True` (line 169), but `_initialize_optimization_common` now accepts `bool | None = None` (line 85). Since the parameter is deprecated and always forced to `True` internally (line 110), consider changing the default to `None` for consistency, or adding a deprecation warning when the parameter is explicitly passed.
🔎 Proposed fix
```diff
 def __init__(
     self,
     name: str,
     flow_system: FlowSystem,
     folder: pathlib.Path | None = None,
-    normalize_weights: bool = True,
+    normalize_weights: bool | None = None,
 ):
     warnings.warn(
         f'Optimization is deprecated and will be removed in v{DEPRECATION_REMOVAL_VERSION}. '
         'Use FlowSystem.optimize(solver) or FlowSystem.build_model() + FlowSystem.solve(solver) instead. '
         'Access results via FlowSystem.solution.',
         DeprecationWarning,
         stacklevel=2,
     )
+    if normalize_weights is not None:
+        warnings.warn(
+            f'normalize_weights parameter is deprecated and will be removed in {DEPRECATION_REMOVAL_VERSION}. '
+            'Scenario weights are now always normalized when set on FlowSystem.',
+            DeprecationWarning,
+            stacklevel=2,
+        )
     _initialize_optimization_common(
```
docs/notebooks/data/generate_realistic_profiles.py (2)
27-28: Clarify the purpose of `warnings.resetwarnings()`. The comment "Reset to default behavior due to weird dependency behaviour" is vague. Consider adding more context about which dependency causes issues, or move this to the top of the file with an explanation. If this is working around a specific library issue, document it so future maintainers understand.
🔎 Proposed clarification
```diff
-warnings.resetwarnings()  # Reset to default behavior due to weird dependency behaviour
+# Reset warning filters to default. Some dependencies (demandlib, pvlib) modify
+# the global warning state during import. This ensures consistent warning behavior.
+warnings.resetwarnings()
```
165-170: Ensure timezone consistency between BDEW profile and timesteps. The code reindexes the BDEW profile using `method='ffill'`. If `timesteps` has a different timezone awareness state (naive vs. timezone-aware) than the generated profile, this could cause silent alignment issues. Consider explicitly ensuring both are in the same timezone state, similar to the pattern used in `generate_example_systems.py` where timezone is explicitly removed from price data.
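One possible normalization step, as a sketch (the names `profile` and `timesteps` are assumptions, not the module's actual identifiers):

```python
import pandas as pd

def reindex_tz_safe(profile: pd.Series, timesteps: pd.DatetimeIndex) -> pd.Series:
    # Strip the tz from a tz-aware profile when the target index is naive
    if profile.index.tz is not None and timesteps.tz is None:
        profile = profile.tz_localize(None)
    return profile.reindex(timesteps, method='ffill')
```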
222-228: Add error handling for cases where FlowSystems lack optimization data.The
_concat_propertymethod callsgetattr(fs.statistics, prop_name)without catching exceptions. If any FlowSystem hasn't been optimized, accessing its statistics will raise aRuntimeErrorfrom_require_solution(). Consider adopting the error handling pattern already used inComparisonStatisticsPlot._combine_data(lines 361-372) to skip missing systems and issue warnings instead of failing:for fs, name in zip(self._comp._systems, self._comp._names, strict=True): try: ds = getattr(fs.statistics, prop_name) datasets.append(ds.expand_dims(case=[name])) except RuntimeError as e: warnings.warn(f"Skipping case '{name}': {e}", stacklevel=3) continueflixopt/transform_accessor.py (1)
884-935: Cluster/expand pipeline looks solid; `_combine_slices_to_dataarray_2d` params can be simplified. The `cluster()` → `expand_solution()` flow, including use of `ClusterStructure`/`ClusterResult`, looks consistent for both 1D and multi-(period, scenario) cases; I don't see functional issues here. One small clean-up: `_combine_slices_to_dataarray_2d` never uses its `cluster_coords` or `time_coords` arguments, so they just add noise to the signature and call site.
Proposed signature & call-site simplification
```diff
-    def _combine_slices_to_dataarray_2d(
-        slices: dict[tuple, xr.DataArray],
-        original_da: xr.DataArray,
-        cluster_coords: np.ndarray,
-        time_coords: np.ndarray,
-        periods: list,
-        scenarios: list,
-    ) -> xr.DataArray:
+    def _combine_slices_to_dataarray_2d(
+        slices: dict[tuple, xr.DataArray],
+        original_da: xr.DataArray,
+        periods: list,
+        scenarios: list,
+    ) -> xr.DataArray:
@@
-        da = self._combine_slices_to_dataarray_2d(
-            slices=typical_das[name],
-            original_da=original_da,
-            cluster_coords=cluster_coords,
-            time_coords=time_coords,
-            periods=periods,
-            scenarios=scenarios,
-        )
+        da = self._combine_slices_to_dataarray_2d(
+            slices=typical_das[name],
+            original_da=original_da,
+            periods=periods,
+            scenarios=scenarios,
+        )
```
Also applies to: 1160-1205
flixopt/clustering/base.py (1)
1129-1176: `create_cluster_structure_from_mapping` silently assumes contiguous cluster IDs. `create_cluster_structure_from_mapping()` derives `n_clusters` as `max(cluster_order) + 1` and then builds `cluster_occurrences` over `range(n_clusters)`. If the mapping ever uses non-contiguous IDs (e.g. `[0, 1, 3]`), this will introduce an artificial zero-occurrence cluster (ID 2) and hide the real highest ID in the default coordinate range, which is confusing for debugging and plotting.
Suggested handling that respects actual cluster IDs
```diff
-    # Count occurrences of each cluster
-    unique_clusters = np.unique(cluster_order)
-    n_clusters = int(unique_clusters.max()) + 1 if len(unique_clusters) > 0 else 0
-    occurrences = {}
-    for c in unique_clusters:
-        occurrences[int(c)] = sum(1 for x in cluster_order if x == c)
-
-    cluster_occurrences_da = xr.DataArray(
-        [occurrences.get(c, 0) for c in range(n_clusters)],
-        dims=['cluster'],
-        name='cluster_occurrences',
-    )
+    # Count occurrences of each actually used cluster ID
+    unique_clusters = np.unique(cluster_order)
+    occurrences = [np.count_nonzero(cluster_order == c) for c in unique_clusters]
+
+    cluster_occurrences_da = xr.DataArray(
+        occurrences,
+        dims=['cluster'],
+        coords={'cluster': unique_clusters},
+        name='cluster_occurrences',
+    )
```
This keeps the structure valid even if upstream mappings produce non-contiguous cluster labels.
tests/test_cluster_reduce_expand.py (2)
168-174: Consider extracting the magic number 193 to a constant or computed value. The value `193` (192 + 1 extra timestep) appears in multiple assertions throughout the file. If the "extra timestep" behavior changes, multiple tests would need updating.
🔎 Proposed refactor
```diff
+# At top of file or in a fixture
+TIMESTEPS_8_DAYS = 192
+TIMESTEPS_WITH_EXTRA = TIMESTEPS_8_DAYS + 1  # Extra timestep for boundary values

 def test_expand_solution_enables_statistics_accessor(solver_fixture, timesteps_8_days):
     # ...
     flow_rates = fs_expanded.statistics.flow_rates
     assert 'Boiler(Q_th)' in flow_rates
-    assert len(flow_rates['Boiler(Q_th)'].coords['time']) == 193  # 192 + 1 extra timestep
+    assert len(flow_rates['Boiler(Q_th)'].coords['time']) == TIMESTEPS_WITH_EXTRA
```
595-637: Move the `xarray` import to the top of the file. The `import xarray as xr` on line 599 is placed inside the function. Since `xarray` is likely already a dependency of `flixopt`, moving it to the top improves consistency and avoids repeated import overhead if this function is called multiple times.
🔎 Proposed refactor
```diff
 import numpy as np
 import pandas as pd
 import pytest
 from numpy.testing import assert_allclose
+import xarray as xr

 import flixopt as fx

 def create_system_with_periods_and_scenarios(
     timesteps: pd.DatetimeIndex, periods: pd.Index, scenarios: pd.Index
 ) -> fx.FlowSystem:
     """Create a FlowSystem with both periods and scenarios."""
-    import xarray as xr
-
     hours = len(timesteps)
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- docs/notebooks/data/raw/electricity_prices_de_2020.csv is excluded by !**/*.csv
- docs/notebooks/data/raw/tmy_dresden.csv is excluded by !**/*.csv
📒 Files selected for processing (74)
.github/workflows/docs.yaml, .pre-commit-config.yaml, CHANGELOG.md, docs/notebooks/02-heat-system.ipynb, docs/notebooks/03-investment-optimization.ipynb, docs/notebooks/04-operational-constraints.ipynb, docs/notebooks/05-multi-carrier-system.ipynb, docs/notebooks/06a-time-varying-parameters.ipynb, docs/notebooks/07-scenarios-and-periods.ipynb, docs/notebooks/08a-aggregation.ipynb, docs/notebooks/08b-rolling-horizon.ipynb, docs/notebooks/08c-clustering.ipynb, docs/notebooks/08c2-clustering-storage-modes.ipynb, docs/notebooks/08d-clustering-multiperiod.ipynb, docs/notebooks/08e-clustering-internals.ipynb, docs/notebooks/09-plotting-and-data-access.ipynb, docs/notebooks/data/__init__.py, docs/notebooks/data/generate_example_systems.py, docs/notebooks/data/generate_realistic_profiles.py, docs/notebooks/data/raw/README.md, docs/notebooks/data/tutorial_data.py, docs/notebooks/fxplot_accessor_demo.ipynb, docs/user-guide/optimization/clustering.md, docs/user-guide/optimization/index.md, docs/user-guide/recipes/plotting-custom-data.md, docs/user-guide/results-plotting.md, docs/user-guide/results/index.md, flixopt/__init__.py, flixopt/clustering.py, flixopt/clustering/__init__.py, flixopt/clustering/base.py, flixopt/clustering/intercluster_helpers.py, flixopt/comparison.py, flixopt/components.py, flixopt/config.py, flixopt/core.py, flixopt/dataset_plot_accessor.py, flixopt/effects.py, flixopt/elements.py, flixopt/features.py, flixopt/flow_system.py, flixopt/io.py, flixopt/optimization.py, flixopt/optimize_accessor.py, flixopt/plot_result.py, flixopt/results.py, flixopt/statistics_accessor.py, flixopt/structure.py, flixopt/transform_accessor.py, mkdocs.yml, pyproject.toml, tests/deprecated/examples/03_Optimization_modes/example_optimization_modes.py, tests/deprecated/test_bus.py, tests/deprecated/test_effect.py, tests/deprecated/test_flow.py, tests/deprecated/test_flow_system_resample.py, tests/deprecated/test_integration.py, tests/deprecated/test_linear_converter.py, tests/deprecated/test_scenarios.py, tests/deprecated/test_storage.py, tests/test_bus.py, tests/test_cluster_reduce_expand.py, tests/test_clustering/__init__.py, tests/test_clustering/test_base.py, tests/test_clustering/test_integration.py, tests/test_clustering_io.py, tests/test_effect.py, tests/test_flow.py, tests/test_flow_system_resample.py, tests/test_io_conversion.py, tests/test_linear_converter.py, tests/test_scenarios.py, tests/test_sel_isel_single_selection.py, tests/test_storage.py
💤 Files with no reviewable changes (2)
- flixopt/clustering.py
- flixopt/effects.py
✅ Files skipped from review due to trivial changes (2)
- docs/notebooks/fxplot_accessor_demo.ipynb
- tests/test_clustering/__init__.py
🚧 Files skipped from review as they are similar to previous changes (24)
- .pre-commit-config.yaml
- docs/notebooks/08c2-clustering-storage-modes.ipynb
- flixopt/io.py
- tests/deprecated/test_effect.py
- docs/notebooks/02-heat-system.ipynb
- mkdocs.yml
- docs/user-guide/results-plotting.md
- flixopt/plot_result.py
- docs/notebooks/data/tutorial_data.py
- tests/test_flow_system_resample.py
- tests/test_flow.py
- tests/test_scenarios.py
- docs/user-guide/optimization/clustering.md
- tests/test_io_conversion.py
- flixopt/elements.py
- tests/deprecated/test_linear_converter.py
- tests/deprecated/test_flow.py
- flixopt/clustering/__init__.py
- tests/deprecated/test_scenarios.py
- tests/deprecated/test_flow_system_resample.py
- tests/deprecated/test_storage.py
- tests/deprecated/test_bus.py
- tests/test_clustering_io.py
- docs/notebooks/08e-clustering-internals.ipynb
🧰 Additional context used
🧬 Code graph analysis (12)
flixopt/clustering/intercluster_helpers.py (2)
flixopt/interface.py (1)
InvestParameters (956-1279)
flixopt/flow_system.py (3)
FlowSystem (55-2510), coords (2001-2019), dims (1936-1956)
tests/test_linear_converter.py (2)
flixopt/results.py (4)
variables (335-339), variables (1260-1268), constraints (342-346), constraints (1271-1279)
flixopt/structure.py (4)
variables (1717-1723), timestep_duration (220-222), timestep_duration (1749-1750), constraints (1708-1714)
docs/notebooks/data/generate_example_systems.py (1)
docs/notebooks/data/generate_realistic_profiles.py (8)
ElectricityLoadGenerator (128-170), GasPriceGenerator (234-262), ThermalLoadGenerator (68-125), load_electricity_prices (53-62), load_weather (37-50), generate (87-125), generate (143-170), generate (237-262)
flixopt/features.py (6)
flixopt/flow_system.py (6)
temporal_weight (1990-1998), temporal_dims (1980-1987), sum_temporal (2137-2154), dims (1936-1956), isel (2371-2397), weights (2115-2135)
flixopt/structure.py (9)
temporal_weight (255-257), temporal_dims (247-252), sum_temporal (259-265), dims (229-231), timestep_duration (220-222), timestep_duration (1749-1750), weights (239-244), get (1656-1661), get (1821-1823)
flixopt/modeling.py (1)
expression_tracking_variable (200-242)
flixopt/statistics_accessor.py (2)
effects (1094-1185), effects (1827-1943)
flixopt/effects.py (1)
add_share_to_effects (647-667)
flixopt/transform_accessor.py (1)
isel(123-163)
tests/test_effect.py (1)
flixopt/structure.py (2)
timestep_duration (220-222), timestep_duration (1749-1750)
tests/test_storage.py (1)
flixopt/structure.py (2)
timestep_duration (220-222), timestep_duration (1749-1750)
flixopt/optimization.py (1)
flixopt/flow_system.py (2)
FlowSystem (55-2510), create_model (1333-1353)
flixopt/structure.py (1)
flixopt/flow_system.py (13)
FlowSystem (55-2510), solution (1449-1466), solution (1469-1472), coords (2001-2019), indexes (1959-1977), dims (1936-1956), weights (2115-2135), temporal_dims (1980-1987), temporal_weight (1990-1998), sum_temporal (2137-2154), scenario_weights (2053-2060), scenario_weights (2063-2102), _unit_weight (2104-2112)
flixopt/statistics_accessor.py (4)
flixopt/flow_system.py (6)
dims (1936-1956), weights (2115-2135), temporal_dims (1980-1987), solution (1449-1466), solution (1469-1472), to_dataset (630-690)
flixopt/structure.py (9)
dims (229-231), timestep_duration (220-222), timestep_duration (1749-1750), weights (239-244), temporal_dims (247-252), solution (169-217), solution (1152-1166), variables (1717-1723), to_dataset (912-935)
flixopt/dataset_plot_accessor.py (8)
stacked_bar (270-337), stacked_bar (841-869), heatmap (474-570), heatmap (935-1016), line (339-406), line (871-901), bar (205-268), bar (811-839)
flixopt/config.py (2)
CONFIG (184-912), Plotting (550-592)
flixopt/flow_system.py (1)
flixopt/clustering/base.py (7)
Clustering (971-1126), _register_clustering_classes (1179-1188), clusters (838-967), _create_reference_structure (110-133), _create_reference_structure (309-338), _create_reference_structure (999-1012), timesteps_per_cluster (1080-1084)
flixopt/results.py (3)
flixopt/structure.py (4)
timestep_duration (220-222), timestep_duration (1749-1750), solution (169-217), solution (1152-1166)
flixopt/flow_system.py (4)
FlowSystem (55-2510), calculate_timestep_duration (330-335), solution (1449-1466), solution (1469-1472)
flixopt/statistics_accessor.py (1)
flow_rates(365-387)
flixopt/dataset_plot_accessor.py (2)
flixopt/color_processing.py (1)
process_colors (165-233)
flixopt/config.py (2)
CONFIG (184-912), Plotting (550-592)
🪛 LanguageTool
docs/user-guide/results/index.md
[style] ~333-~333: Using many exclamation marks might seem excessive (in this case: 9 exclamation marks for a text that’s 4730 characters long)
Context: ...Works! cluster_boundary is ignored ``` !!! note "Component Differences" System...
(EN_EXCESSIVE_EXCLAMATION)
🪛 markdownlint-cli2 (0.18.1)
docs/notebooks/data/raw/README.md
10-10: Bare URL used
(MD034, no-bare-urls)
26-26: Bare URL used
(MD034, no-bare-urls)
28-28: Bare URL used
(MD034, no-bare-urls)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.12)
- GitHub Check: test (3.14)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: Build documentation
```python
if self.element.capacity_in_flow_hours is None:
    return -np.inf, np.inf
elif isinstance(self.element.capacity_in_flow_hours, InvestParameters):
    cap_max = self.element.capacity_in_flow_hours.maximum_or_fixed_size * relative_upper_bound
    # Adding 0.0 converts -0.0 to 0.0 (linopy LP writer bug workaround)
    return -cap_max + 0.0, cap_max + 0.0
else:
    cap = self.element.capacity_in_flow_hours * relative_upper_bound
    # Adding 0.0 converts -0.0 to 0.0 (linopy LP writer bug workaround)
    return -cap + 0.0, cap + 0.0
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, let's explore the repository structure and understand the context
git ls-files | grep -E "(requirements|setup|pyproject)" | head -10
```
Repository: flixOpt/flixopt
git ls-files | grep -E "(requirements|setup|pyproject)" | head -10Repository: flixOpt/flixopt
Length of output: 73
🌐 Web query:
linopy LP writer negative zero -0.0 bug issue
💡 Result:
Short answer: this was a known LP-writer issue (LP files could contain -0.0 coefficients); it was fixed in linopy's LP-writing code (noted in release notes) and later improvements to the LP writer/IO were made. Solution: upgrade linopy to a current release and, when solving, leave sanitize_zeros=True (default) or write via the updated io_api. References and upgrade command below.
What to do
- Upgrade linopy: pip install --upgrade linopy. [1][5]
- The release notes record a specific fix for “-0.0” coefficients (v0.0.8) and further LP/IO improvements in later releases. Use a recent version (see release notes/changelog). [1]
- The Model.solve API includes sanitize_zeros (default True) which helps avoid zero-sign issues when writing problems. Use the updated to_file/io_api or the direct API to avoid old LP-writer bugs. [3][4]
References
- Linopy release notes (mentions fix for “-0.0” LP writing and later IO improvements). [1]
- Linopy Model.to_file documentation. [3]
- Linopy Model.solve documentation (sanitize_zeros parameter). [4]
- Linopy GitHub project page. [5]
- PyPSA announcement to update linopy after critical bugs in older versions. [2]
🏁 Script executed:
```shell
# Look for linopy version requirements in the project
fd -e "txt" -e "toml" -e "cfg" -e "yaml" -e "yml" | xargs grep -l "linopy" 2>/dev/null | head -5
```
Repository: flixOpt/flixopt
Length of output: 73
🏁 Script executed:
```shell
# Check the context around the file mentioned in the review
cat -n flixopt/components.py | sed -n '1250,1270p'
```
Repository: flixOpt/flixopt
Length of output: 1241
🏁 Script executed:
```shell
# Look for any related comments or issues in the codebase about linopy
rg "linopy" --type python -B2 -A2 | head -40
```
Repository: flixOpt/flixopt
Length of output: 89
🏁 Script executed:
```shell
cat pyproject.toml | grep -A5 -B5 "linopy"
```
Repository: flixOpt/flixopt
Length of output: 511
🏁 Script executed:
```shell
# Also search for any GitHub issues or comments in the code about this
rg "linopy.*bug|linopy.*0\.0" --type toml --type python -i
```
Repository: flixOpt/flixopt
Length of output: 89
Remove the linopy LP writer bug workaround—it's fixed in required version 0.5.1.
The project requires linopy >= 0.5.1, but the -0.0 LP writer bug was fixed in linopy v0.0.8. The + 0.0 workaround is no longer necessary and adds unnecessary code noise. Modern linopy handles this via sanitize_zeros=True (default). Remove the workaround and the associated comments at lines 1260-1261 and 1264-1265.
🤖 Prompt for AI Agents
In @flixopt/components.py around lines 1256 - 1265, The code is still applying a
linopy LP writer workaround by adding "+ 0.0" and related comments when
computing capacity bounds; remove the workaround and comments: in the branch
handling isinstance(self.element.capacity_in_flow_hours, InvestParameters)
replace the returned values that currently use "-cap_max + 0.0, cap_max + 0.0"
with "-cap_max, cap_max" and in the else branch replace "-cap + 0.0, cap + 0.0"
with "-cap, cap", and delete the adjacent comments about the linopy bug; keep
all other logic (use of relative_upper_bound, cap_max computation, and
InvestParameters handling) unchanged.
| Case | Layout |
|------------------|-------------------------------------------------------------------------|
| Simple (no dims) | Single heatmap |
| Periods only | Vertical stack (1 column) |
| Scenarios only | Horizontal layout |
| Both | Grid (periods as rows, scenarios as columns via facet_cols=n_scenarios) |
| File | Issue | Fix |
|-------------------------------------------------|-----------------------------------------------------------|-----------------------------------------------------|
| docs/notebooks/03-investment-optimization.ipynb | Division by zero when solar_size = 0 | Added guard: if solar_size > 0 else float('nan') |
| flixopt/clustering/base.py:181-235 | ClusterStructure.plot() crashes for multi-period/scenario | Added NotImplementedError with helpful message |
| flixopt/components.py:1256-1265 | Obsolete linopy LP writer bug workaround | Removed + 0.0 workaround (fixed in linopy >= 0.5.1) |
| flixopt/dataset_plot_accessor.py:742-780 | to_duration_curve crashes on variables without time dim | Added guard to skip non-time variables |
| flixopt/features.py:234-236 | Critical: Startup count ignores cluster weighting | Now multiplies by cluster_weight before summing |
| flixopt/structure.py:268-281 | scenario_weights docstring misleading | Updated docstring to accurately describe behavior |
Nitpick Fixes
| File | Fix |
|----------------------------------------------------|-------------------------------------------------------------------------------------------|
| docs/notebooks/data/generate_example_systems.py | Fixed type hint pd.DataFrame → pd.Series for _elec_prices, added timezone guard |
| flixopt/statistics_accessor.py:2103-2132 | Added detailed comment explaining secondary y-axis offset strategy |
| tests/test_cluster_reduce_expand.py | Moved import xarray as xr to top of file |
| docs/notebooks/data/generate_realistic_profiles.py | Clarified warnings.resetwarnings() comment |
| flixopt/transform_accessor.py | Removed unused cluster_coords and time_coords params from _combine_slices_to_dataarray_2d |
| flixopt/comparison.py | Added error handling to _concat_property for FlowSystems lacking optimization data |
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flixopt/statistics_accessor.py (1)
1998-2143: `storage()` dual-axis overlay handles facets correctly but animation frame propagation is incomplete. The implementation properly:
- Builds the flow stacked-bar chart via `flow_ds.fxplot.stacked_bar(...)` and constructs a separate line figure for the charge state
- Creates secondary y-axes (`yaxis101`, `yaxis102`, ...) overlaying the primary ones with well-documented logic explaining the +100 offset strategy
- Reuses `facet_col`, `facet_row`, and `animation_frame` when building both figures for aligned subplot layouts
However, there is a functional issue with animation:
- Animation frames are not transferred from `line_fig.frames` to `fig.frames`. When `animation_frame` is specified, both figures generate frame data, but only the bar chart's frames are retained; the charge-state line traces remain static across animation frames. The stored frames in `line_fig` must be explicitly merged into `fig.frames` for animated storage plots to work correctly, similar to the pattern seen in `results.py` where charge state traces are added to each frame.
The trace-to-facet mapping logic (`i` index to `primary_num`) is sound with `color=None` (one trace per subplot), but adding a dedicated test exercising `storage()` with `animation_frame` would help prevent regressions if the plotting internals change.
🤖 Fix all issues with AI Agents
In @flixopt/clustering/base.py:
- Around line 247-353: The validate() method compares representative weight
counts against self.cluster_structure.n_clusters but fails if n_clusters is an
xr.DataArray; cast n_clusters to a plain int before the comparison using the
same pattern as in __repr__ (i.e., if isinstance(n, (int, np.integer)) use
int(n) else use int(n.values)), then use that int for the weights_n_clusters
check; apply the same normalization wherever validate() or related checks
reference cluster_structure.n_clusters (and the other places noted) so
comparisons never yield an xr.DataArray.
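A standalone sketch of the coercion described above (mirrors the `__repr__` pattern quoted in the prompt; names are from the review, not verified against the source):

```python
import numpy as np
import xarray as xr

def as_plain_int(n) -> int:
    return int(n) if isinstance(n, (int, np.integer)) else int(n.values)

assert as_plain_int(5) == 5
assert as_plain_int(np.int64(6)) == 6
assert as_plain_int(xr.DataArray(7)) == 7
```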
- Around line 68-109: get_cluster_occurrences_for_slice can fail when
cluster_occurrences has extra dims (e.g., period/scenario) and callers omit
indices because occ.sel(cluster=c).values yields an array and int(...) raises;
update get_cluster_occurrences_for_slice to detect extra_dims = [d for d in
self.cluster_occurrences.dims if d != 'cluster'] and if extra_dims are present
and the caller did not supply indices for them, raise a clear ValueError (or
alternatively reduce over those dims via .sum(...) or .isel(...) before
converting to int) so conversion to int is safe; apply the same check/fix to the
other helper slices referenced (the blocks around the other occurrences handling
mentioned in the comment).
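A sketch of the safe lookup (the `occ`/`indices` names are illustrative; the real helper's signature may differ):

```python
import xarray as xr

def occurrences_for(occ: xr.DataArray, cluster: int, **indices) -> int:
    missing = [d for d in occ.dims if d != 'cluster' and d not in indices]
    if missing:
        raise ValueError(f'Supply selections for dimensions {missing}, e.g. period=0')
    return int(occ.sel(cluster=cluster, **indices).values)
```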
- Around line 847-977: clusters(): add an explicit size check before reshaping
in ClusteringPlotAccessor.clusters() — verify da.values.size == n_clusters *
timesteps_per_cluster and raise a clear ValueError that states the expected and
actual sizes (include variable name and current
n_clusters/timesteps_per_cluster) instead of relying on numpy.reshape to raise.
create_cluster_structure_from_mapping(): stop deriving n_clusters as max_id+1;
compute unique_clusters = np.unique(cluster_order), set n_clusters =
len(unique_clusters), and build cluster_occurrences as a DataArray over those
unique cluster IDs (use unique_clusters as the 'cluster' coordinate) so
sparse/non‑contiguous IDs are preserved (or alternatively add explicit
validation that IDs must be contiguous starting at 0 and raise if not) — update
usages of cs.cluster_occurrences and cs.n_clusters accordingly.
- Around line 717-845: In heatmap(), before repeating cluster_order into
cluster_per_timestep, validate that len(original_time) equals len(cluster_order)
* timesteps_per_period and raise a clear ValueError if not; specifically, locate
the loop that calls cs.get_cluster_order_for_slice and before
cluster_per_timestep = np.repeat(...) add the precondition check comparing
len(original_time) to len(cluster_order) * timesteps_per_period (using the
symbols original_time, cluster_order, timesteps_per_period) and raise an
informative error describing the mismatch and expected counts so the caller
knows the clustering horizon must cover whole periods.
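The precondition as a standalone function (symbol names taken from the prompt, not verified against the source):

```python
def check_whole_periods(original_time, cluster_order, timesteps_per_period) -> None:
    expected = len(cluster_order) * timesteps_per_period
    if len(original_time) != expected:
        raise ValueError(
            f'Clustering horizon must cover whole periods: expected {expected} '
            f'timesteps ({len(cluster_order)} periods x {timesteps_per_period}), '
            f'got {len(original_time)}'
        )
```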
In @flixopt/transform_accessor.py:
- Around line 952-1059: The per-slice occurrence builder
_build_cluster_occurrences_for_key and ClusterStructure/ClusterResult creation
should be made robust to non‑contiguous cluster IDs: either validate contiguity
in _build_cluster_occurrences_for_key by checking
cluster_occurrences_all[key].keys() against range(actual_n_clusters) and raise a
ValueError with a clear message, or compute actual_n_clusters from np.unique
over the emitted cluster IDs and build cluster coordinate/occurrence arrays from
that derived value (adjusting cluster_order_da/cluster_occurrences_da and
representative_weights accordingly); additionally, make ClusterResult.validate()
safe against xr.DataArray n_clusters by coercing
self.cluster_structure.n_clusters to int(...) before comparing with
weights_n_clusters so the comparison yields a plain int and avoids ambiguous
truth value errors.
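A numpy/xarray sketch of the `np.unique`-based variant described above (illustrative data):

```python
import numpy as np
import xarray as xr

cluster_order = np.array([0, 1, 3, 1, 3])  # note the gap: ID 2 is never emitted
unique_clusters = np.unique(cluster_order)  # [0, 1, 3]
occurrences = xr.DataArray(
    [int((cluster_order == c).sum()) for c in unique_clusters],
    dims=['cluster'],
    coords={'cluster': unique_clusters},
)
print(occurrences.values)  # [1 2 2] - sparse IDs preserved as coordinates
```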
🧹 Nitpick comments (7)
docs/notebooks/03-investment-optimization.ipynb (2)
283-290: Consider guarding divisions by `total_costs`. Lines 285-287 divide by `total_costs` to compute percentages. In the tutorial context this is safe, but for defensive coding you could add a check to handle the edge case where `total_costs` might be zero or near-zero.
🔎 Optional defensive guard
```diff
 # Calculate cost components
 solar_invest = solar_size * SOLAR_COST_WEEKLY
 tank_invest = tank_size * TANK_COST_WEEKLY
 gas_costs = total_costs - solar_invest - tank_invest
 pd.DataFrame(
     {
-        'Solar Investment': {'EUR': solar_invest, '%': solar_invest / total_costs * 100},
-        'Tank Investment': {'EUR': tank_invest, '%': tank_invest / total_costs * 100},
-        'Gas Costs': {'EUR': gas_costs, '%': gas_costs / total_costs * 100},
+        'Solar Investment': {'EUR': solar_invest, '%': solar_invest / total_costs * 100 if total_costs > 0 else 0},
+        'Tank Investment': {'EUR': tank_invest, '%': tank_invest / total_costs * 100 if total_costs > 0 else 0},
+        'Gas Costs': {'EUR': gas_costs, '%': gas_costs / total_costs * 100 if total_costs > 0 else 0},
         'Total': {'EUR': total_costs, '%': 100.0},
     }
 )
```
348-362: Consider guarding division by `gas_only_cost`. Line 358 divides by `gas_only_cost` to compute the savings percentage. While the tutorial data ensures non-zero demand, a defensive guard would handle edge cases more gracefully.
🔎 Optional defensive guard
```diff
 # Gas-only scenario for comparison
 total_demand = pool_demand.sum()
 gas_only_cost = total_demand / 0.92 * GAS_PRICE  # All heat from gas boiler
 savings = gas_only_cost - total_costs
 pd.DataFrame(
     {
         'Gas-only [EUR/week]': gas_only_cost,
         'With Solar [EUR/week]': total_costs,
         'Savings [EUR/week]': savings,
-        'Savings [%]': savings / gas_only_cost * 100,
+        'Savings [%]': savings / gas_only_cost * 100 if gas_only_cost > 0 else 0,
         'Savings [EUR/year]': savings * 52,
     },
     index=['Value'],
 ).T
```
docs/notebooks/data/generate_example_systems.py (1)
544-547: Consider more robust weather data alignment. Lines 546-547 use direct `.values` access without index alignment, unlike other functions which use `.reindex(timesteps, method='ffill')`. While this should work because TMY data has exactly 8760 hours, it's more fragile than the reindex pattern.
🔎 Recommended refactor for consistency and robustness
```diff
 # Full year, hourly (use non-leap year to match TMY data which has 8760 hours)
 timesteps = pd.date_range('2019-01-01', periods=8760, freq='h')
-# Map to 2020 weather data (TMY has 8760 hours, no Feb 29)
-temp = _get_weather()['temperature_C'].values
-ghi = _get_weather()['ghi_W_m2'].values
+# Get TMY weather data (8760 hours, no Feb 29)
+weather = _get_weather()
+temp = weather['temperature_C'].iloc[:8760].values
+ghi = weather['ghi_W_m2'].iloc[:8760].values
```
This makes it explicit that we're taking the first 8760 values and adds a guard against unexpected data shapes.
flixopt/transform_accessor.py (2)
55-80: Clustering weight calculation: clarify semantics for user-supplied `weights`. The logic `clustering_weights = weights or self._calculate_clustering_weights(temporaly_changing_ds)` treats an empty dict (`weights={}`) the same as `None` and silently falls back to auto-weights. If you intend `{}` to mean "no weighting / all 1.0", this could surprise users.
Consider distinguishing explicitly:
- `None` → auto weights via `_calculate_clustering_weights`
- `{}` → pass through as-is (no special handling)
For example:
```python
clustering_weights = (
    self._calculate_clustering_weights(temporaly_changing_ds)
    if weights is None
    else weights
)
```
This preserves explicit user intent while keeping current defaults.
578-882: Add defensive length validation for `predef_cluster_order` and the reshape operation. The cluster pipeline is well-structured with good parameter validation and metrics handling. Three optional improvements for robustness:
1. When `predef_cluster_order` is an `xr.DataArray`, validate its `original_cluster` dimension length against the slice's time length (expected: `time_length / timesteps_per_cluster`). This prevents unclear tsam errors if the wrong-length array is passed (sketched below).
2. Before reshaping time-dependent constants (line ~920), add an explicit guard: `if sliced.sizes['time'] != actual_n_clusters * timesteps_per_cluster: raise ValueError(...)`. This protects against edge cases where the original dataset is smaller than expected, which would otherwise fail with a cryptic numpy reshape error.
3. `representative_weights` and `cluster_weight` consistency is already well-enforced in `ClusterResult.validate()` (cluster dimension matching, weight sum checks), so no action needed here.
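A sketch of check (1) from the list above (names follow the review; the real pipeline's signatures may differ):

```python
import xarray as xr

def validate_predef_length(predef: xr.DataArray, time_length: int, timesteps_per_cluster: int) -> None:
    expected = time_length // timesteps_per_cluster
    if predef.sizes['original_cluster'] != expected:
        raise ValueError(
            f"predef_cluster_order has {predef.sizes['original_cluster']} entries, "
            f'expected {expected} (= time_length / timesteps_per_cluster)'
        )
```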
129-171: Centralized heatmap preparation is a good refactor; no critical issues but clarification on sub-hourly support would help.The new
_prepare_for_heatmap()helper and updatedheatmap()method significantly improves clarity:
- Clustered datasets naturally render as
(time, cluster)heatmaps.- Explicit
reshape=(outer, inner)is always honored whentimeexists.- Auto mode for non‑clustered data reshapes only if extra dims fit into available facet/animation slots.
Two observations:
- fallback_dims() and missing time: For multi‑variable datasets you hardcode
['variable', 'time']without checking existence. Whilefinalize()filters out missing dims, this relies on defensive downstream logic rather than validating upfront. For non-time multi‑variable data, consider checking thattimeexists before defaulting to it, or documenting this assumption.- Sub-hourly support clarity: The docstring correctly documents that 'auto' uses
('D', 'h'), and_reshape_time_for_heatmap()does support finer frequencies like('D', '15min')and('h', 'min'). However, users may not realize they can pass explicitreshapetuples for sub-hourly data. Consider adding an example or note in the docstring showing how to use sub-hourly reshaping (e.g.,reshape=('D', '15min')).These are refinement suggestions; the core logic is sound.
flixopt/comparison.py (1)
136-146: Consider more robust coordinate comparison.

The comparison `(ref_coords == fs_coords).all()` at line 142 may fail in edge cases:

- Arrays with NaN values (NaN != NaN)
- Type mismatches (e.g., different datetime64 resolutions)

🔎 More robust comparison approach

```diff
 if len(ref_coords) != len(fs_coords):
     logger.warning(
         f"Coordinate length mismatch for '{dim}' between '{ref_name}' ({len(ref_coords)}) "
         f"and '{name}' ({len(fs_coords)}). xarray will align on coordinates, "
         f'which may introduce NaN values for non-overlapping coordinates.'
     )
-elif not (ref_coords == fs_coords).all():
+elif not np.array_equal(ref_coords, fs_coords, equal_nan=True):
     logger.warning(
         f"Coordinate values differ for '{dim}' between '{ref_name}' and '{name}'. "
         f'xarray will align on coordinates, which may introduce NaN values.'
     )
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- docs/notebooks/03-investment-optimization.ipynb
- docs/notebooks/data/generate_example_systems.py
- docs/notebooks/data/generate_realistic_profiles.py
- flixopt/clustering/base.py
- flixopt/comparison.py
- flixopt/dataset_plot_accessor.py
- flixopt/features.py
- flixopt/statistics_accessor.py
- flixopt/structure.py
- flixopt/transform_accessor.py
- tests/test_cluster_reduce_expand.py
🚧 Files skipped from review as they are similar to previous changes (1)
- docs/notebooks/data/generate_realistic_profiles.py
🧰 Additional context used
🧬 Code graph analysis (4)
docs/notebooks/data/generate_example_systems.py (1)

- docs/notebooks/data/generate_realistic_profiles.py (8): `ElectricityLoadGenerator` (131-173), `GasPriceGenerator` (237-265), `ThermalLoadGenerator` (71-128), `load_electricity_prices` (56-65), `load_weather` (40-53), `generate` (90-128), `generate` (146-173), `generate` (240-265)

flixopt/structure.py (1)

- flixopt/flow_system.py (13): `FlowSystem` (55-2510), `solution` (1449-1466), `solution` (1469-1472), `coords` (2001-2019), `indexes` (1959-1977), `dims` (1936-1956), `weights` (2115-2135), `temporal_dims` (1980-1987), `temporal_weight` (1990-1998), `sum_temporal` (2137-2154), `scenario_weights` (2053-2060), `scenario_weights` (2063-2102), `_unit_weight` (2104-2112)

flixopt/comparison.py (4)

- flixopt/config.py (2): `CONFIG` (184-912), `Plotting` (550-592)
- flixopt/plot_result.py (2): `PlotResult` (20-143), `update_traces` (75-88)
- flixopt/statistics_accessor.py (9): `flow_rates` (365-387), `flow_hours` (390-411), `sizes` (440-444), `sizes` (1011-1054), `sizes` (1661-1716), `carrier_units` (328-336), `plot` (350-362), `balance` (1296-1386), `heatmap` (1509-1567)
- flixopt/dataset_plot_accessor.py (8): `stacked_bar` (270-337), `stacked_bar` (844-872), `line` (339-406), `line` (874-904), `bar` (205-268), `bar` (814-842), `heatmap` (474-570), `heatmap` (938-1019)

flixopt/clustering/base.py (3)

- flixopt/plot_result.py (1): `PlotResult` (20-143)
- flixopt/flow_system.py (2): `dims` (1936-1956), `sel` (2299-2325)
- flixopt/config.py (2): `CONFIG` (184-912), `Plotting` (550-592)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.14)
- GitHub Check: test (3.12)
- GitHub Check: Build documentation
🔇 Additional comments (42)
docs/notebooks/03-investment-optimization.ipynb (3)

103-110: LGTM! The migration to fxplot for visualization is clean and follows the pattern established across other notebooks in this PR.

233-240: Division by zero guard implemented correctly. The guard `if solar_size > 0 else float('nan')` properly addresses the division by zero issue flagged in the previous review.

85-93: Data loading is correctly implemented. The `get_investment_data()` function exists in docs/notebooks/data/tutorial_data.py and returns all required keys: `timesteps`, `solar_profile`, `pool_demand`, `gas_price`, `solar_cost_per_kw_week`, and `tank_cost_per_kwh_week`.

docs/notebooks/data/generate_example_systems.py (9)
20-43: LGTM: Robust import handling for multiple execution contexts. The try/except pattern correctly handles relative imports for package context and falls back to path manipulation for mkdocs-jupyter execution context.

80-82: LGTM: Robust timezone handling. The guard `if _elec_prices.index.tz is not None:` prevents errors when the price data is already timezone-naive, making the code more robust across different data sources.

90-142: LGTM: Proper timestamp alignment and BDEW profile integration. The function correctly uses `.reindex(timesteps, method='ffill')` for robust timestamp alignment and integrates BDEW profiles appropriately.

145-290: LGTM: Well-structured complex system with realistic profiles. The function demonstrates proper use of time-varying effects (CO2 factors based on hour of day), piecewise efficiency curves, and investment optimization. Timestamp alignment is handled correctly throughout.

293-415: LGTM: District heating system with proper MW-scale profiles and investment parameters. The function correctly scales BDEW profiles to MW (district heating scale) and defines investment parameters for CHP, boiler, and storage. Note that carriers are implicitly inferred from Bus definitions rather than explicitly added via `fs.add_carriers()`, which is inconsistent with some other functions but valid.

418-523: LGTM: Operational system with proper startup cost modeling. The function is well-designed for operational planning with 15-minute resolution, startup costs on CHP and boiler, `previous_flow_rate` parameters for startup/shutdown modeling, and `prevent_simultaneous_charge_and_discharge=True` on storage for realistic operation.
622-643: LGTM: Realistic seasonal pit storage configuration. The seasonal storage configuration correctly uses:

- Large capacity (50,000 MWh) for seasonal shifting
- Very low losses (0.0001/hour) typical of pit storage
- `initial_charge_state='equals_final'` for yearly cyclic boundary
- Separate investment parameters for capacity, charging, and discharging flows
This design appropriately demonstrates seasonal storage value for clustering tutorials.
653-741: LGTM: Well-structured multi-period and multi-scenario system. The function properly demonstrates:
- Multiple planning periods with period-varying costs
- Multiple scenarios with scenario-specific demand profiles and weights
- Appropriate 2-week duration for clustering demonstrations
- Investment parameters that carry across periods
744-767: LGTM: Complete main() function with all system generators. The function correctly registers all six systems (three original + three new) and follows the standard pattern for saving FlowSystem instances to netCDF.
flixopt/features.py (6)
199-210: LGTM: Temporal weight-based active hours tracking. The refactoring from `hours_per_step` to `temporal_weight` and `sum_temporal` correctly supports clustered systems where temporal weights account for both timestep duration and cluster representation. Using `total_hours` as the default upper bound is appropriate.

234-239: Approved: Cluster weighting correctly applied to startup count. This implementation correctly addresses the critical issue raised in previous reviews. The fix ensures that startups in clusters with weight > 1 are properly counted (e.g., a startup in a cluster with weight 10 now correctly counts as 10 startups); see the sketch below. The use of `weights.get('cluster', 1.0)` safely handles both clustered and non-clustered systems.
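A minimal sketch of the weighting arithmetic, with illustrative values rather than flixopt internals:

```python
# Each representative cluster stands in for several original periods, so a
# startup counted once in the clustered model must be scaled by the number
# of periods that cluster represents.
import xarray as xr

startups = xr.DataArray([1, 0, 1], dims=['cluster'])         # startups per representative cluster
cluster_weight = xr.DataArray([10, 5, 3], dims=['cluster'])  # periods each cluster represents

total_startups = (startups * cluster_weight).sum()
assert int(total_startups) == 13  # 1×10 + 0×5 + 1×3
```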
269-279: LGTM: Effects correctly scaled by timestep duration. The implementation properly scales per-hour effects by `timestep_duration` to convert them to per-timestep effects. The docstring correctly clarifies that cluster weighting is applied downstream during aggregation, maintaining separation of concerns.

619-630: LGTM: Consistent cluster weighting in share allocation. The implementation correctly:

- Converts per-hour bounds to per-timestep bounds by multiplying by `timestep_duration` (Lines 619-620)
- Applies cluster weighting before aggregating to total (Line 629)
- Uses safe defaults for non-clustered systems via `weights.get('cluster', 1.0)`

This pattern is consistent with the startup count implementation and properly handles clustered systems.

249-249: Property `timestep_duration` is correctly accessible on `Submodel`; no issue found. `Submodel` defines `timestep_duration` as a property (structure.py, lines 1750–1752) that forwards to `self._model.timestep_duration`. Both access patterns in `StatusModel` (lines 249, 262 using `self.timestep_duration` and lines 275, 337, 348 using `self._model.timestep_duration`) are valid and functionally equivalent; a sketch of the forwarding pattern follows.
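A minimal sketch of that forwarding pattern (only the property is shown; the surrounding class is a simplified stand-in, not the actual flixopt class body):

```python
class Submodel:
    """Simplified stand-in for the reviewed class."""

    def __init__(self, model):
        self._model = model  # owning model object

    @property
    def timestep_duration(self):
        # Forwards to the owning model, so self.timestep_duration and
        # self._model.timestep_duration are equivalent
        return self._model.timestep_duration
```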
337-352: Clarify whether `timestep_duration` receives period/scenario dimensions. Both methods use `.isel(time=0).min().item()` to extract a scalar. The `.min()` operation suggests `timestep_duration` may have multiple dimensions after `fit_to_model_coords()`, though the docstring indicates the system is "currently time, but extensible." In practice, `timestep_duration` appears to remain 1D (time only), making `.min()` unnecessary. If it does have multiple dimensions in some cases, `.min()` provides a conservative estimate rather than a dimension-aware one. Consider whether `.isel(time=0).item()` alone is sufficient, or if `.min()` is intentional to handle varying timestep durations across periods/scenarios.

flixopt/dataset_plot_accessor.py (3)
18-167: Slot assignment and long-form conversion look consistent with CONFIG semantics.

`assign_slots` + `_build_fig_kwargs` + `_dataset_to_long_df` form a clean abstraction:

- Dimensions are filtered by size>1 and `exclude_dims`, with explicit validation for user-provided slot names.
- `CONFIG.Plotting.dim_priority` and `slot_priority` are respected, and unassigned dims trigger helpful warnings.
- `_dataset_to_long_df` correctly handles both scalar-only datasets and multi-dim ones by melting over dims (not all coords), which avoids surprising ID columns leaking in.

This looks solid and should make future fxplot consumers easier to maintain.

742-783: Duration-curve fix for non-time variables is correct. The new guard:

```python
for var in sorted_ds.data_vars:
    da = sorted_ds[var]
    if 'time' not in da.dims:
        continue
```

prevents `da.dims.index('time')` from crashing when scalar or non-time variables are present. This matches the intended behavior (leave such variables untouched while sorting only true time series) and resolves the previously flagged issue.

474-571: Heatmap auto-faceting for Dataset and DataArray is thoughtfully constrained. The dataset and dataarray `heatmap` methods:

- Require an explicit `variable` when multiple data_vars exist, avoiding ambiguous plots.
- Use `assign_slots` only when `facet_col`/`animation_frame` are `'auto'`, and explicitly exclude the first two dims (heatmap axes) from auto-assignment so they're not accidentally reused as facets.
- Squeeze only singleton dims not used for axes/facets/animation, which is exactly what `px.imshow` needs to avoid shape errors.

The DataArray variant's temporary dataset (`da.to_dataset(name='_temp')`) is a neat way to reuse the same slot logic. Plotly Express `imshow` fully supports xarray DataArray inputs with `facet_col` and `animation_frame` parameters, and can combine both for multi-dimensional data. Overall this should behave predictably across a variety of dimension layouts.
1696-1716: `sizes()` plotting path: fallback behavior when no vars remain is graceful. The updated `sizes()` logic:

- Returns an empty `Figure` when there are no remaining `data_vars` after filtering (instead of erroring).
- Otherwise uses `ds.fxplot.bar(x='variable', color='variable', ...)`, which will leverage the fxplot machinery to put each size as a separate bar and facet over any remaining dims if present.

This is a nice UX improvement and keeps the API forgiving when filters eliminate all flows.
1868-1944: Effects plotting refactor is clearer; grouping/selection order looks correct. The refactored `effects()`:

- Picks the correct dataset for `aspect` via a small dict, with a clear error for invalid values.
- Applies effect selection, then groups by `component` unless `by='contributor'`, which matches the docstring.
- Sums over the appropriate dims based on `by` and chooses `x_col` / the default color dimension accordingly.

The new logic around `color_col` and allowing a user-provided `color` override via `plotly_kwargs.pop('color', color_col)` is flexible. Dim-reduction order (time, then `component`/`contributor` for the aggregate case) also matches expectations. This looks good as-is.
399-411: `flow_hours` now uses `timestep_duration`; type/shape alignment is correct. The change to compute `flow_hours` via:

```python
hours = self._fs.timestep_duration
da = flow_rates[var] * hours
```

is a sensible modernization. `timestep_duration` is calculated per-timestep from the time index differences and automatically broadcasts across all model dimensions (cluster, period, scenario) via xarray's built-in multi-dimensional array operations. Unit transformation (kW→kWh) is handled correctly in the attribute processing; a worked sketch follows.

Clustering behavior has been verified in tests: test_cluster_reduce_expand.py confirms that `flow_hours` matches between clustered and expanded FlowSystems when properly weighted with `cluster_weight`, and test_clustering/test_integration.py validates that `weights['time']` correctly returns `timestep_duration` across various scenarios.
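A self-contained sketch of the flow_rate × duration arithmetic (illustrative values, not flixopt internals):

```python
# kW × h → kWh per timestep, broadcast by xarray over the shared 'time' dim.
import pandas as pd
import xarray as xr

time = pd.date_range('2024-01-01', periods=4, freq='h')
flow_rate = xr.DataArray([2.0, 0.0, 1.5, 3.0], dims=['time'], coords={'time': time})  # kW
timestep_duration = xr.DataArray([1.0] * 4, dims=['time'], coords={'time': time})     # h

flow_hours = flow_rate * timestep_duration  # kWh per timestep
assert float(flow_hours.sum()) == 6.5
```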
708-717: The code is well-designed and the assumptions are sound. The implementation correctly handles both clustered and non-clustered models:

- `self._fs.weights.get('cluster', 1.0)` safely returns a DataArray with `cluster` dimension when clustered, or scalar 1.0 when non-clustered
- Broadcasting works correctly: cluster_weight dims align with solution[label] dims; contributor is added after aggregation, so no unintended axis pollution
- The pattern is consistent with other codebase usage (e.g., features.py uses identical `weights.get('cluster', 1.0)` + temporal aggregation)

No validation is needed; the code is correct as-is.
1510-1567: Integration with fxplot is clean and edge case handling is deliberate. The new `heatmap()` implementation correctly delegates dimension decisions to `_prepare_for_heatmap()`, which intelligently calculates available slots for faceting/animation and only skips reshaping when extra dimensions exceed available display space. For multi-variable data, it defaults to `(variable, time)` as heatmap axes, a sensible fallback. Clustered data uses `(time, cluster)` directly. Test coverage includes multi-dimensional scenarios, confirming dimension preservation works correctly. The docstring documents this behavior clearly. No backwards-compatibility concerns require action; layout changes are expected in this major release rewrite of the plotting system.
1201-1497: The time coordinate alignment between data and solution is working as intended. The expanded FlowSystem has data on `original_timesteps` (192) and solution on `original_timesteps_extra` (193), which is by design: the extra timestep is used for storage SOC boundary conditions. This is documented in the codebase (see the comment about suppressing linopy warnings for "expected" coordinate mismatches) and thoroughly tested: `test_expand_solution_enables_statistics_accessor` explicitly validates that `flow_rates` has 193 timesteps and works correctly with the statistics accessor. No changes needed.

flixopt/comparison.py (5)
74-100: LGTM! Clean initialization with proper validation. The constructor correctly validates:
- Minimum of 2 FlowSystems
- Unique case names
- All FlowSystems have solutions
- Dimension matching across systems
The fallback naming strategy (`fs.name or f'System {i}'`) is sensible.
171-192: LGTM! Bounds checking correctly implemented. The `diff` method now properly validates integer reference indices (lines 186-188), including support for negative indices. This addresses the past review comment about missing bounds checks.
194-346: LGTM! Well-structured statistics aggregation. The `ComparisonStatistics` class effectively aggregates statistics across cases:

- Lazy property evaluation with caching
- Graceful error handling in `_concat_property` (skips failed cases with warnings)
- Sensible merge strategy for dict properties (later cases override)
360-384: LGTM! Safe title extraction implemented. The `_combine_data` method now safely handles title extraction (lines 371-373), addressing the past review comment about potential `AttributeError`. The pattern correctly checks that `fig_title` is not None before accessing `.text`.
386-657: LGTM! Consistent and well-structured plotting methods. All plotting methods follow a clean pattern:

- Combine data via `_combine_data`
- Handle empty datasets gracefully
- Delegate to `fxplot` accessors for visualization
- Finalize with consistent show/return logic
The separation of concerns and consistent API design make the code maintainable and easy to extend.
flixopt/structure.py (6)
171-179: LGTM! Appropriate warning suppression with clear documentation. The warning suppression is well-implemented:
- Narrowly scoped to specific expected warning
- Uses context manager for limited scope
- Comment clearly explains why the warning is expected (extra timestep for charge_state)
214-217: LGTM! Correct conditional reindexing. The reindexing logic properly checks both:
- 'time' dimension exists
- Current index differs from timesteps_extra
This ensures consistency while avoiding unnecessary operations.
220-265: LGTM! Clean property delegation to FlowSystem. The new properties correctly delegate to `flow_system`, providing:
- Dimension and index access
- Weight access with clear documentation
- Temporal aggregation helpers
The delegation pattern keeps the API consistent between FlowSystemModel and FlowSystem.
268-282: LGTM! Docstring now matches implementation. The docstring has been updated to accurately describe the three return cases:
- Scalar 1 if no scenarios
- Unit weights (all 1.0) if scenarios exist but no explicit weights
- Normalized explicit weights if set
This addresses the past review comment about docstring/behavior mismatch.
319-324: LGTM! Smart cluster dimension handling. The enhancement to automatically include 'cluster' when 'time' is requested in clustered systems is sensible. This prevents errors from missing the cluster dimension, since time and cluster are tightly coupled in the clustered data structure.
1751-1752: LGTM! Consistent property rename. The `timestep_duration` property correctly delegates to `_model.timestep_duration`, maintaining consistency with the renamed property throughout the codebase.

tests/test_cluster_reduce_expand.py (4)
1-226: LGTM! Comprehensive basic clustering tests. The test suite covers:
- Cluster creation and structure validation
- Solution expansion and timestep restoration
- Variable preservation across expansion
- Correct value mapping from clusters to expanded timesteps (thorough!)
- Statistics accessor functionality
- Total effects matching between clustered and expanded (with proper cluster_weight handling)
- Error handling for invalid operations
The tests are well-organized with clear assertions and good use of `assert_allclose` for numerical comparisons.
228-373: LGTM! Excellent multi-scenario clustering tests. The scenario tests are thorough and correctly verify:
- Scenario dimension preservation through clustering
- Full cluster → optimize → expand cycle with scenarios
- Independent mapping per scenario (lines 347-371) — this is particularly important and well-tested
The use of `get_cluster_order_for_slice(scenario=scenario)` ensures each scenario's expansion uses its own cluster order, which is critical for correctness.
374-564: LGTM! Comprehensive storage clustering tests. The storage tests thoroughly cover all cluster modes:

- `independent`: Each cluster starts fresh (no SOC_boundary)
- `cyclic`: Start equals end per cluster (no SOC_boundary)
- `intercluster`: SOC links across clusters (has SOC_boundary)
- `intercluster_cyclic`: Linked with yearly cycling (SOC_boundary with first == last)

The expansion tests verify critical behavior:
- Charge state becomes non-negative after combining with SOC_boundary
- Self-discharge decay is applied correctly
- Manual calculation verification (lines 524-563) — excellent validation of the expansion formula!
566-835: LGTM! Complete coverage of advanced clustering features. The tests thoroughly cover:
Multi-period clustering:
- Period dimension preservation through cluster/optimize/expand cycle
- Mixed period + scenario systems (excellent integration test)
- Proper dimension handling in solutions
Peak selection:
- `time_series_for_high_peaks` and `time_series_for_low_peaks` parameter acceptance
- Peak capture verification (lines 801-819), confirming extreme demand days are included in typical days
- Comparison test showing difference between peak-aware and standard clustering
The test suite provides excellent coverage of the new clustering/aggregation functionality.
```python
cluster_order: xr.DataArray
cluster_occurrences: xr.DataArray
n_clusters: int | xr.DataArray
timesteps_per_cluster: int

def __post_init__(self):
    """Validate and ensure proper DataArray formatting."""
    # Ensure cluster_order is a DataArray with proper dims
    if not isinstance(self.cluster_order, xr.DataArray):
        self.cluster_order = xr.DataArray(self.cluster_order, dims=['original_cluster'], name='cluster_order')
    elif self.cluster_order.name is None:
        self.cluster_order = self.cluster_order.rename('cluster_order')

    # Ensure cluster_occurrences is a DataArray with proper dims
    if not isinstance(self.cluster_occurrences, xr.DataArray):
        self.cluster_occurrences = xr.DataArray(
            self.cluster_occurrences, dims=['cluster'], name='cluster_occurrences'
        )
    elif self.cluster_occurrences.name is None:
        self.cluster_occurrences = self.cluster_occurrences.rename('cluster_occurrences')

def __repr__(self) -> str:
    n_clusters = (
        int(self.n_clusters) if isinstance(self.n_clusters, (int, np.integer)) else int(self.n_clusters.values)
    )
    # Handle multi-dimensional cluster_occurrences (with period/scenario dims)
    occ_data = self.cluster_occurrences
    extra_dims = [d for d in occ_data.dims if d != 'cluster']
    if extra_dims:
        # Multi-dimensional: show shape info instead of individual values
        occ_info = f'shape={dict(occ_data.sizes)}'
    else:
        # Simple case: list of occurrences per cluster
        occ_info = [int(occ_data.sel(cluster=c).values) for c in range(n_clusters)]
    return (
        f'ClusterStructure(\n'
        f'  {self.n_original_clusters} original periods → {n_clusters} clusters\n'
        f'  timesteps_per_cluster={self.timesteps_per_cluster}\n'
        f'  occurrences={occ_info}\n'
        f')'
    )
```
Multi‑dimensional ClusterStructure handling is mostly robust; a small slice helper edge case remains.
The dataclass does a good job of:

- Normalizing `cluster_order`/`cluster_occurrences` into named DataArrays.
- Representing multi-dim occurrences in `__repr__` via shape instead of trying to coerce to scalars.
- Providing `has_multi_dims`, `get_cluster_order_for_slice`, and `plot()` that respect optional period/scenario dims.

One remaining edge case is `get_cluster_occurrences_for_slice`: if callers omit period/scenario while those dims are present, `occurrences.sel(cluster=c)` can still have extra dims and `.values` will be an array, so `int(...)` will fail. You may want to:

- Either require explicit period/scenario for multi-dim structures (and raise a clear `ValueError` when they are omitted), or
- Reduce over extra dims (e.g. `.sum()` or `.isel(...)`) before converting to `int`.
Also applies to: 135-144, 162-180
🤖 Prompt for AI Agents
In @flixopt/clustering/base.py around lines 68 - 109,
get_cluster_occurrences_for_slice can fail when cluster_occurrences has extra
dims (e.g., period/scenario) and callers omit indices because
occ.sel(cluster=c).values yields an array and int(...) raises; update
get_cluster_occurrences_for_slice to detect extra_dims = [d for d in
self.cluster_occurrences.dims if d != 'cluster'] and if extra_dims are present
and the caller did not supply indices for them, raise a clear ValueError (or
alternatively reduce over those dims via .sum(...) or .isel(...) before
converting to int) so conversion to int is safe; apply the same check/fix to the
other helper slices referenced (the blocks around the other occurrences handling
mentioned in the comment).
```python
@dataclass
class ClusterResult:
    """Universal result from any time series aggregation method.

    This dataclass captures all information needed to:
    1. Transform a FlowSystem to use aggregated (clustered) timesteps
    2. Expand a solution back to original resolution
    3. Properly weight results for statistics

    Attributes:
        timestep_mapping: Maps each original timestep to its representative index.
            dims: [original_time] for simple case, or
            [original_time, period, scenario] for multi-period/scenario systems.
            Values are indices into the representative timesteps (0 to n_representatives-1).
        n_representatives: Number of representative timesteps after aggregation.
        representative_weights: Weight for each representative timestep.
            dims: [time] or [time, period, scenario]
            Typically equals the number of original timesteps each representative covers.
            Used as cluster_weight in the FlowSystem.
        aggregated_data: Time series data aggregated to representative timesteps.
            Optional - some backends may not aggregate data.
        cluster_structure: Hierarchical clustering structure for storage linking.
            Optional - only needed when using cluster() mode.
        original_data: Reference to original data before aggregation.
            Optional - useful for expand_solution().

    Example:
        For 8760 hourly timesteps clustered into 192 representative timesteps (8 clusters x 24h):
        - timestep_mapping: shape (8760,), values 0-191
        - n_representatives: 192
        - representative_weights: shape (192,), summing to 8760
    """

    timestep_mapping: xr.DataArray
    n_representatives: int | xr.DataArray
    representative_weights: xr.DataArray
    aggregated_data: xr.Dataset | None = None
    cluster_structure: ClusterStructure | None = None
    original_data: xr.Dataset | None = None

    def __post_init__(self):
        """Validate and ensure proper DataArray formatting."""
        # Ensure timestep_mapping is a DataArray
        if not isinstance(self.timestep_mapping, xr.DataArray):
            self.timestep_mapping = xr.DataArray(self.timestep_mapping, dims=['original_time'], name='timestep_mapping')
        elif self.timestep_mapping.name is None:
            self.timestep_mapping = self.timestep_mapping.rename('timestep_mapping')

        # Ensure representative_weights is a DataArray
        # Can be (cluster, time) for 2D structure or (time,) for flat structure
        if not isinstance(self.representative_weights, xr.DataArray):
            self.representative_weights = xr.DataArray(self.representative_weights, name='representative_weights')
        elif self.representative_weights.name is None:
            self.representative_weights = self.representative_weights.rename('representative_weights')

    def __repr__(self) -> str:
        n_rep = (
            int(self.n_representatives)
            if isinstance(self.n_representatives, (int, np.integer))
            else int(self.n_representatives.values)
        )
        has_structure = self.cluster_structure is not None
        has_data = self.original_data is not None and self.aggregated_data is not None
        return (
            f'ClusterResult(\n'
            f'  {self.n_original_timesteps} original → {n_rep} representative timesteps\n'
            f'  weights sum={float(self.representative_weights.sum().values):.0f}\n'
            f'  cluster_structure={has_structure}, data={has_data}\n'
            f')'
        )

    def _create_reference_structure(self) -> tuple[dict, dict[str, xr.DataArray]]:
        """Create reference structure for serialization."""
        ref = {'__class__': self.__class__.__name__}
        arrays = {}

        # Store DataArrays with references
        arrays[str(self.timestep_mapping.name)] = self.timestep_mapping
        ref['timestep_mapping'] = f':::{self.timestep_mapping.name}'

        arrays[str(self.representative_weights.name)] = self.representative_weights
        ref['representative_weights'] = f':::{self.representative_weights.name}'

        # Store scalar values
        if isinstance(self.n_representatives, xr.DataArray):
            n_rep_name = self.n_representatives.name or 'n_representatives'
            self.n_representatives = self.n_representatives.rename(n_rep_name)
            arrays[n_rep_name] = self.n_representatives
            ref['n_representatives'] = f':::{n_rep_name}'
        else:
            ref['n_representatives'] = int(self.n_representatives)

        # Store nested ClusterStructure if present
        if self.cluster_structure is not None:
            cs_ref, cs_arrays = self.cluster_structure._create_reference_structure()
            ref['cluster_structure'] = cs_ref
            arrays.update(cs_arrays)

        # Skip aggregated_data and original_data - not needed for serialization

        return ref, arrays

    @property
    def n_original_timesteps(self) -> int:
        """Number of original timesteps (before aggregation)."""
        return len(self.timestep_mapping.coords['original_time'])
```
ClusterResult core logic is sound; normalize scalar‑like attributes in validate().
ClusterResult nicely encapsulates:

- Timestep mapping as a named DataArray with `original_time` coords.
- A flexible `representative_weights` structure that can be `(cluster,)` or include period/scenario dims.
- A multi-dim aware `expand_data()` that supports both flat and 2-D `(cluster, time)` layouts and respects per-slice mappings.

In `validate()`, you already handle scalar/array cases for `n_representatives`; however, for the cluster-weight consistency check you use `self.cluster_structure.n_clusters` directly. If `n_clusters` ever comes in as an `xr.DataArray`, `weights_n_clusters != n_clusters` will yield a DataArray and `if ...` will error.

Mirroring your pattern from `__repr__`:

```python
n = self.cluster_structure.n_clusters
n = int(n) if isinstance(n, (int, np.integer)) else int(n.values)
```

before the comparison would make `validate()` robust across IO / deserialized ClusterStructures.
Also applies to: 382-487, 488-545
🤖 Prompt for AI Agents
In @flixopt/clustering/base.py around lines 247 - 353, The validate() method
compares representative weight counts against self.cluster_structure.n_clusters
but fails if n_clusters is an xr.DataArray; cast n_clusters to a plain int
before the comparison using the same pattern as in __repr__ (i.e., if
isinstance(n, (int, np.integer)) use int(n) else use int(n.values)), then use
that int for the weights_n_clusters check; apply the same normalization wherever
validate() or related checks reference cluster_structure.n_clusters (and the
other places noted) so comparisons never yield an xr.DataArray.
```python
def heatmap(
    self,
    *,
    select: SelectType | None = None,
    colors: str | list[str] | None = None,
    facet_col: str | None = 'auto',
    animation_frame: str | None = 'auto',
    show: bool | None = None,
    **plotly_kwargs: Any,
) -> PlotResult:
    """Plot cluster assignments over time as a heatmap timeline.

    Shows which cluster each timestep belongs to as a horizontal color bar.
    The x-axis is time, color indicates cluster assignment. This visualization
    aligns with time series data, making it easy to correlate cluster
    assignments with other plots.

    For multi-period/scenario data, uses faceting and/or animation.

    Args:
        select: xarray-style selection dict, e.g. {'scenario': 'Base Case'}.
        colors: Colorscale name (str) or list of colors for heatmap coloring.
            Dicts are not supported for heatmaps.
            Defaults to CONFIG.Plotting.default_sequential_colorscale.
        facet_col: Dimension to facet on columns. 'auto' uses CONFIG priority.
        animation_frame: Dimension for animation slider. 'auto' uses CONFIG priority.
        show: Whether to display the figure.
            Defaults to CONFIG.Plotting.default_show.
        **plotly_kwargs: Additional arguments passed to plotly.

    Returns:
        PlotResult containing the heatmap figure and cluster assignment data.
        The data has 'cluster' variable with time dimension, matching original timesteps.
    """
    import pandas as pd

    from ..config import CONFIG
    from ..plot_result import PlotResult
    from ..statistics_accessor import _apply_selection

    result = self._clustering.result
    cs = result.cluster_structure
    if cs is None:
        raise ValueError('No cluster structure available')

    cluster_order_da = cs.cluster_order
    timesteps_per_period = cs.timesteps_per_cluster
    original_time = result.original_data.coords['time'] if result.original_data is not None else None

    # Apply selection if provided
    if select:
        cluster_order_da = _apply_selection(cluster_order_da.to_dataset(name='cluster'), select)['cluster']

    # Check for multi-dimensional data
    has_periods = 'period' in cluster_order_da.dims
    has_scenarios = 'scenario' in cluster_order_da.dims

    # Get dimension values
    periods = list(cluster_order_da.coords['period'].values) if has_periods else [None]
    scenarios = list(cluster_order_da.coords['scenario'].values) if has_scenarios else [None]

    # Build cluster assignment per timestep for each (period, scenario) slice
    cluster_slices: dict[tuple, xr.DataArray] = {}
    for p in periods:
        for s in scenarios:
            cluster_order = cs.get_cluster_order_for_slice(period=p, scenario=s)
            # Expand: each cluster repeated timesteps_per_period times
            cluster_per_timestep = np.repeat(cluster_order, timesteps_per_period)
            cluster_slices[(p, s)] = xr.DataArray(
                cluster_per_timestep,
                dims=['time'],
                coords={'time': original_time} if original_time is not None else None,
            )

    # Combine slices into multi-dimensional DataArray
    if has_periods and has_scenarios:
        period_arrays = []
        for p in periods:
            scenario_arrays = [cluster_slices[(p, s)] for s in scenarios]
            period_arrays.append(xr.concat(scenario_arrays, dim=pd.Index(scenarios, name='scenario')))
        cluster_da = xr.concat(period_arrays, dim=pd.Index(periods, name='period'))
    elif has_periods:
        cluster_da = xr.concat(
            [cluster_slices[(p, None)] for p in periods],
            dim=pd.Index(periods, name='period'),
        )
    elif has_scenarios:
        cluster_da = xr.concat(
            [cluster_slices[(None, s)] for s in scenarios],
            dim=pd.Index(scenarios, name='scenario'),
        )
    else:
        cluster_da = cluster_slices[(None, None)]

    # Add dummy y dimension for heatmap visualization (single row)
    heatmap_da = cluster_da.expand_dims('y', axis=-1)
    heatmap_da = heatmap_da.assign_coords(y=['Cluster'])
    heatmap_da.name = 'cluster_assignment'

    # Reorder dims so 'time' and 'y' are first (heatmap x/y axes)
    # Other dims (period, scenario) will be used for faceting/animation
    target_order = ['time', 'y'] + [d for d in heatmap_da.dims if d not in ('time', 'y')]
    heatmap_da = heatmap_da.transpose(*target_order)

    # Use fxplot.heatmap for smart defaults
    fig = heatmap_da.fxplot.heatmap(
        colors=colors,
        title='Cluster Assignments',
        facet_col=facet_col,
        animation_frame=animation_frame,
        aspect='auto',
        **plotly_kwargs,
    )

    # Clean up: hide y-axis since it's just a single row
    fig.update_yaxes(showticklabels=False)
    fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))

    # Data is exactly what we plotted (without dummy y dimension)
    cluster_da.name = 'cluster'
    data = xr.Dataset({'cluster': cluster_da})
    plot_result = PlotResult(data=data, figure=fig)

    if show is None:
        show = CONFIG.Plotting.default_show
    if show:
        plot_result.show()

    return plot_result
```
🧩 Analysis chain

(Condensed from the executed verification scripts against flixOpt/flixopt: `timesteps_per_cluster`, `cluster_order`, and `get_cluster_order_for_slice` were located in flixopt/clustering/base.py; `create_cluster_structure_from_mapping` was inspected and derives `n_original_clusters` by integer division; `original_data=ds` in flixopt/transform_accessor.py confirms the full original dataset is stored; a small numpy test reproduced the partial-period mismatch, where `np.repeat` over the truncated cluster order yields 8760 elements against 8772 original timesteps; and an xarray test confirmed that mismatched array/coordinate lengths raise an error.)
Add explicit validation for timestep-to-cluster alignment in the heatmap method.
The `heatmap()` implementation expands cluster assignments by repeating each cluster's order across `timesteps_per_cluster` timesteps. This assumes `len(original_time) == len(cluster_order) * timesteps_per_cluster`.
The cluster structure is built with `n_original_clusters = n_original // timesteps_per_cluster` (integer division), which silently truncates any partial period. If the original data has timesteps that are not an exact multiple of `timesteps_per_cluster`, xarray will raise a coordinate mismatch error when attempting to assign `original_time` coordinates to the expanded array; the error message will not clearly indicate the root cause.
While `cluster()` validates that `cluster_duration` is a multiple of the timestep size, it does not guarantee that the total number of original timesteps is an exact multiple. For example, 365.5 days with 24-hour clusters passes validation but creates a mismatch.
Add a clear precondition check before the repeat and coordinate assignment:
```python
if len(original_time) != len(cluster_order) * timesteps_per_period:
    raise ValueError(
        f'Timestep count ({len(original_time)}) is not a multiple of '
        f'timesteps_per_cluster ({timesteps_per_period}). '
        f'Expected {len(cluster_order) * timesteps_per_period} timesteps '
        f'({len(cluster_order)} periods × {timesteps_per_period} timesteps/period). '
        f'Ensure your clustering horizon covers a complete number of periods.'
    )
```

🤖 Prompt for AI Agents
In @flixopt/clustering/base.py around lines 717 - 845, In heatmap(), before
repeating cluster_order into cluster_per_timestep, validate that
len(original_time) equals len(cluster_order) * timesteps_per_period and raise a
clear ValueError if not; specifically, locate the loop that calls
cs.get_cluster_order_for_slice and before cluster_per_timestep = np.repeat(...)
add the precondition check comparing len(original_time) to len(cluster_order) *
timesteps_per_period (using the symbols original_time, cluster_order,
timesteps_per_period) and raise an informative error describing the mismatch and
expected counts so the caller knows the clustering horizon must cover whole
periods.
```python
def clusters(
    self,
    variables: str | list[str] | None = None,
    *,
    select: SelectType | None = None,
    colors: ColorType | None = None,
    color: str | None = 'auto',
    facet_col: str | None = 'cluster',
    facet_cols: int | None = None,
    show: bool | None = None,
    **plotly_kwargs: Any,
) -> PlotResult:
    """Plot each cluster's typical period profile.

    Shows each cluster as a separate faceted subplot with all variables
    colored differently. Useful for understanding what each cluster represents.

    Args:
        variables: Variable(s) to plot. Can be a string, list of strings,
            or None to plot all time-varying variables.
        select: xarray-style selection dict, e.g. {'scenario': 'Base Case'}.
        colors: Color specification (colorscale name, color list, or label-to-color dict).
        color: Dimension for line colors. 'auto' uses CONFIG priority (typically 'variable').
            Use 'cluster' to color by cluster instead of faceting.
        facet_col: Dimension for subplot columns. Defaults to 'cluster'.
            Use 'variable' to facet by variable instead.
        facet_cols: Max columns before wrapping facets.
            Defaults to CONFIG.Plotting.default_facet_cols.
        show: Whether to display the figure.
            Defaults to CONFIG.Plotting.default_show.
        **plotly_kwargs: Additional arguments passed to plotly.

    Returns:
        PlotResult containing the figure and underlying data.
    """
    from ..config import CONFIG
    from ..plot_result import PlotResult
    from ..statistics_accessor import _apply_selection

    result = self._clustering.result
    cs = result.cluster_structure
    if result.aggregated_data is None or cs is None:
        raise ValueError('No aggregated data or cluster structure available')

    # Apply selection to aggregated data
    aggregated_data = _apply_selection(result.aggregated_data, select)

    time_vars = self._get_time_varying_variables()
    if not time_vars:
        raise ValueError('No time-varying variables found')

    # Resolve variables
    resolved_variables = self._resolve_variables(variables)

    n_clusters = int(cs.n_clusters) if isinstance(cs.n_clusters, (int, np.integer)) else int(cs.n_clusters.values)
    timesteps_per_cluster = cs.timesteps_per_cluster

    # Check dimensions of all variables for consistency
    has_cluster_dim = None
    for var in resolved_variables:
        da = aggregated_data[var]
        var_has_cluster = 'cluster' in da.dims
        extra_dims = [d for d in da.dims if d not in ('time', 'cluster')]
        if extra_dims:
            raise ValueError(
                f'clusters() requires data with only time (or cluster, time) dimensions. '
                f'Variable {var!r} has extra dimensions: {extra_dims}. '
                f'Use select={{{extra_dims[0]!r}: <value>}} to select a specific {extra_dims[0]}.'
            )
        if has_cluster_dim is None:
            has_cluster_dim = var_has_cluster
        elif has_cluster_dim != var_has_cluster:
            raise ValueError(
                f'All variables must have consistent dimensions. '
                f'Variable {var!r} has {"" if var_has_cluster else "no "}cluster dimension, '
                f'but previous variables {"do" if has_cluster_dim else "do not"}.'
            )

    # Build Dataset with cluster dimension, using labels with occurrence counts
    # Check if cluster_occurrences has extra dims
    occ_extra_dims = [d for d in cs.cluster_occurrences.dims if d not in ('cluster',)]
    if occ_extra_dims:
        # Use simple labels without occurrence counts for multi-dim case
        cluster_labels = [f'Cluster {c}' for c in range(n_clusters)]
    else:
        cluster_labels = [
            f'Cluster {c} (×{int(cs.cluster_occurrences.sel(cluster=c).values)})' for c in range(n_clusters)
        ]

    data_vars = {}
    for var in resolved_variables:
        da = aggregated_data[var]
        if has_cluster_dim:
            # Data already has (cluster, time) dims - just update cluster labels
            data_by_cluster = da.values
        else:
            # Data has (time,) dim - reshape to (cluster, time)
            data_by_cluster = da.values.reshape(n_clusters, timesteps_per_cluster)
        data_vars[var] = xr.DataArray(
            data_by_cluster,
            dims=['cluster', 'time'],
            coords={'cluster': cluster_labels, 'time': range(timesteps_per_cluster)},
        )

    ds = xr.Dataset(data_vars)
    title = 'Clusters' if len(resolved_variables) > 1 else f'Clusters: {resolved_variables[0]}'

    # Use fxplot for smart defaults
    fig = ds.fxplot.line(
        colors=colors,
        color=color,
        title=title,
        facet_col=facet_col,
        facet_cols=facet_cols,
        **plotly_kwargs,
    )
    fig.update_yaxes(matches=None)
    fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))

    # Include occurrences in result data
    data_vars['occurrences'] = cs.cluster_occurrences
    result_data = xr.Dataset(data_vars)
    plot_result = PlotResult(data=result_data, figure=fig)

    if show is None:
        show = CONFIG.Plotting.default_show
    if show:
        plot_result.show()

    return plot_result
```
clusters() and create_cluster_structure_from_mapping() assumptions about cluster IDs.
Two related points about cluster ID handling:

- `ClusteringPlotAccessor.clusters()` reshape assumptions

  In the branch where data lack a `cluster` dim, you reshape flat `time` data as:

  ```python
  data_by_cluster = da.values.reshape(n_clusters, timesteps_per_cluster)
  ```

  This relies on `len(time) == n_clusters * timesteps_per_cluster`. If a backend passes 1-D aggregated data with any other layout (or if parameters are inconsistent), this raises a bare `ValueError` from NumPy. Adding an explicit size check with a tailored error message would make debugging far easier; see the sketch after this list.

- `create_cluster_structure_from_mapping()` and non-contiguous IDs

  Here you derive:

  ```python
  unique_clusters = np.unique(cluster_order)
  n_clusters = int(unique_clusters.max()) + 1 if len(unique_clusters) > 0 else 0
  cluster_occurrences_da = xr.DataArray(
      [occurrences.get(c, 0) for c in range(n_clusters)],
      dims=['cluster'],
  )
  ```

  This means:

  - `n_clusters` is `max_id + 1`, not the count of distinct cluster IDs, contradicting the docstring "Number of distinct clusters".
  - If `cluster_order` is something like `[0, 2, 5]`, you'll report `n_clusters=6` and create zero-occurrence entries for IDs 1, 3, 4.

  That might be acceptable if you want a dense ID range, but it's worth either:

  - Documenting that cluster IDs must be contiguous starting at 0 (and validating this explicitly), or
  - Switching to a sparse representation where `len(unique_clusters)` is `n_clusters` and the `cluster` coordinate is `unique_clusters`.
Given this function is a public convenience API, tightening these semantics now will prevent subtle inconsistencies later.
Also applies to: 1138-1185
🤖 Prompt for AI Agents
In @flixopt/clustering/base.py around lines 847-977:
- clusters(): add an explicit size check before reshaping in ClusteringPlotAccessor.clusters() — verify da.values.size == n_clusters * timesteps_per_cluster and raise a clear ValueError that states the expected and actual sizes (include the variable name and the current n_clusters/timesteps_per_cluster) instead of relying on numpy.reshape to raise.
- create_cluster_structure_from_mapping(): stop deriving n_clusters as max_id+1; compute unique_clusters = np.unique(cluster_order), set n_clusters = len(unique_clusters), and build cluster_occurrences as a DataArray over those unique cluster IDs (use unique_clusters as the 'cluster' coordinate) so sparse/non‑contiguous IDs are preserved (or alternatively add explicit validation that IDs must be contiguous starting at 0 and raise if not) — update usages of cs.cluster_occurrences and cs.n_clusters accordingly.
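A minimal sketch of the guard this prompt asks for, written as a standalone helper so it runs on its own (the name `reshape_to_clusters` is hypothetical, not part of the codebase):

```python
import numpy as np

def reshape_to_clusters(values: np.ndarray, n_clusters: int, timesteps_per_cluster: int) -> np.ndarray:
    """Reshape flat time data to (cluster, time), failing with a tailored message."""
    expected = n_clusters * timesteps_per_cluster
    if values.size != expected:
        raise ValueError(
            f'Flat data has {values.size} timesteps, but the cluster structure implies '
            f'{n_clusters} clusters x {timesteps_per_cluster} timesteps = {expected}.'
        )
    return values.reshape(n_clusters, timesteps_per_cluster)

# reshape_to_clusters(np.arange(12), 3, 4) -> shape (3, 4)
# reshape_to_clusters(np.arange(10), 3, 4) -> ValueError naming both sizes
```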
From flixopt/transform_accessor.py (excerpt):

```python
# Build Clustering for inter-cluster linking and solution expansion
n_original_timesteps = len(self._fs.timesteps)

# Build per-slice cluster_order and timestep_mapping as multi-dimensional DataArrays
# This is needed because each (period, scenario) combination may have different clustering

def _build_timestep_mapping_for_key(key: tuple) -> np.ndarray:
    """Build timestep_mapping for a single (period, scenario) slice."""
    mapping = np.zeros(n_original_timesteps, dtype=np.int32)
    for period_idx, cluster_id in enumerate(cluster_orders[key]):
        for pos in range(timesteps_per_cluster):
            original_idx = period_idx * timesteps_per_cluster + pos
            if original_idx < n_original_timesteps:
                representative_idx = cluster_id * timesteps_per_cluster + pos
                mapping[original_idx] = representative_idx
    return mapping

def _build_cluster_occurrences_for_key(key: tuple) -> np.ndarray:
    """Build cluster_occurrences array for a single (period, scenario) slice."""
    occurrences = cluster_occurrences_all[key]
    return np.array([occurrences.get(c, 0) for c in range(actual_n_clusters)])

# Build multi-dimensional arrays
if has_periods or has_scenarios:
    # Multi-dimensional case: build arrays for each (period, scenario) combination
    # cluster_order: dims [original_cluster, period?, scenario?]
    cluster_order_slices = {}
    timestep_mapping_slices = {}
    cluster_occurrences_slices = {}

    # Use renamed timesteps as coordinates for multi-dimensional case
    original_timesteps_coord = self._fs.timesteps.rename('original_time')

    for p in periods:
        for s in scenarios:
            key = (p, s)
            cluster_order_slices[key] = xr.DataArray(
                cluster_orders[key], dims=['original_cluster'], name='cluster_order'
            )
            timestep_mapping_slices[key] = xr.DataArray(
                _build_timestep_mapping_for_key(key),
                dims=['original_time'],
                coords={'original_time': original_timesteps_coord},
                name='timestep_mapping',
            )
            cluster_occurrences_slices[key] = xr.DataArray(
                _build_cluster_occurrences_for_key(key), dims=['cluster'], name='cluster_occurrences'
            )

    # Combine slices into multi-dimensional DataArrays
    cluster_order_da = self._combine_slices_to_dataarray_generic(
        cluster_order_slices, ['original_cluster'], periods, scenarios, 'cluster_order'
    )
    timestep_mapping_da = self._combine_slices_to_dataarray_generic(
        timestep_mapping_slices, ['original_time'], periods, scenarios, 'timestep_mapping'
    )
    cluster_occurrences_da = self._combine_slices_to_dataarray_generic(
        cluster_occurrences_slices, ['cluster'], periods, scenarios, 'cluster_occurrences'
    )
else:
    # Simple case: single (None, None) slice
    cluster_order_da = xr.DataArray(cluster_orders[first_key], dims=['original_cluster'], name='cluster_order')
    # Use renamed timesteps as coordinates
    original_timesteps_coord = self._fs.timesteps.rename('original_time')
    timestep_mapping_da = xr.DataArray(
        _build_timestep_mapping_for_key(first_key),
        dims=['original_time'],
        coords={'original_time': original_timesteps_coord},
        name='timestep_mapping',
    )
    cluster_occurrences_da = xr.DataArray(
        _build_cluster_occurrences_for_key(first_key), dims=['cluster'], name='cluster_occurrences'
    )

cluster_structure = ClusterStructure(
    cluster_order=cluster_order_da,
    cluster_occurrences=cluster_occurrences_da,
    n_clusters=actual_n_clusters,
    timesteps_per_cluster=timesteps_per_cluster,
)

# Create representative_weights with (cluster,) dimension only
# Each cluster has one weight (same for all timesteps within it)
def _build_cluster_weights_for_key(key: tuple) -> xr.DataArray:
    occurrences = cluster_occurrences_all[key]
    # Shape: (n_clusters,) - one weight per cluster
    weights = np.array([occurrences.get(c, 1) for c in range(actual_n_clusters)])
    return xr.DataArray(weights, dims=['cluster'], name='representative_weights')

weights_slices = {key: _build_cluster_weights_for_key(key) for key in cluster_occurrences_all}
representative_weights = self._combine_slices_to_dataarray_generic(
    weights_slices, ['cluster'], periods, scenarios, 'representative_weights'
)

aggregation_result = ClusterResult(
    timestep_mapping=timestep_mapping_da,
    n_representatives=n_reduced_timesteps,
    representative_weights=representative_weights,
    cluster_structure=cluster_structure,
    original_data=ds,
    aggregated_data=ds_new,
)

reduced_fs.clustering = Clustering(
    result=aggregation_result,
    backend_name='tsam',
    metrics=clustering_metrics,
)
```
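The nested loops in `_build_timestep_mapping_for_key` are easiest to check on a toy case. A small standalone illustration of the same index arithmetic (values chosen for the example, not taken from the PR):

```python
import numpy as np

timesteps_per_cluster = 3
cluster_order = [1, 0]  # original period 0 is represented by cluster 1, period 1 by cluster 0
n_original = len(cluster_order) * timesteps_per_cluster

mapping = np.zeros(n_original, dtype=np.int32)
for period_idx, cluster_id in enumerate(cluster_order):
    for pos in range(timesteps_per_cluster):
        # original timestep index -> same position inside its representative cluster
        mapping[period_idx * timesteps_per_cluster + pos] = cluster_id * timesteps_per_cluster + pos

print(mapping)  # [3 4 5 0 1 2]
```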
`ClusterResult` construction and multi‑dim mapping: minor robustness gaps.

The construction of `ClusterStructure` and `ClusterResult` from per‑slice `cluster_orders` / occurrences and mappings is nicely thought‑out and multi‑dim aware. Two small robustness points:

- In `_build_cluster_occurrences_for_key`, you implicitly assume cluster IDs are `0..actual_n_clusters-1`. If a backend ever emits non‑contiguous IDs, occurrences will be silently zero for the missing ones. Either enforce contiguity with a clear `ValueError`, or derive `actual_n_clusters` from `np.unique` and set the cluster coordinate accordingly.
- `ClusterResult.validate()` compares `weights_n_clusters` against `self.cluster_structure.n_clusters` without normalizing the latter to an `int`. If `n_clusters` ever comes back as an `xr.DataArray` (e.g., from IO deserialization), `weights_n_clusters != n_clusters` will produce a DataArray and `if ...` will raise an "ambiguous truth value" error. Coercing once to `int(...)` (as you already do in `__repr__`) would make `validate()` future‑proof.

These don't affect the current `cluster()` → `expand_solution()` path but would help if other backends/IO paths start populating these structures.
🤖 Prompt for AI Agents
In @flixopt/transform_accessor.py around lines 952-1059:
- Make the per-slice occurrence builder _build_cluster_occurrences_for_key and the ClusterStructure/ClusterResult creation robust to non‑contiguous cluster IDs: either validate contiguity in _build_cluster_occurrences_for_key by checking cluster_occurrences_all[key].keys() against range(actual_n_clusters) and raising a ValueError with a clear message, or compute actual_n_clusters from np.unique over the emitted cluster IDs and build the cluster coordinate/occurrence arrays from that derived value (adjusting cluster_order_da/cluster_occurrences_da and representative_weights accordingly).
- Make ClusterResult.validate() safe against an xr.DataArray n_clusters by coercing self.cluster_structure.n_clusters to int(...) before comparing with weights_n_clusters, so the comparison yields a plain int and avoids ambiguous truth value errors.
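A sketch of both suggested fixes under the assumptions stated above (standalone helper; `check_cluster_ids` is a hypothetical name, and the `validate()` part shows only the coercion, not the real method):

```python
import numpy as np

def check_cluster_ids(cluster_order: np.ndarray) -> int:
    """Enforce contiguous cluster IDs 0..n-1 and return their count."""
    unique_ids = np.unique(cluster_order)
    n_clusters = len(unique_ids)
    if not np.array_equal(unique_ids, np.arange(n_clusters)):
        raise ValueError(
            f'Cluster IDs must be contiguous starting at 0, got {unique_ids.tolist()}.'
        )
    return n_clusters

print(check_cluster_ids(np.array([0, 1, 1, 2])))  # 3
# check_cluster_ids(np.array([0, 2, 5])) -> ValueError

# Inside validate(): normalize n_clusters once before comparing, so a value
# round-tripped through IO as an xr.DataArray cannot poison the boolean check:
#     n_clusters = int(self.cluster_structure.n_clusters)
#     if weights_n_clusters != n_clusters: ...
```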
Summary
Major release preparation for v6.0.0 with complete clustering/aggregation rework.
New Features
Improvements
Breaking Changes
See CHANGELOG.md for the full list of breaking changes in v6.0.0.
Test Plan
Summary by CodeRabbit
New Features
Documentation
Chores