Clustering Parameter Refinements & Unified Slot Assignment #552
New Parameters Added to cluster() Method
| Parameter | Type | Default | Purpose |
|-------------------------|-------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------|
| cluster_method          | Literal[...]                  | 'hierarchical'         | Clustering algorithm ('k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging')                             |
| representation_method   | Literal[...]                  | 'medoidRepresentation' | How clusters are represented ('meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation')  |
| extreme_period_method   | Literal[...]                  | None                   | How peaks are integrated ('None', 'append', 'new_cluster_center', 'replace_cluster_center')                         |
| rescale_cluster_periods | bool | True | Rescale clusters to match original means |
| random_state            | int \| None                     | None                 | Random seed for reproducibility                                                                                      |
| predef_cluster_order    | np.ndarray \| list[int] \| None | None                 | Manual clustering assignments                                                                                        |
| **tsam_kwargs | Any | - | Pass-through for any tsam parameter |
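For orientation, a hedged usage sketch of these parameters. The accessor path (`fs.transform.cluster()`), the `n_clusters` argument, and the pre-existing FlowSystem `fs` are assumptions based on this PR's description, not verbatim API:

```python
# Hypothetical usage sketch; parameter names follow the table above.
fs_clustered = fs.transform.cluster(
    n_clusters=8,                                  # number of typical clusters to keep (assumed name)
    cluster_method='hierarchical',                 # deterministic default
    representation_method='medoidRepresentation',
    extreme_period_method='new_cluster_center',    # keep peak periods as their own clusters
    rescale_cluster_periods=True,                  # preserve the original means
    random_state=42,                               # only matters for stochastic methods like 'k_means'
    predef_cluster_order=None,                     # or an np.ndarray / list[int] to pin assignments
    sortValues=False,                              # example of a tsam pass-through kwarg
)
```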
Clustering Quality Metrics
Access via fs.clustering.metrics after clustering - returns an xr.Dataset with RMSE, MAE, and other accuracy indicators per time series.
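A short, hypothetical inspection of those metrics (the `fs_clustered` name and the `'RMSE'` key are assumptions; actual keys may differ):

```python
# Assumes fs_clustered is a FlowSystem returned by a cluster() call.
metrics = fs_clustered.clustering.metrics   # accuracy indicators per time series
print(metrics)                              # e.g. RMSE / MAE values
rmse = metrics['RMSE']                      # pull out a single indicator, if present
```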
Files Modified
1. flixopt/transform_accessor.py - Updated cluster() signature and tsam call
2. flixopt/clustering/base.py - Added metrics field to Clustering class
3. tests/test_clustering/test_integration.py - Added tests for new parameters
4. docs/user-guide/optimization/clustering.md - Updated documentation
Property renamed: n_original_periods → n_original_clusters
…imestep that regular FlowSystems have.
Root Cause: In expand_solution(), the solution was only indexed by original_timesteps (n elements) instead of original_timesteps_extra (n+1 elements).
Fix in flixopt/transform_accessor.py:
1. Reindex solution to timesteps_extra (line 1296-1298):
- Added expanded_fs._solution.reindex(time=original_timesteps_extra) for consistency with non-expanded FlowSystems
2. Fill extra timestep for charge_state (lines 1300-1333):
- Added special handling to properly fill the extra timestep for storage charge_state variables using the last cluster's extra timestep value
3. Updated intercluster storage handling (lines 1340-1388):
- Modified to work with original_timesteps_extra instead of just original_timesteps
- The extra timestep now correctly gets the final SOC boundary value with proper decay applied
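To make the reindex-and-fill pattern described in points 1–2 above concrete, here is a minimal, self-contained xarray sketch with toy data; the variable names and the placeholder fill value are illustrative and not the actual flixopt implementation:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Generic stand-ins for original_timesteps (n points) vs. original_timesteps_extra (n+1 points).
timesteps = pd.date_range('2024-01-01', periods=4, freq='h')
timesteps_extra = pd.date_range('2024-01-01', periods=5, freq='h')

solution = xr.Dataset(
    {
        'flow_rate': ('time', np.array([1.0, 2.0, 3.0, 4.0])),
        'charge_state': ('time', np.array([0.5, 1.0, 1.5, 2.0])),
    },
    coords={'time': timesteps},
)

# Step 1: reindex to the extended time axis -> the extra timestep becomes NaN for all variables.
expanded = solution.reindex(time=timesteps_extra)

# Step 2: fill the extra timestep only for charge_state (placeholder value here;
# flixopt fills it from the last cluster's extra timestep / SOC boundary with decay).
expanded['charge_state'].loc[{'time': timesteps_extra[-1]}] = 2.5
print(expanded['charge_state'].values)  # [0.5 1.  1.5 2.  2.5]
```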
Tests updated in tests/test_cluster_reduce_expand.py:
- Updated 4 assertions that check solution time coordinates to expect 193 (192 + 1 extra) instead of 192
📝 Walkthrough

This PR releases version 6.0.0 introducing major clustering enhancements, a new fxplot plotting accessor, refined plotting configuration with color as a first-class dimension, and terminology standardization (original_period → original_cluster). It removes deprecated v5.0 classes, extends the clustering API with new parameters (cluster_method, representation_method, random_state, rescale_cluster_periods, extreme_period_method, predef_cluster_order), adds clustering quality metrics, and updates documentation, notebooks, and tests accordingly.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
…sts in the melted DataFrame from data_var names, not as a dimension)
- When facet_row='variable' or facet_col='variable' is passed, it's passed through directly
- In line(), when faceting by variable, it's not also used for color (avoids double encoding)
…at it consistently:
flixopt/config.py:
'extra_dim_priority': ('variable', 'cluster', 'period', 'scenario'),
'x_dim_priority': ('time', 'duration', 'duration_pct', 'variable', 'period', 'scenario', 'cluster'),
flixopt/dataset_plot_accessor.py:
- _get_x_dim: Now takes n_data_vars parameter; 'variable' is available when > 1
- _resolve_auto_facets: 'variable' is available when len(data_vars) > 1 and respects exclude_dims
Behavior:
- 'variable' is treated like any other dimension in the priority system
- Only available when there are multiple data_vars
- Properly excluded when already used (e.g., for x-axis)
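A rough sketch of how such a priority lookup can behave; this is a simplified stand-in for the real helpers in flixopt/dataset_plot_accessor.py, and the function name is assumed:

```python
from typing import Optional, Sequence

def pick_x_dim(
    dims: Sequence[str],
    n_data_vars: int,
    x_dim_priority: Sequence[str] = ('time', 'duration', 'duration_pct', 'variable',
                                     'period', 'scenario', 'cluster'),
) -> Optional[str]:
    """Illustrative only: 'variable' only competes for the x-axis when there are multiple data_vars."""
    candidates = set(dims) | ({'variable'} if n_data_vars > 1 else set())
    for dim in x_dim_priority:
        if dim in candidates:
            return dim
    return None

# 'variable' is skipped for a single data_var ...
print(pick_x_dim(dims=['scenario'], n_data_vars=1))   # -> 'scenario'
# ... but outranks 'scenario' when several data_vars exist and no time dim is present.
print(pick_x_dim(dims=['scenario'], n_data_vars=3))   # -> 'variable'
```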
@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/notebooks/08c-clustering.ipynb (1)
549-590: Critical: Documentation defaults don't match implementation. The API reference table shows incorrect default values for several parameters:

- Line 556: Documents `cluster_method` default as `'k_means'`, but the implementation (transform_accessor.py line 585) defaults to `'hierarchical'`
- Line 557: Documents `representation_method` default as `'meanRepresentation'`, but the implementation (line 588) defaults to `'medoidRepresentation'`
- Line 558: Documents `extreme_period_method` default as `'new_cluster_center'`, but the implementation (line 589) defaults to `None`

🔎 Proposed corrections for the API reference table

    -| `cluster_method` | `str` | 'k_means' | Algorithm: 'k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging' |
    +| `cluster_method` | `str` | 'hierarchical' | Algorithm: 'k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging' |
    -| `representation_method` | `str` | 'meanRepresentation' | 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' |
    +| `representation_method` | `str` | 'medoidRepresentation' | 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' |
    -| `extreme_period_method` | `str` | 'new_cluster_center' | How peaks are integrated: 'None', 'append', 'new_cluster_center', 'replace_cluster_center' |
    +| `extreme_period_method` | `str` | None | How peaks are integrated: None, 'append', 'new_cluster_center', 'replace_cluster_center' |

flixopt/transform_accessor.py (1)
720-807: Add error handling for clustering metrics computation. The code assumes `tsam_agg.accuracyIndicators()` always succeeds and returns a valid DataFrame. If tsam encounters issues computing metrics, this could raise an exception. Consider adding defensive error handling:

    try:
        clustering_metrics_all[key] = tsam_agg.accuracyIndicators()
    except Exception as e:
        logger.warning(f'Failed to compute clustering metrics for {key}: {e}')
        clustering_metrics_all[key] = pd.DataFrame()  # Empty DataFrame as fallback

Then handle the case where all metrics are empty when building the Dataset.
🧹 Nitpick comments (2)
flixopt/transform_accessor.py (2)
715-717: Consider passing random_state to tsam instead of setting global state. Setting `np.random.seed()` globally can affect other parts of the code that use numpy's random functions, potentially causing hard-to-debug issues. If tsam's `TimeSeriesAggregation` accepts a `random_state` or `seed` parameter, consider passing it directly rather than modifying global state:

    # Instead of:
    if random_state is not None:
        np.random.seed(random_state)

    # Consider:
    if random_state is not None:
        tsam_kwargs.setdefault('randomSeed', random_state)

Verify tsam's API to see if this is supported. If not, the current approach is acceptable but should be documented.
746-760: Document tsam_kwargs precedence to prevent conflicts. Users could accidentally override explicit parameters via `**tsam_kwargs`, leading to confusing behavior. For example: `cluster(..., cluster_method='hierarchical', **{'clusterMethod': 'k_means'})`. Consider adding a docstring note or validation:

    # Ensure tsam_kwargs doesn't override explicit parameters
    reserved_keys = {'clusterMethod', 'extremePeriodMethod', 'representationMethod',
                     'rescaleClusterPeriods', 'predefClusterOrder'}
    conflicts = reserved_keys & set(tsam_kwargs.keys())
    if conflicts:
        raise ValueError(
            f'Cannot override explicit parameters via tsam_kwargs: {conflicts}'
        )

Alternatively, just document the precedence in the docstring.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- CHANGELOG.md
- docs/notebooks/08c-clustering.ipynb
- docs/user-guide/optimization/clustering.md
- docs/user-guide/results-plotting.md
- flixopt/clustering/base.py
- flixopt/clustering/intercluster_helpers.py
- flixopt/components.py
- flixopt/config.py
- flixopt/dataset_plot_accessor.py
- flixopt/transform_accessor.py
- mkdocs.yml
- tests/test_cluster_reduce_expand.py
- tests/test_clustering/test_base.py
- tests/test_clustering/test_integration.py
🧰 Additional context used
🧬 Code graph analysis (2)
flixopt/dataset_plot_accessor.py (2)
flixopt/config.py (2)
CONFIG (185-912), Plotting (551-591)
flixopt/color_processing.py (1)
process_colors (165-233)
flixopt/components.py (2)
flixopt/clustering/base.py (3)
n_original_clusters (127-129), n_original_clusters (1051-1055), cluster_order (1029-1033)
flixopt/clustering/intercluster_helpers.py (1)
build_boundary_coords (134-178)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
129-129: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.14)
- GitHub Check: Build documentation
🔇 Additional comments (39)
mkdocs.yml (1)
72-78: LGTM! Well-organized documentation restructuring. The separation of "Resampling" and "Clustering" as distinct concepts is clear, and the new hierarchical structure for clustering documentation (Basics, Storage Modes, Multi-Period, Internals) provides logical progressive coverage.
docs/user-guide/results-plotting.md (1)
5-7: LGTM! Helpful cross-reference added. The tip block effectively guides users to the generalized plotting pathway for arbitrary xarray data, complementing the flixopt-specific plotting documentation.
tests/test_cluster_reduce_expand.py (2)
452-454: LGTM! Terminology updated consistently. The change from `n_original_periods` to `n_original_clusters` aligns with the broader refactoring across the codebase to use cluster-based terminology.

170-170: The extra timestep is correctly implemented and intentionally required for inter-cluster storage linking. The 193-timestep structure (192 original + 1 extra) is fundamental to how storage constraints with SOC_boundary work. The `calculate_timestep_duration()` method correctly indexes to `timesteps_extra[:-1]`, yielding 192 duration values that correspond to the 192 intervals between 193 time points. When `flow_hours` multiplies flow rates by these durations, xarray's broadcasting correctly produces NaN for the final timestep in non-storage variables while storage charge_state properly captures the boundary condition.

The expansion logic in `expand_solution()` (lines 1324–1382) correctly:

- Expands charge_state with the extra timestep for each cluster
- Combines it with SOC_boundary accounting for self-discharge decay using the formula: SOC(t) = SOC_boundary[d] * (1-loss)^t + ΔE(t)
- Reindexes all solution variables to `timesteps_extra` for consistency

The test at lines 176–210 confirms that `total_effects` and `flow_hours` match between clustered and expanded solutions, validating that all downstream statistics and operations handle the extra timestep structure correctly.

flixopt/clustering/intercluster_helpers.py (1)
134-167: LGTM! API updated to use consistent cluster terminology. The function signature and implementation have been updated from `n_original_periods` to `n_original_clusters`, aligning with the broader terminology standardization across the clustering module. The logic and calculation remain correct. All callers—including the call at flixopt/components.py:1341—have been updated to use the new parameter name, and no references to the old parameter remain in the codebase.

tests/test_clustering/test_integration.py (1)
173-269: LGTM! Comprehensive test coverage for advanced clustering options. The new `TestClusterAdvancedOptions` test class provides thorough coverage of the newly added clustering parameters including `cluster_method`, `random_state`, `representation_method`, `rescale_cluster_periods`, and tsam kwargs passthrough. The fixture setup is clean and the tests are well-structured.

tests/test_clustering/test_base.py (1)
20-20: LGTM! Consistent terminology update. The renaming from `original_period` to `original_cluster` is applied systematically across all test cases. The dimension names and property assertions are updated consistently, reflecting the broader terminology standardization in the PR.

Also applies to: 32-32, 45-45, 50-50, 139-139
CHANGELOG.md (6)
54-59: Excellent release documentation structure. The v6.0.0 header clearly marks this as a major release with a concise summary highlighting the three main features: time-series clustering, fxplot accessor, and removal of deprecated classes. The warning callout for breaking changes with migration guidance is particularly helpful.

75-111: Well-documented advanced clustering configuration. The new clustering parameters are clearly explained with:
- Clean parameter descriptions in the initial table
- Practical code example showing combined usage
- Comprehensive algorithm comparison table
- Guidance on using **kwargs for advanced tsam parameters
This provides excellent user guidance for the extended clustering API.
112-126: Clear documentation of clustering quality metrics. The metrics section effectively demonstrates:
- How to access the metrics xr.Dataset
- Expected structure with dimensions
- Specific metric extraction pattern
This matches the clustering API extensions referenced in the PR objectives.
127-163: Comprehensive fxplot accessor documentation. The FXPlot section provides:
- Multiple practical examples showing the API
- Complete method reference table
- Key features breakdown
- Clear integration points with any xarray Dataset
The code examples are clean and demonstrate the universal nature of the accessor.
Note: The static analysis warning about "fenced" vs "indented" code blocks at line 129 is a false positive—fenced code blocks with syntax highlighting are the recommended Markdown style.
176-209: Thorough deprecation notices for v7.0.0. The deprecation section is well-organized:
- Clear removal version (v7.0.0) at the top
- Categorized by type (Classes, FlowSystem methods, Parameters, Topology methods)
- Includes migration mappings in table format
- Notes about simplified topology method names
This gives users clear advance notice for the next major version.
210-245: Helpful migration guide with side-by-side comparison. The ClusteredOptimization migration section:
- Uses tabs to show old vs. new code clearly
- Includes parameter mapping (nr_of_periods → n_clusters)
- Demonstrates the new two-stage workflow
- Shows how to access results in both approaches
This makes upgrading straightforward for existing users.
docs/user-guide/optimization/clustering.md (3)
55-58: Well-documented clustering parameter additions. The table extension clearly documents the four new parameters:

- `cluster_method`: Algorithm selection with examples
- `representation_method`: Representation strategy
- `random_state`: Reproducibility control
- `rescale_cluster_periods`: Period rescaling option

These align with the advanced clustering features mentioned in the PR objectives.
75-111: Comprehensive advanced clustering options guide. The new section provides:
- Practical example combining multiple advanced parameters
- Complete algorithm comparison table (k_means, hierarchical, k_medoids, k_maxoids, averaging)
- Clear guidance on **kwargs for tsam parameters (sameMean, sortValues)
This gives users fine-grained control over clustering behavior with good examples.
112-126: Clear clustering quality metrics documentation. The metrics section demonstrates:

- How to access `fs_clustered.clustering.metrics`
- Expected xr.Dataset structure
- Specific metric extraction (RMSE example)
This exposes the clustering quality evaluation capabilities added in this PR.
flixopt/config.py (2)
33-33: Correct deprecation version progression. Updating `DEPRECATION_REMOVAL_VERSION` from `'6.0.0'` to `'7.0.0'` is appropriate—this release is v6.0.0, so deprecations should target the next major version (v7.0.0). This aligns with the extensive deprecation notices in the CHANGELOG.

167-169: Plotting priority updates for fxplot accessor. The configuration changes introduce new dimension priorities:

- `'variable'` added as first priority in `extra_dim_priority`
- `'color'` added as first slot in `dim_slot_priority`
- `'variable'` inserted in `x_dim_priority`

These changes support the new fxplot accessor feature documented in the CHANGELOG, where color is now a "first-class dimension" and variable is treated as a valid facet/color option. Since this is a major version release (6.0.0), these breaking changes in default behavior are acceptable and well-documented.

flixopt/components.py (3)
flixopt/components.py (3)
1198-1198: Consistent terminology update in InterclusterStorageModel.The renaming from
n_original_periodston_original_clustersis systematically applied throughout the inter-cluster storage linking logic:
- Variable names updated (line 1333)
- Function call parameters updated (line 1341)
- Index selections updated (lines 1370, 1380-1381)
- Comments updated (line 1198)
The changes align with the broader PR terminology standardization and match the updates in related files (clustering/base.py, clustering/intercluster_helpers.py).
Also applies to: 1333-1333, 1341-1341, 1370-1370, 1380-1381
1363-1365: Method signatures correctly updated. The function signatures for `_add_linking_constraints` and `_add_combined_bound_constraints` have been updated:

- Parameter renamed from `n_original_periods` to `n_original_clusters`
- All call sites updated to pass the correctly named variable
- Docstring parameters updated to match
This maintains API consistency across the storage linking implementation.
Also applies to: 1395-1395, 1444-1445, 1461-1462, 1490-1491
1468-1471: Dimension renaming with proper coordinate reassignment. The dimension renames from `'cluster_boundary'` to `'original_cluster'` are correctly implemented:

- Lines 1468-1471: Rename dimension, then assign coordinates using `np.arange(n_original_clusters)`
- Lines 1517-1518: Similar pattern for `soc_d`
- Lines 1530-1535: Rename `'cluster'` to `'original_cluster'` after reordering via `cluster_order`

The coordinate reassignment after dimension renaming ensures proper alignment and avoids index/coordinate mismatches.
Also applies to: 1517-1518, 1530-1535
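As a generic illustration of the rename-then-reassign-coordinates pattern discussed in this comment (toy data and names, not the flixopt implementation):

```python
import numpy as np
import xarray as xr

n_original_clusters = 4
soc_boundary = xr.DataArray(
    np.linspace(0.0, 1.0, n_original_clusters + 1)[:-1],
    dims=['cluster_boundary'],
)

# Rename the dimension, then assign an explicit integer coordinate so that later
# index-based selections line up with the original cluster ordering.
soc_boundary = soc_boundary.rename({'cluster_boundary': 'original_cluster'})
soc_boundary = soc_boundary.assign_coords(original_cluster=np.arange(n_original_clusters))
print(soc_boundary.coords['original_cluster'].values)  # [0 1 2 3]
```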
docs/notebooks/08c-clustering.ipynb (5)
32-32: LGTM! Modern xarray-based plotting approach. The addition of xarray import and dataset-based visualization aligns well with the new fxplot API, providing cleaner and more maintainable plotting code.

Also applies to: 74-80

150-159: Excellent addition for reproducibility. Setting `random_state=42` ensures consistent, reproducible results in the tutorial, which is a best practice for documentation and examples.

189-241: Clear demonstration of clustering metadata API. The progressive disclosure of clustering metadata (properties → metrics → visualization) provides an excellent learning path for users.

247-298: Good demonstration of algorithm comparison workflow. The examples effectively show how to compare different clustering algorithms using both cluster assignments and quality metrics. The RMSE comparison is particularly useful for understanding clustering quality trade-offs.

305-342: Valuable workflow for design comparisons. The manual cluster assignment example addresses a real analysis need: ensuring consistent cluster structure across design variants for fair comparison. Well explained and implemented.
flixopt/transform_accessor.py (3)
585-641: Well-designed API extension with comprehensive documentation. The new clustering parameters provide valuable control over the clustering process while maintaining good defaults. The docstring is thorough and includes helpful examples. The choice of `'hierarchical'` as the default is appropriate since it's deterministic.

941-977: LGTM! Consistent terminology update. The renaming from `original_period` to `original_cluster` improves clarity by avoiding confusion with the model's "period" dimension (years/months). The change is consistently applied across all code paths.

1324-1390: Complex but correct inter-cluster storage handling. The SOC_boundary integration logic properly handles the extra timestep and applies self-discharge decay based on time within each period. The vectorized operations with DataArray indexing are efficient.
The cleanup at lines 1384-1390 is good housekeeping, removing internal variables that shouldn't be exposed in the expanded solution.
flixopt/dataset_plot_accessor.py (5)
16-38: LGTM! Clean extension for multi-variable support. Making 'variable' available as an x-axis option when there are multiple data variables is intuitive and maintains backward compatibility with the default `n_data_vars=1`.

41-96: Well-structured extension of auto-resolution logic. The addition of color to the facet resolution system is clean and follows the existing pattern. The function signature change to return a 4-tuple is properly handled by all call sites in the file.

147-223: Robust color handling with proper defaults. The color resolution logic correctly handles edge cases (single variable, no color dimension) and uses appropriate fallbacks. The defensive check for `'color' not in px_kwargs` prevents accidental overrides.

225-469: Consistent color support across plotting methods. All four methods (bar, stacked_bar, line, area) implement color handling consistently, maintaining the same API patterns and defensive coding practices.

471-684: Correct integration with color-unaware plot types. Heatmap, scatter, and pie methods correctly pass `None` for the color parameter since these plot types don't use the color dimension for categorical grouping. The API remains clean and consistent.

flixopt/clustering/base.py (5)
34-130: Consistent and clear terminology update. The renaming to `original_cluster` throughout ClusterStructure eliminates ambiguity with the model's "period" dimension and improves code clarity. The documentation and property names align well.

969-983: Valuable addition for clustering quality evaluation. The `metrics` attribute provides users with quantitative measures of clustering quality (RMSE, MAE) which is essential for evaluating and comparing different clustering configurations. Making it optional (None) maintains flexibility for backends that don't compute metrics.

200-254: Modernized plotting with fxplot API. The update to use `fxplot.heatmap()` provides better defaults and consistency with the rest of the plotting infrastructure. The visualization effectively shows cluster assignments.

578-699: Flexible and intuitive comparison visualization. The addition of `color` and `line_dash` parameters provides users with multiple ways to distinguish between original and clustered data. The default of using line dash for representation (Original=dotted, Clustered=solid) is an effective visual distinction.

856-953: Excellent UX: cluster labels include occurrence counts. Adding occurrence counts to cluster labels (e.g., "Cluster 0 (×45)") provides immediate insight into cluster weights without needing to reference separate data. This is a thoughtful UX improvement.
    # Handle predef_cluster_order for multi-dimensional case
    predef_order_slice = None
    if predef_cluster_order is not None:
        if isinstance(predef_cluster_order, xr.DataArray):
            # Extract slice for this (period, scenario) combination
            predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values
        else:
            # Simple array/list - use directly
            predef_order_slice = predef_cluster_order
🛠️ Refactor suggestion | 🟠 Major
Add validation for predef_cluster_order dimensions.
If predef_cluster_order is a DataArray with unexpected dimensions (e.g., missing 'period' when the FlowSystem has periods), the .sel() call could fail or produce incorrect results.
Consider adding dimension validation:
    if predef_cluster_order is not None:
        if isinstance(predef_cluster_order, xr.DataArray):
            # Validate dimensions match FlowSystem structure
            expected_dims = {'original_cluster'}
            if has_periods:
                expected_dims.add('period')
            if has_scenarios:
                expected_dims.add('scenario')
            if set(predef_cluster_order.dims) != expected_dims:
                raise ValueError(
                    f'predef_cluster_order dims {set(predef_cluster_order.dims)} '
                    f'do not match expected {expected_dims}'
                )
            predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values
        else:
            predef_order_slice = predef_cluster_order

🤖 Prompt for AI Agents
predef_order_slice = predef_cluster_order🤖 Prompt for AI Agents
In flixopt/transform_accessor.py around lines 733 to 741, validate DataArray
dimensions of predef_cluster_order before using .sel(): compute expected_dims =
{'original_cluster'} and add 'period' if has_periods and 'scenario' if
has_scenarios, then if set(predef_cluster_order.dims) != expected_dims raise a
ValueError with a clear message showing actual vs expected dims; after the check
proceed to extract the slice with predef_cluster_order.sel(**selector,
drop=True).values, otherwise accept non-DataArray inputs unchanged.
    n_original_clusters = cluster_structure.n_original_clusters

      # Expand function using ClusterResult.expand_data() - handles multi-dimensional cases
    - def expand_da(da: xr.DataArray) -> xr.DataArray:
    + # For charge_state with cluster dim, also includes the extra timestep
    + last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster
    + def expand_da(da: xr.DataArray, var_name: str = '') -> xr.DataArray:
          if 'time' not in da.dims:
              return da.copy()
    -     return info.result.expand_data(da, original_time=original_timesteps)
    +     expanded = info.result.expand_data(da, original_time=original_timesteps)
    +
    +     # For charge_state with cluster dim, append the extra timestep value
    +     if var_name.endswith('|charge_state') and 'cluster' in da.dims:
    +         # Get extra timestep from last cluster using vectorized selection
    +         cluster_order = cluster_structure.cluster_order  # (n_original_clusters,) or with period/scenario
    +         if cluster_order.ndim == 1:
    +             last_cluster = int(cluster_order[last_original_cluster_idx])
    +             extra_val = da.isel(cluster=last_cluster, time=-1)
    +         else:
    +             # Multi-dimensional: select last cluster for each period/scenario slice
    +             last_clusters = cluster_order.isel(original_cluster=last_original_cluster_idx)
    +             extra_val = da.isel(cluster=last_clusters, time=-1)
    +         # Drop 'cluster'/'time' coords created by isel (kept as non-dim coords)
    +         extra_val = extra_val.drop_vars(['cluster', 'time'], errors='ignore')
    +         extra_val = extra_val.expand_dims(time=[original_timesteps_extra[-1]])
    +         expanded = xr.concat([expanded, extra_val], dim='time')
    +
    +     return expanded
🧩 Analysis chain
🏁 Script executed:
# First, let's understand the context around line 1257
cd flixopt && head -1280 transform_accessor.py | tail -100 | cat -n
🏁 Script executed:
# Search for timesteps_per_cluster usage and validation
rg -n "timesteps_per_cluster" --type py -B 3 -A 3Repository: flixOpt/flixopt
Length of output: 38250
🏁 Script executed:
# Look for clustering initialization and validation
rg -n "n_original_timesteps" --type py -B 2 -A 2 | head -50Repository: flixOpt/flixopt
Length of output: 4320
🏁 Script executed:
# Check how cluster_structure is created and validated
rg -n "class ClusterResult\|n_original_clusters" --type py -B 2 -A 5Repository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Get the exact usage of last_original_cluster_idx at lines 1268-1275
sed -n '1257,1280p' flixopt/transform_accessor.py | cat -n
🏁 Script executed:
# Check cluster_order indexing bounds - what is its size?
rg -n "cluster_order\[" --type py -B 2 -A 2 | head -40Repository: flixOpt/flixopt
Length of output: 2288
🏁 Script executed:
# Look for validation that ensures n_original_timesteps is properly divisible
rg -n "n_original_timesteps.*%\|%.*timesteps_per_cluster\|divisib" --type pyRepository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check if there are edge case tests with partial clusters
rg -n "partial\|incomplete\|remainder" tests/ --type py | head -20Repository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check if n_original_clusters is calculated or derived from n_original_timesteps
rg -n "n_original_clusters.*=" flixopt/clustering/base.py | head -10Repository: flixOpt/flixopt
Length of output: 206
🏁 Script executed:
# Look at the cluster() method to see if it validates perfect divisibility
sed -n '650,750p' flixopt/transform_accessor.py | grep -n "cluster\|timestep"
🏁 Script executed:
# Check what original_timesteps contains and how n_original_timesteps relates to it
rg -n "original_timesteps.*=" flixopt/transform_accessor.py | head -15Repository: flixOpt/flixopt
Length of output: 777
🏁 Script executed:
# Search for any bounds checking or clamping around cluster_order access
rg -n "cluster_order\[.*_idx\]\|isel.*original_cluster.*idx" --type py -B 1 -A 1Repository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check how n_original_clusters is used and whether it can mismatch with actual cluster_order size
rg -n "n_original_clusters" flixopt/clustering/base.py | head -20Repository: flixOpt/flixopt
Length of output: 1003
🏁 Script executed:
# See the full context of how n_original_clusters is determined during clustering
sed -n '1115,1165p' flixopt/clustering/base.py
🏁 Script executed:
# Check what create_cluster_structure_from_mapping does with incomplete timesteps
sed -n '1115,1170p' flixopt/clustering/base.py | cat -n
🏁 Script executed:
# Verify: does cluster_order always have exactly n_original_clusters entries?
python3 << 'EOF'
# Simulate the scenario from the review
n_original_timesteps = 169 # Not divisible by 24
timesteps_per_cluster = 24
last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster
n_original_clusters = n_original_timesteps // timesteps_per_cluster
print(f"n_original_timesteps: {n_original_timesteps}")
print(f"timesteps_per_cluster: {timesteps_per_cluster}")
print(f"n_original_clusters: {n_original_clusters}")
print(f"last_original_cluster_idx: {last_original_cluster_idx}")
print(f"Valid cluster_order indices: 0-{n_original_clusters - 1}")
print(f"Accessing cluster_order[{last_original_cluster_idx}] would be: ", end="")
if last_original_cluster_idx >= n_original_clusters:
    print(f"OUT OF BOUNDS! (index {last_original_cluster_idx} >= size {n_original_clusters})")
else:
    print("OK")
EOF
🏁 Script executed:
# Check if the FlowSystem or clustering validates divisibility
rg -n "n_original_timesteps.*%\|n_original_timesteps.*//\|assert.*divisible" flixopt/ --type pyRepository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check the cluster() method for any validation or handling of partial clusters
sed -n '600,950p' flixopt/transform_accessor.py | grep -n "n_original\|divisib\|partial\|remainder" | head -20
🏁 Script executed:
# Look specifically at how cluster_order is sized in cluster() vs expand_solution()
rg -n "cluster_order.*=" flixopt/transform_accessor.py -A 2 | head -40Repository: flixOpt/flixopt
Length of output: 2035
🏁 Script executed:
# Check if there's any documentation or code comment about cluster boundary conditions
rg -n "last.*cluster\|partial.*period\|remainder" flixopt/ --type py -B 1 -A 1Repository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check if n_original_clusters is calculated with any remainder handling
rg -n "n_original_clusters.*=" flixopt/transform_accessor.py -B 2 -A 2Repository: flixOpt/flixopt
Length of output: 366
🏁 Script executed:
# Look for any comment or code about handling partial/incomplete clusters
rg -n "incomplete\|partial\|remainder\|not.*divisible" flixopt/ --type pyRepository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check the full cluster() method to see if it enforces or validates divisibility
sed -n '575,950p' flixopt/transform_accessor.py | tail -200 | head -150
🏁 Script executed:
# Verify the actual size of cluster_order that's created in cluster()
python3 << 'EOF'
# From cluster() at lines 950-967, cluster_order is created from:
# cluster_orders[key] which comes from tsam_agg.clusterOrder
# The size depends on how many original_cluster periods there are
# In expand_solution() at line 1253:
# n_original_clusters = cluster_structure.n_original_clusters
# This is calculated from cluster_order's size
# If cluster_order has n_original_clusters entries (indices 0 to n_original_clusters-1)
# And we try to access cluster_order[last_original_cluster_idx]
# where last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster
# Then if n_original_timesteps is not perfectly divisible:
# last_original_cluster_idx can equal n_original_clusters (out of bounds)
print("The issue occurs when:")
print(" n_original_timesteps % timesteps_per_cluster != 0")
print()
print("Example:")
print(" n_original_timesteps = 169")
print(" timesteps_per_cluster = 24")
print(" n_original_clusters = 169 // 24 = 7 (cluster_order size)")
print(" last_original_cluster_idx = (169-1) // 24 = 7 (INDEX OUT OF BOUNDS)")
EOF
🏁 Script executed:
# Check if there are any tests with non-divisible timesteps
rg -n "test.*expand\|test.*cluster" tests/ --type py | grep -i testRepository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Look for test fixtures that might use non-standard timesteps
rg -n "8_days\|7_days\|169\|partial" tests/ --type py | head -20Repository: flixOpt/flixopt
Length of output: 41
🏁 Script executed:
# Check the simple_system_8_days fixture to see timesteps
rg -n "simple_system_8_days" tests/ -A 10 | head -30Repository: flixOpt/flixopt
Length of output: 2383
🏁 Script executed:
# Verify if there's protection/clamping at line 1273 (multi-dim case)
sed -n '1267,1280p' flixopt/transform_accessor.py | cat -n
🏁 Script executed:
# Check if cluster_order can have fewer entries than n_original_clusters suggests
python3 << 'EOF'
# Looking at the code flow:
# 1. cluster_orders[key] = tsam_agg.clusterOrder
# This is the cluster order from TSAM - one value per original period
#
# 2. cluster_order_da created at line 952-953:
# xr.DataArray(cluster_orders[key], dims=['original_cluster'], name='cluster_order')
#
# 3. In expand_solution() at line 1253:
# n_original_clusters = cluster_structure.n_original_clusters
#
# 4. cluster_structure.n_original_clusters is a property that returns:
# len(cluster_order) (the size of the original_cluster dimension)
#
# So if cluster_order has 7 entries, n_original_clusters = 7
# Valid indices: 0-6
# last_original_cluster_idx = (169-1) // 24 = 7 → OUT OF BOUNDS
print("CONFIRMED BUG:")
print("cluster_order.shape[0] = n_original_clusters")
print("cluster_order[last_original_cluster_idx] will fail if")
print(" last_original_cluster_idx >= n_original_clusters")
print()
print("This happens when:")
print(" n_original_timesteps % timesteps_per_cluster != 0")
EOF
Fix out-of-bounds access in last_original_cluster_idx calculation.
Line 1257 calculates the last original cluster index without bounds checking:
    last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

When n_original_timesteps % timesteps_per_cluster != 0 (partial clusters at the end), this produces an index equal to or exceeding n_original_clusters, causing IndexError at lines 1269 and 1273 when accessing cluster_order[last_original_cluster_idx].
Clamp the index to valid bounds:
    last_original_cluster_idx = min(
        (n_original_timesteps - 1) // timesteps_per_cluster,
        n_original_clusters - 1
    )
In flixopt/transform_accessor.py around lines 1253 to 1280, the computation of
last_original_cluster_idx can exceed the valid range when the final cluster is
partial, leading to out-of-bounds access of cluster_order; clamp the computed
index to n_original_clusters - 1 (i.e., take the min of the current formula and
n_original_clusters - 1) so subsequent indexing at lines 1269 and 1273 cannot
raise IndexError.
1. Cell 32 (API Reference table): Updated defaults to 'hierarchical', 'medoidRepresentation', and None
2. Cell 16: Swapped the example to show k_means as the alternative (since hierarchical is now default)
3. Cell 17: Updated variable names to match
4. Cell 33 (Key Takeaways): Clarified that random_state is only needed for non-deterministic methods like 'k_means'
The code review
…arning log and empty DataFrame fallback, plus handling empty DataFrames when building the metrics Dataset
2. Random state to tsam - Replaced global np.random.seed() with passing seed parameter directly to tsam's TimeSeriesAggregation
3. tsam_kwargs conflict validation - Added validation that raises ValueError if user tries to override explicit parameters via **tsam_kwargs (including seed)
4. predef_cluster_order validation - Added dimension validation for DataArray inputs, checking they match the FlowSystem's period/scenario structure
5. Out-of-bounds fix - Clamped last_original_cluster_idx to n_original_clusters - 1 to handle partial clusters at the end
…ey) or next(...) to explicit if metrics_df is None: check
2. removed random state
# Conflicts:
#   docs/notebooks/08c-clustering.ipynb
#   flixopt/dataset_plot_accessor.py
# Conflicts:
#   .github/workflows/docs.yaml
#   mkdocs.yml
…mmary of the changes made:
Changes Made
1. flixopt/config.py
- Replaced three separate config attributes (extra_dim_priority, dim_slot_priority, x_dim_priority) with a single unified dim_priority tuple
- Updated CONFIG.Plotting class docstring and attribute definitions
- Updated to_dict() method to use the new attribute
- The new priority order: ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')
2. flixopt/dataset_plot_accessor.py
- Created new assign_slots() function that centralizes all dimension-to-slot assignment logic
- Fixed slot fill order: x → color → facet_col → facet_row → animation_frame
- Updated all plot methods (bar, stacked_bar, line, area, heatmap, scatter, pie) to use assign_slots()
- Removed old _get_x_dim() and _resolve_auto_facets() functions
- Updated docstrings to reference dim_priority instead of x_dim_priority
3. flixopt/statistics_accessor.py
- Updated _resolve_auto_facets() to use the new assign_slots() function internally
- Added import for assign_slots from dataset_plot_accessor
Key Design Decisions
- Single priority list controls all auto-assignment
- Slots are filled in fixed order based on availability
- None means a slot is not available for that plot type
- 'auto' triggers auto-assignment from priority list
- Explicit string values override auto-assignment
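A simplified sketch of the unified slot-assignment idea described above; the function below is illustrative only and omits details of the real assign_slots() in flixopt/dataset_plot_accessor.py:

```python
from typing import Dict, Optional, Sequence

DIM_PRIORITY = ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')
SLOT_ORDER = ('x', 'color', 'facet_col', 'facet_row', 'animation_frame')

def assign_slots_sketch(
    dims: Sequence[str],
    requested: Dict[str, Optional[str]],
    dim_priority: Sequence[str] = DIM_PRIORITY,
) -> Dict[str, Optional[str]]:
    """Fill slots in fixed order: explicit strings win, 'auto' pulls the next unused
    dimension from the priority list, None marks a slot as unavailable for the plot type."""
    remaining = [d for d in dim_priority if d in dims]
    resolved: Dict[str, Optional[str]] = {}
    for slot in SLOT_ORDER:
        value = requested.get(slot)
        if value is None:                 # slot not offered by this plot type
            resolved[slot] = None
        elif value == 'auto':             # auto-assign by priority
            resolved[slot] = remaining.pop(0) if remaining else None
        else:                             # explicit assignment overrides auto logic
            resolved[slot] = value
            if value in remaining:
                remaining.remove(value)
    return resolved

print(assign_slots_sketch(
    dims=['time', 'scenario', 'period'],
    requested={'x': 'auto', 'color': 'auto', 'facet_col': 'auto',
               'facet_row': None, 'animation_frame': None},
))
# -> {'x': 'time', 'color': 'period', 'facet_col': 'scenario',
#     'facet_row': None, 'animation_frame': None}
```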