Conversation

@FBumann FBumann commented Jan 4, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

Release Notes

New Features

  • Expanded clustering capabilities with advanced configuration options (multiple clustering methods, reproducibility controls, and quality metrics).
  • New FXPlot accessor enables plotting custom xarray datasets beyond simulation results.
  • Inter-cluster storage linking support.

Breaking Changes

  • Clustering API terminology and weight normalization updated.
  • Deprecated v5.0 classes removed; consult migration guide.
  • Plotting method defaults adjusted for improved dimension handling.

Documentation

  • Comprehensive migration guides and advanced clustering examples added.
  • Updated documentation reflecting new API parameters and capabilities.


  New Parameters Added to cluster() Method

  | Parameter               | Type                          | Default              | Purpose                                                                                                            |
  |-------------------------|-------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------|
  | cluster_method          | Literal[...]                  | 'k_means'            | Clustering algorithm ('k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging')                            |
  | representation_method   | Literal[...]                  | 'meanRepresentation' | How clusters are represented ('meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation') |
  | extreme_period_method   | Literal[...]                  | 'new_cluster_center' | How peaks are integrated ('None', 'append', 'new_cluster_center', 'replace_cluster_center')                        |
  | rescale_cluster_periods | bool                          | True                 | Rescale clusters to match original means                                                                           |
  | random_state            | int | None                    | None                 | Random seed for reproducibility                                                                                    |
  | predef_cluster_order    | np.ndarray | list[int] | None | None                 | Manual clustering assignments                                                                                      |
  | **tsam_kwargs           | Any                           | -                    | Pass-through for any tsam parameter                                                                                |
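
As an illustration of the **tsam_kwargs pass-through described in the table, here is a minimal sketch of how explicit arguments and raw tsam kwargs could be merged (a hypothetical helper, not flixopt's actual implementation; with setdefault semantics, a raw tsam kwarg takes precedence over the explicit parameter):

```python
def build_tsam_kwargs(cluster_method='k_means',
                      rescale_cluster_periods=True,
                      **tsam_kwargs):
    """Sketch: merge explicit cluster() arguments with raw tsam kwargs.

    Explicit arguments are applied only if the caller did not already
    supply the corresponding tsam name via **tsam_kwargs.
    """
    tsam_kwargs.setdefault('clusterMethod', cluster_method)
    tsam_kwargs.setdefault('rescaleClusterPeriods', rescale_cluster_periods)
    return tsam_kwargs

# Explicit parameter is used when no raw kwarg is given:
print(build_tsam_kwargs(cluster_method='hierarchical'))
# A raw tsam kwarg wins over the explicit parameter:
print(build_tsam_kwargs(cluster_method='hierarchical', clusterMethod='k_means'))
```

Whether flixopt resolves such conflicts this way (or raises instead) is a design choice; the review comments below discuss it.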

  Clustering Quality Metrics

  Access via fs.clustering.metrics after clustering; returns a DataFrame with RMSE, MAE, and other accuracy indicators per time series.
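
As an illustration of what such accuracy indicators measure, the sketch below computes RMSE and MAE between an original profile and its clustered reconstruction (standalone NumPy, not flixopt's or tsam's actual implementation):

```python
import numpy as np

def accuracy_indicators(original: np.ndarray, reconstructed: np.ndarray) -> dict:
    """Per-series error metrics between original and cluster-reconstructed data."""
    err = original - reconstructed
    return {
        'RMSE': float(np.sqrt(np.mean(err ** 2))),
        'MAE': float(np.mean(np.abs(err))),
    }

original = np.array([1.0, 2.0, 3.0, 4.0])
reconstructed = np.array([1.5, 1.5, 3.5, 3.5])  # e.g. two clusters of two steps each
metrics = accuracy_indicators(original, reconstructed)
print(metrics)  # RMSE = 0.5, MAE = 0.5 for this example
```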

  Files Modified

  1. flixopt/transform_accessor.py - Updated cluster() signature and tsam call
  2. flixopt/clustering/base.py - Added metrics field to Clustering class
  3. tests/test_clustering/test_integration.py - Added tests for new parameters
  4. docs/user-guide/optimization/clustering.md - Updated documentation
  Property renamed: n_original_periods → n_original_clusters
…imestep that regular FlowSystems have.

  Root Cause: In expand_solution(), the solution was only indexed by original_timesteps (n elements) instead of original_timesteps_extra (n+1 elements).
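
The off-by-one matters because state variables live on time points while flows live on intervals: n flow values imply n + 1 charge states. A minimal NumPy sketch with made-up numbers (independent of flixopt):

```python
import numpy as np

n = 192                     # flow values, one per interval
flows = np.ones(n)          # charging 1 energy unit per step
initial_charge = 0.0

# Integrating n interval flows yields n + 1 state values: the initial
# state plus one state after each interval, hence 193 rather than 192.
charge_state = np.concatenate([[initial_charge], initial_charge + np.cumsum(flows)])
print(charge_state.shape)   # (193,)
```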

  Fix in flixopt/transform_accessor.py:

  1. Reindex solution to timesteps_extra (line 1296-1298):
    - Added expanded_fs._solution.reindex(time=original_timesteps_extra) for consistency with non-expanded FlowSystems
  2. Fill extra timestep for charge_state (lines 1300-1333):
    - Added special handling to properly fill the extra timestep for storage charge_state variables using the last cluster's extra timestep value
  3. Updated intercluster storage handling (lines 1340-1388):
    - Modified to work with original_timesteps_extra instead of just original_timesteps
    - The extra timestep now correctly gets the final SOC boundary value with proper decay applied

  Tests updated in tests/test_cluster_reduce_expand.py:
  - Updated 4 assertions that check solution time coordinates to expect 193 (192 + 1 extra) instead of 192
coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

Walkthrough

This PR releases version 6.0.0 introducing major clustering enhancements, a new fxplot plotting accessor, refined plotting configuration with color as a first-class dimension, and terminology standardization (original_period → original_cluster). It removes deprecated v5.0 classes, extends the clustering API with new parameters (cluster_method, representation_method, random_state, rescale_cluster_periods, extreme_period_method, predef_cluster_order), adds clustering quality metrics, and updates documentation, notebooks, and tests accordingly.

Changes

  • Release & Configuration (CHANGELOG.md, flixopt/config.py):
    Added v6.0.0 release notes documenting new clustering features, the fxplot accessor, breaking changes, and deprecations. Updated the deprecation removal version from 6.0.0 to 7.0.0 and expanded plotting config priorities to include 'variable' and 'color' as first-class dimensions.
  • Clustering Core & Terminology (flixopt/clustering/base.py, flixopt/clustering/intercluster_helpers.py, flixopt/components.py):
    Renamed "original_period" terminology to "original_cluster" across ClusterStructure, Clustering, and component classes; added a public metrics attribute (xr.Dataset | None); renamed the n_original_periods property to n_original_clusters; adjusted inter-cluster linking logic and boundary coordinate calculations.
  • Clustering API & Execution (flixopt/transform_accessor.py):
    Extended the cluster() method signature with new parameters: cluster_method, representation_method, extreme_period_method, rescale_cluster_periods, random_state, predef_cluster_order, and **tsam_kwargs. Added multi-dimensional metrics aggregation, improved expand_solution() with extra-timestep and original_cluster handling, and per-period/scenario cluster configuration support.
  • Plotting Infrastructure & Color Handling (flixopt/dataset_plot_accessor.py):
    Introduced color as a first-class plotting dimension across bar, stacked_bar, line, area, and related methods via a new color parameter (str | 'auto' | None). Updated the _get_x_dim() and _resolve_auto_facets() signatures to handle color resolution independently; adjusted facet and color mapping logic.
  • Documentation & Notebooks (docs/user-guide/optimization/clustering.md, docs/user-guide/results-plotting.md, docs/notebooks/08c-clustering.ipynb):
    Added "Advanced Clustering Options" and "Clustering Quality Metrics" sections with code examples; documented the new cluster() parameters and their defaults. Added a plotting tip for custom xarray data via the fxplot accessor. Updated the notebook with new clustering parameters, a reproducible random state, and fxplot-based visualizations.
  • Documentation Navigation (mkdocs.yml):
    Restructured the Scaling section: renamed the "Clustering" (resampling) item to "Resampling"; added a new "Clustering" submenu with four entries (Basics, Storage Modes, Multi-Period, Internals).
  • Tests (tests/test_cluster_reduce_expand.py, tests/test_clustering/test_base.py, tests/test_clustering/test_integration.py):
    Updated assertions to expect the extra timestep (193 vs 192) in expanded solutions; renamed the 'original_period' dimension to 'original_cluster'; replaced n_original_periods references with n_original_clusters; added a comprehensive TestClusterAdvancedOptions suite validating the new clustering parameters, metrics, and reproducibility.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Description check: ⚠️ Warning. The PR description is a blank template with no actual content filled in; all sections are empty or unchecked, failing to describe the changes, type of change, related issues, testing status, or rationale. Resolution: fill in all required sections with substantive content describing the changes, type of change, related issues, and testing verification, and update the checklist to accurately reflect the work completed.

✅ Passed checks (2 passed)

  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.64%, which is sufficient; the required threshold is 80.00%.
  • Title check: ✅ Passed. The title addresses clustering parameter additions and plotting slot assignment, which are major themes in the PR, but it is somewhat generic and doesn't capture the full scope of changes, including terminology refactoring, storage expansion, and documentation updates.

…sts in the melted DataFrame from data_var names, not as a dimension)

  - When facet_row='variable' or facet_col='variable' is passed, it's passed through directly
  - In line(), when faceting by variable, it's not also used for color (avoids double encoding)
…at it consistently:

  flixopt/config.py:
  'extra_dim_priority': ('variable', 'cluster', 'period', 'scenario'),
  'x_dim_priority': ('time', 'duration', 'duration_pct', 'variable', 'period', 'scenario', 'cluster'),

  flixopt/dataset_plot_accessor.py:
  - _get_x_dim: Now takes n_data_vars parameter; 'variable' is available when > 1
  - _resolve_auto_facets: 'variable' is available when len(data_vars) > 1 and respects exclude_dims

  Behavior:
  - 'variable' is treated like any other dimension in the priority system
  - Only available when there are multiple data_vars
  - Properly excluded when already used (e.g., for x-axis)
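
The availability and priority behavior described in these bullets can be sketched as a small resolver (a simplified stand-in for _get_x_dim, assuming the x_dim_priority tuple shown above; not the actual flixopt code):

```python
X_DIM_PRIORITY = ('time', 'duration', 'duration_pct', 'variable', 'period', 'scenario', 'cluster')

def pick_x_dim(available_dims, n_data_vars=1, exclude=()):
    """Return the first priority entry that is actually available.

    'variable' only counts as available when the dataset has more than
    one data_var, since it is melted from data_var names, not a real dim.
    """
    candidates = set(available_dims) | ({'variable'} if n_data_vars > 1 else set())
    candidates -= set(exclude)
    for dim in X_DIM_PRIORITY:
        if dim in candidates:
            return dim
    return None

print(pick_x_dim({'scenario', 'period'}))                   # 'period'
print(pick_x_dim({'scenario'}, n_data_vars=3))              # 'variable'
print(pick_x_dim({'time', 'scenario'}, exclude=('time',)))  # 'scenario'
```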

FBumann commented Jan 4, 2026

@coderabbitai review

coderabbitai bot commented Jan 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/notebooks/08c-clustering.ipynb (1)

549-590: Critical: Documentation defaults don't match implementation.

The API reference table shows incorrect default values for several parameters:

  1. Line 556: Documents cluster_method default as 'k_means', but the implementation (transform_accessor.py line 585) defaults to 'hierarchical'
  2. Line 557: Documents representation_method default as 'meanRepresentation', but the implementation (line 588) defaults to 'medoidRepresentation'
  3. Line 558: Documents extreme_period_method default as 'new_cluster_center', but the implementation (line 589) defaults to None
🔎 Proposed corrections for the API reference table
-| `cluster_method` | `str` | 'k_means' | Algorithm: 'k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging' |
+| `cluster_method` | `str` | 'hierarchical' | Algorithm: 'k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging' |
-| `representation_method` | `str` | 'meanRepresentation' | 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' |
+| `representation_method` | `str` | 'medoidRepresentation' | 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' |
-| `extreme_period_method` | `str` | 'new_cluster_center' | How peaks are integrated: 'None', 'append', 'new_cluster_center', 'replace_cluster_center' |
+| `extreme_period_method` | `str` | None | How peaks are integrated: None, 'append', 'new_cluster_center', 'replace_cluster_center' |
flixopt/transform_accessor.py (1)

720-807: Add error handling for clustering metrics computation.

The code assumes tsam_agg.accuracyIndicators() always succeeds and returns a valid DataFrame. If tsam encounters issues computing metrics, this could raise an exception.

Consider adding defensive error handling:

try:
    clustering_metrics_all[key] = tsam_agg.accuracyIndicators()
except Exception as e:
    logger.warning(f'Failed to compute clustering metrics for {key}: {e}')
    clustering_metrics_all[key] = pd.DataFrame()  # Empty DataFrame as fallback

Then handle the case where all metrics are empty when building the Dataset.

🧹 Nitpick comments (2)
flixopt/transform_accessor.py (2)

715-717: Consider passing random_state to tsam instead of setting global state.

Setting np.random.seed() globally can affect other parts of the code that use numpy's random functions, potentially causing hard-to-debug issues.

If tsam's TimeSeriesAggregation accepts a random_state or seed parameter, consider passing it directly rather than modifying global state:

# Instead of:
if random_state is not None:
    np.random.seed(random_state)

# Consider:
if random_state is not None:
    tsam_kwargs.setdefault('randomSeed', random_state)

Verify tsam's API to see if this is supported. If not, the current approach is acceptable but should be documented.
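
For the isolation concern itself, numpy's Generator API gives reproducibility without mutating global state; a runnable illustration (general numpy usage, independent of tsam and flixopt):

```python
import numpy as np

# Global seeding mutates shared module-level state: any later
# np.random call anywhere in the process is affected.
np.random.seed(42)
a = np.random.random(3)

# A local Generator is reproducible and isolated:
rng1 = np.random.default_rng(123)
rng2 = np.random.default_rng(123)
assert np.allclose(rng1.random(5), rng2.random(5))  # same seed, same stream

# ...and consuming rng1 did not disturb the global stream:
np.random.seed(42)
b = np.random.random(3)
assert np.allclose(a, b)
```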


746-760: Document tsam_kwargs precedence to prevent conflicts.

Users could accidentally override explicit parameters via **tsam_kwargs, leading to confusing behavior. For example:

cluster(..., cluster_method='hierarchical', **{'clusterMethod': 'k_means'})

Consider adding a docstring note or validation:

# Ensure tsam_kwargs doesn't override explicit parameters
reserved_keys = {'clusterMethod', 'extremePeriodMethod', 'representationMethod', 
                 'rescaleClusterPeriods', 'predefClusterOrder'}
conflicts = reserved_keys & set(tsam_kwargs.keys())
if conflicts:
    raise ValueError(
        f'Cannot override explicit parameters via tsam_kwargs: {conflicts}'
    )

Alternatively, just document the precedence in the docstring.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf6962c and 6b0579f.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • docs/notebooks/08c-clustering.ipynb
  • docs/user-guide/optimization/clustering.md
  • docs/user-guide/results-plotting.md
  • flixopt/clustering/base.py
  • flixopt/clustering/intercluster_helpers.py
  • flixopt/components.py
  • flixopt/config.py
  • flixopt/dataset_plot_accessor.py
  • flixopt/transform_accessor.py
  • mkdocs.yml
  • tests/test_cluster_reduce_expand.py
  • tests/test_clustering/test_base.py
  • tests/test_clustering/test_integration.py
🧰 Additional context used
🧬 Code graph analysis (2)
flixopt/dataset_plot_accessor.py (2)
flixopt/config.py (2)
  • CONFIG (185-912)
  • Plotting (551-591)
flixopt/color_processing.py (1)
  • process_colors (165-233)
flixopt/components.py (2)
flixopt/clustering/base.py (3)
  • n_original_clusters (127-129)
  • n_original_clusters (1051-1055)
  • cluster_order (1029-1033)
flixopt/clustering/intercluster_helpers.py (1)
  • build_boundary_coords (134-178)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md

129-129: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.14)
  • GitHub Check: Build documentation
🔇 Additional comments (39)
mkdocs.yml (1)

72-78: LGTM! Well-organized documentation restructuring.

The separation of "Resampling" and "Clustering" as distinct concepts is clear, and the new hierarchical structure for clustering documentation (Basics, Storage Modes, Multi-Period, Internals) provides logical progressive coverage.

docs/user-guide/results-plotting.md (1)

5-7: LGTM! Helpful cross-reference added.

The tip block effectively guides users to the generalized plotting pathway for arbitrary xarray data, complementing the flixopt-specific plotting documentation.

tests/test_cluster_reduce_expand.py (2)

452-454: LGTM! Terminology updated consistently.

The change from n_original_periods to n_original_clusters aligns with the broader refactoring across the codebase to use cluster-based terminology.


170-170: The extra timestep is correctly implemented and intentionally required for inter-cluster storage linking.

The 193-timestep structure (192 original + 1 extra) is fundamental to how storage constraints with SOC_boundary work. The calculate_timestep_duration() method correctly indexes to timesteps_extra[:-1], yielding 192 duration values that correspond to the 192 intervals between 193 time points. When flow_hours multiplies flow rates by these durations, xarray's broadcasting correctly produces NaN for the final timestep in non-storage variables while storage charge_state properly captures the boundary condition.

The expansion logic in expand_solution() (lines 1324–1382) correctly:

  1. Expands charge_state with the extra timestep for each cluster
  2. Combines it with SOC_boundary accounting for self-discharge decay using the formula: SOC(t) = SOC_boundary[d] * (1-loss)^t + ΔE(t)
  3. Reindexes all solution variables to timesteps_extra for consistency

The test at lines 176–210 confirms that total_effects and flow_hours match between clustered and expanded solutions, validating that all downstream statistics and operations handle the extra timestep structure correctly.
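
The decay formula cited above, SOC(t) = SOC_boundary[d] * (1-loss)^t + ΔE(t), can be checked numerically with a standalone NumPy sketch (made-up numbers, not flixopt code):

```python
import numpy as np

loss = 0.01                   # relative self-discharge per timestep
soc_boundary = 100.0          # SOC at the start of original cluster d
t = np.arange(5)              # timesteps within the cluster
delta_e = np.array([0.0, 2.0, 4.0, 6.0, 8.0])  # intra-cluster charge delta

# The boundary value decays with time into the cluster;
# the intra-cluster delta is added on top.
soc = soc_boundary * (1 - loss) ** t + delta_e
print(soc[0])  # 100.0: at t=0 the boundary value is taken unmodified
```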

flixopt/clustering/intercluster_helpers.py (1)

134-167: LGTM! API updated to use consistent cluster terminology.

The function signature and implementation have been updated from n_original_periods to n_original_clusters, aligning with the broader terminology standardization across the clustering module. The logic and calculation remain correct. All callers—including the call at flixopt/components.py:1341—have been updated to use the new parameter name, and no references to the old parameter remain in the codebase.

tests/test_clustering/test_integration.py (1)

173-269: LGTM! Comprehensive test coverage for advanced clustering options.

The new TestClusterAdvancedOptions test class provides thorough coverage of the newly added clustering parameters including cluster_method, random_state, representation_method, rescale_cluster_periods, and tsam kwargs passthrough. The fixture setup is clean and the tests are well-structured.

tests/test_clustering/test_base.py (1)

20-20: LGTM! Consistent terminology update.

The renaming from original_period to original_cluster is applied systematically across all test cases. The dimension names and property assertions are updated consistently, reflecting the broader terminology standardization in the PR.

Also applies to: 32-32, 45-45, 50-50, 139-139

CHANGELOG.md (6)

54-59: Excellent release documentation structure.

The v6.0.0 header clearly marks this as a major release with a concise summary highlighting the three main features: time-series clustering, fxplot accessor, and removal of deprecated classes. The warning callout for breaking changes with migration guidance is particularly helpful.


75-111: Well-documented advanced clustering configuration.

The new clustering parameters are clearly explained with:

  • Clean parameter descriptions in the initial table
  • Practical code example showing combined usage
  • Comprehensive algorithm comparison table
  • Guidance on using **kwargs for advanced tsam parameters

This provides excellent user guidance for the extended clustering API.


112-126: Clear documentation of clustering quality metrics.

The metrics section effectively demonstrates:

  • How to access the metrics xr.Dataset
  • Expected structure with dimensions
  • Specific metric extraction pattern

This matches the clustering API extensions referenced in the PR objectives.


127-163: Comprehensive fxplot accessor documentation.

The FXPlot section provides:

  • Multiple practical examples showing the API
  • Complete method reference table
  • Key features breakdown
  • Clear integration points with any xarray Dataset

The code examples are clean and demonstrate the universal nature of the accessor.

Note: The static analysis warning about "fenced" vs "indented" code blocks at line 129 is a false positive—fenced code blocks with syntax highlighting are the recommended Markdown style.


176-209: Thorough deprecation notices for v7.0.0.

The deprecation section is well-organized:

  • Clear removal version (v7.0.0) at the top
  • Categorized by type (Classes, FlowSystem methods, Parameters, Topology methods)
  • Includes migration mappings in table format
  • Notes about simplified topology method names

This gives users clear advance notice for the next major version.


210-245: Helpful migration guide with side-by-side comparison.

The ClusteredOptimization migration section:

  • Uses tabs to show old vs. new code clearly
  • Includes parameter mapping (nr_of_periods → n_clusters)
  • Demonstrates the new two-stage workflow
  • Shows how to access results in both approaches

This makes upgrading straightforward for existing users.

docs/user-guide/optimization/clustering.md (3)

55-58: Well-documented clustering parameter additions.

The table extension clearly documents the four new parameters:

  • cluster_method: Algorithm selection with examples
  • representation_method: Representation strategy
  • random_state: Reproducibility control
  • rescale_cluster_periods: Period rescaling option

These align with the advanced clustering features mentioned in the PR objectives.


75-111: Comprehensive advanced clustering options guide.

The new section provides:

  • Practical example combining multiple advanced parameters
  • Complete algorithm comparison table (k_means, hierarchical, k_medoids, k_maxoids, averaging)
  • Clear guidance on **kwargs for tsam parameters (sameMean, sortValues)

This gives users fine-grained control over clustering behavior with good examples.


112-126: Clear clustering quality metrics documentation.

The metrics section demonstrates:

  • How to access fs_clustered.clustering.metrics
  • Expected xr.Dataset structure
  • Specific metric extraction (RMSE example)

This exposes the clustering quality evaluation capabilities added in this PR.

flixopt/config.py (2)

33-33: Correct deprecation version progression.

Updating DEPRECATION_REMOVAL_VERSION from '6.0.0' to '7.0.0' is appropriate—this release is v6.0.0, so deprecations should target the next major version (v7.0.0). This aligns with the extensive deprecation notices in the CHANGELOG.


167-169: Plotting priority updates for fxplot accessor.

The configuration changes introduce new dimension priorities:

  • 'variable' added as first priority in extra_dim_priority
  • 'color' added as first slot in dim_slot_priority
  • 'variable' inserted in x_dim_priority

These changes support the new fxplot accessor feature documented in the CHANGELOG, where color is now a "first-class dimension" and variable is treated as a valid facet/color option. Since this is a major version release (6.0.0), these breaking changes in default behavior are acceptable and well-documented.

flixopt/components.py (3)

1198-1198: Consistent terminology update in InterclusterStorageModel.

The renaming from n_original_periods to n_original_clusters is systematically applied throughout the inter-cluster storage linking logic:

  • Variable names updated (line 1333)
  • Function call parameters updated (line 1341)
  • Index selections updated (lines 1370, 1380-1381)
  • Comments updated (line 1198)

The changes align with the broader PR terminology standardization and match the updates in related files (clustering/base.py, clustering/intercluster_helpers.py).

Also applies to: 1333-1333, 1341-1341, 1370-1370, 1380-1381


1363-1365: Method signatures correctly updated.

The function signatures for _add_linking_constraints and _add_combined_bound_constraints have been updated:

  • Parameter renamed from n_original_periods to n_original_clusters
  • All call sites updated to pass the correctly named variable
  • Docstring parameters updated to match

This maintains API consistency across the storage linking implementation.

Also applies to: 1395-1395, 1444-1445, 1461-1462, 1490-1491


1468-1471: Dimension renaming with proper coordinate reassignment.

The dimension renames from 'cluster_boundary' to 'original_cluster' are correctly implemented:

  • Lines 1468-1471: Rename dimension, then assign coordinates using np.arange(n_original_clusters)
  • Lines 1517-1518: Similar pattern for soc_d
  • Lines 1530-1535: Rename 'cluster' to 'original_cluster' after reordering via cluster_order

The coordinate reassignment after dimension renaming ensures proper alignment and avoids index/coordinate mismatches.

Also applies to: 1517-1518, 1530-1535

docs/notebooks/08c-clustering.ipynb (5)

32-32: LGTM! Modern xarray-based plotting approach.

The addition of xarray import and dataset-based visualization aligns well with the new fxplot API, providing cleaner and more maintainable plotting code.

Also applies to: 74-80


150-159: Excellent addition for reproducibility.

Setting random_state=42 ensures consistent, reproducible results in the tutorial, which is a best practice for documentation and examples.


189-241: Clear demonstration of clustering metadata API.

The progressive disclosure of clustering metadata (properties → metrics → visualization) provides an excellent learning path for users.


247-298: Good demonstration of algorithm comparison workflow.

The examples effectively show how to compare different clustering algorithms using both cluster assignments and quality metrics. The RMSE comparison is particularly useful for understanding clustering quality trade-offs.


305-342: Valuable workflow for design comparisons.

The manual cluster assignment example addresses a real analysis need: ensuring consistent cluster structure across design variants for fair comparison. Well explained and implemented.

flixopt/transform_accessor.py (3)

585-641: Well-designed API extension with comprehensive documentation.

The new clustering parameters provide valuable control over the clustering process while maintaining good defaults. The docstring is thorough and includes helpful examples. The choice of 'hierarchical' as the default is appropriate since it's deterministic.


941-977: LGTM! Consistent terminology update.

The renaming from original_period to original_cluster improves clarity by avoiding confusion with the model's "period" dimension (years/months). The change is consistently applied across all code paths.


1324-1390: Complex but correct inter-cluster storage handling.

The SOC_boundary integration logic properly handles the extra timestep and applies self-discharge decay based on time within each period. The vectorized operations with DataArray indexing are efficient.

The cleanup at lines 1384-1390 is good housekeeping, removing internal variables that shouldn't be exposed in the expanded solution.

flixopt/dataset_plot_accessor.py (5)

16-38: LGTM! Clean extension for multi-variable support.

Making 'variable' available as an x-axis option when there are multiple data variables is intuitive and maintains backward compatibility with the default n_data_vars=1.


41-96: Well-structured extension of auto-resolution logic.

The addition of color to the facet resolution system is clean and follows the existing pattern. The function signature change to return a 4-tuple is properly handled by all call sites in the file.


147-223: Robust color handling with proper defaults.

The color resolution logic correctly handles edge cases (single variable, no color dimension) and uses appropriate fallbacks. The defensive check for 'color' not in px_kwargs prevents accidental overrides.


225-469: Consistent color support across plotting methods.

All four methods (bar, stacked_bar, line, area) implement color handling consistently, maintaining the same API patterns and defensive coding practices.


471-684: Correct integration with color-unaware plot types.

Heatmap, scatter, and pie methods correctly pass None for the color parameter since these plot types don't use the color dimension for categorical grouping. The API remains clean and consistent.

flixopt/clustering/base.py (5)

34-130: Consistent and clear terminology update.

The renaming to original_cluster throughout ClusterStructure eliminates ambiguity with the model's "period" dimension and improves code clarity. The documentation and property names align well.


969-983: Valuable addition for clustering quality evaluation.

The metrics attribute provides users with quantitative measures of clustering quality (RMSE, MAE) which is essential for evaluating and comparing different clustering configurations. Making it optional (None) maintains flexibility for backends that don't compute metrics.


200-254: Modernized plotting with fxplot API.

The update to use fxplot.heatmap() provides better defaults and consistency with the rest of the plotting infrastructure. The visualization effectively shows cluster assignments.


578-699: Flexible and intuitive comparison visualization.

The addition of color and line_dash parameters provides users with multiple ways to distinguish between original and clustered data. The default of using line dash for representation (Original=dotted, Clustered=solid) is an effective visual distinction.


856-953: Excellent UX: cluster labels include occurrence counts.

Adding occurrence counts to cluster labels (e.g., "Cluster 0 (×45)") provides immediate insight into cluster weights without needing to reference separate data. This is a thoughtful UX improvement.
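
Labels of that form can be derived directly from the cluster order (a minimal sketch; the actual implementation in flixopt/clustering/base.py may differ):

```python
from collections import Counter

# Which cluster represents each original cluster (hypothetical assignment):
cluster_order = [0, 1, 0, 0, 2, 1, 0]
counts = Counter(cluster_order)

# Build "Cluster N (×count)" labels per unique cluster.
labels = {c: f'Cluster {c} (×{counts[c]})' for c in sorted(counts)}
print(labels)  # {0: 'Cluster 0 (×4)', 1: 'Cluster 1 (×2)', 2: 'Cluster 2 (×1)'}
```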

Comment on lines +733 to +741
    # Handle predef_cluster_order for multi-dimensional case
    predef_order_slice = None
    if predef_cluster_order is not None:
        if isinstance(predef_cluster_order, xr.DataArray):
            # Extract slice for this (period, scenario) combination
            predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values
        else:
            # Simple array/list - use directly
            predef_order_slice = predef_cluster_order

🛠️ Refactor suggestion | 🟠 Major

Add validation for predef_cluster_order dimensions.

If predef_cluster_order is a DataArray with unexpected dimensions (e.g., missing 'period' when the FlowSystem has periods), the .sel() call could fail or produce incorrect results.

Consider adding dimension validation:

if predef_cluster_order is not None:
    if isinstance(predef_cluster_order, xr.DataArray):
        # Validate dimensions match FlowSystem structure
        expected_dims = {'original_cluster'}
        if has_periods:
            expected_dims.add('period')
        if has_scenarios:
            expected_dims.add('scenario')
        if set(predef_cluster_order.dims) != expected_dims:
            raise ValueError(
                f'predef_cluster_order dims {set(predef_cluster_order.dims)} '
                f'do not match expected {expected_dims}'
            )
        predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values
    else:
        predef_order_slice = predef_cluster_order

Comment on lines 1253 to 1280
n_original_clusters = cluster_structure.n_original_clusters

# For charge_state with cluster dim, also includes the extra timestep
last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

def expand_da(da: xr.DataArray, var_name: str = '') -> xr.DataArray:
    if 'time' not in da.dims:
        return da.copy()
    expanded = info.result.expand_data(da, original_time=original_timesteps)

    # For charge_state with cluster dim, append the extra timestep value
    if var_name.endswith('|charge_state') and 'cluster' in da.dims:
        # Get extra timestep from last cluster using vectorized selection
        cluster_order = cluster_structure.cluster_order  # (n_original_clusters,) or with period/scenario
        if cluster_order.ndim == 1:
            last_cluster = int(cluster_order[last_original_cluster_idx])
            extra_val = da.isel(cluster=last_cluster, time=-1)
        else:
            # Multi-dimensional: select last cluster for each period/scenario slice
            last_clusters = cluster_order.isel(original_cluster=last_original_cluster_idx)
            extra_val = da.isel(cluster=last_clusters, time=-1)
        # Drop 'cluster'/'time' coords created by isel (kept as non-dim coords)
        extra_val = extra_val.drop_vars(['cluster', 'time'], errors='ignore')
        extra_val = extra_val.expand_dims(time=[original_timesteps_extra[-1]])
        expanded = xr.concat([expanded, extra_val], dim='time')

    return expanded

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the context around line 1257
cd flixopt && head -1280 transform_accessor.py | tail -100 | cat -n

Repository: flixOpt/flixopt

Length of output: 5859


🏁 Script executed:

# Search for timesteps_per_cluster usage and validation
rg -n "timesteps_per_cluster" --type py -B 3 -A 3



🏁 Script executed:

# Look for clustering initialization and validation
rg -n "n_original_timesteps" --type py -B 2 -A 2 | head -50



🏁 Script executed:

# Check how cluster_structure is created and validated
rg -n "class ClusterResult\|n_original_clusters" --type py -B 2 -A 5



🏁 Script executed:

# Get the exact usage of last_original_cluster_idx at lines 1268-1275
sed -n '1257,1280p' flixopt/transform_accessor.py | cat -n



🏁 Script executed:

# Check cluster_order indexing bounds - what is its size?
rg -n "cluster_order\[" --type py -B 2 -A 2 | head -40



🏁 Script executed:

# Look for validation that ensures n_original_timesteps is properly divisible
rg -n "n_original_timesteps.*%\|%.*timesteps_per_cluster\|divisib" --type py



🏁 Script executed:

# Check if there are edge case tests with partial clusters
rg -n "partial\|incomplete\|remainder" tests/ --type py | head -20



🏁 Script executed:

# Check if n_original_clusters is calculated or derived from n_original_timesteps
rg -n "n_original_clusters.*=" flixopt/clustering/base.py | head -10



🏁 Script executed:

# Look at the cluster() method to see if it validates perfect divisibility
sed -n '650,750p' flixopt/transform_accessor.py | grep -n "cluster\|timestep"



🏁 Script executed:

# Check what original_timesteps contains and how n_original_timesteps relates to it
rg -n "original_timesteps.*=" flixopt/transform_accessor.py | head -15



🏁 Script executed:

# Search for any bounds checking or clamping around cluster_order access
rg -n "cluster_order\[.*_idx\]\|isel.*original_cluster.*idx" --type py -B 1 -A 1



🏁 Script executed:

# Check how n_original_clusters is used and whether it can mismatch with actual cluster_order size
rg -n "n_original_clusters" flixopt/clustering/base.py | head -20



🏁 Script executed:

# See the full context of how n_original_clusters is determined during clustering
sed -n '1115,1165p' flixopt/clustering/base.py



🏁 Script executed:

# Check what create_cluster_structure_from_mapping does with incomplete timesteps
sed -n '1115,1170p' flixopt/clustering/base.py | cat -n



🏁 Script executed:

# Verify: does cluster_order always have exactly n_original_clusters entries?
python3 << 'EOF'
# Simulate the scenario from the review
n_original_timesteps = 169  # Not divisible by 24
timesteps_per_cluster = 24
last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster
n_original_clusters = n_original_timesteps // timesteps_per_cluster

print(f"n_original_timesteps: {n_original_timesteps}")
print(f"timesteps_per_cluster: {timesteps_per_cluster}")
print(f"n_original_clusters: {n_original_clusters}")
print(f"last_original_cluster_idx: {last_original_cluster_idx}")
print(f"Valid cluster_order indices: 0-{n_original_clusters - 1}")
print(f"Accessing cluster_order[{last_original_cluster_idx}] would be: ", end="")
if last_original_cluster_idx >= n_original_clusters:
    print(f"OUT OF BOUNDS! (index {last_original_cluster_idx} >= size {n_original_clusters})")
else:
    print("OK")
EOF



🏁 Script executed:

# Check if the FlowSystem or clustering validates divisibility
rg -n "n_original_timesteps.*%\|n_original_timesteps.*//\|assert.*divisible" flixopt/ --type py



🏁 Script executed:

# Check the cluster() method for any validation or handling of partial clusters
sed -n '600,950p' flixopt/transform_accessor.py | grep -n "n_original\|divisib\|partial\|remainder" | head -20



🏁 Script executed:

# Look specifically at how cluster_order is sized in cluster() vs expand_solution()
rg -n "cluster_order.*=" flixopt/transform_accessor.py -A 2 | head -40



🏁 Script executed:

# Check if there's any documentation or code comment about cluster boundary conditions
rg -n "last.*cluster\|partial.*period\|remainder" flixopt/ --type py -B 1 -A 1



🏁 Script executed:

# Check if n_original_clusters is calculated with any remainder handling
rg -n "n_original_clusters.*=" flixopt/transform_accessor.py -B 2 -A 2



🏁 Script executed:

# Look for any comment or code about handling partial/incomplete clusters
rg -n "incomplete\|partial\|remainder\|not.*divisible" flixopt/ --type py



🏁 Script executed:

# Check the full cluster() method to see if it enforces or validates divisibility
sed -n '575,950p' flixopt/transform_accessor.py | tail -200 | head -150



🏁 Script executed:

# Verify the actual size of cluster_order that's created in cluster()
python3 << 'EOF'
# From cluster() at lines 950-967, cluster_order is created from:
# cluster_orders[key] which comes from tsam_agg.clusterOrder
# The size depends on how many original_cluster periods there are

# In expand_solution() at line 1253:
# n_original_clusters = cluster_structure.n_original_clusters

# This is calculated from cluster_order's size
# If cluster_order has n_original_clusters entries (indices 0 to n_original_clusters-1)
# And we try to access cluster_order[last_original_cluster_idx]
# where last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

# Then if n_original_timesteps is not perfectly divisible:
# last_original_cluster_idx can equal n_original_clusters (out of bounds)

print("The issue occurs when:")
print("  n_original_timesteps % timesteps_per_cluster != 0")
print()
print("Example:")
print("  n_original_timesteps = 169")
print("  timesteps_per_cluster = 24")
print("  n_original_clusters = 169 // 24 = 7 (cluster_order size)")
print("  last_original_cluster_idx = (169-1) // 24 = 7 (INDEX OUT OF BOUNDS)")
EOF



🏁 Script executed:

# Check if there are any tests with non-divisible timesteps
rg -n "test.*expand\|test.*cluster" tests/ --type py | grep -i test



🏁 Script executed:

# Look for test fixtures that might use non-standard timesteps
rg -n "8_days\|7_days\|169\|partial" tests/ --type py | head -20



🏁 Script executed:

# Check the simple_system_8_days fixture to see timesteps
rg -n "simple_system_8_days" tests/ -A 10 | head -30



🏁 Script executed:

# Verify if there's protection/clamping at line 1273 (multi-dim case)
sed -n '1267,1280p' flixopt/transform_accessor.py | cat -n



🏁 Script executed:

# Check if cluster_order can have fewer entries than n_original_clusters suggests
python3 << 'EOF'
# Looking at the code flow:
# 1. cluster_orders[key] = tsam_agg.clusterOrder
#    This is the cluster order from TSAM - one value per original period
#
# 2. cluster_order_da created at line 952-953:
#    xr.DataArray(cluster_orders[key], dims=['original_cluster'], name='cluster_order')
#
# 3. In expand_solution() at line 1253:
#    n_original_clusters = cluster_structure.n_original_clusters
#
# 4. cluster_structure.n_original_clusters is a property that returns:
#    len(cluster_order) (the size of the original_cluster dimension)
#
# So if cluster_order has 7 entries, n_original_clusters = 7
# Valid indices: 0-6
# last_original_cluster_idx = (169-1) // 24 = 7 → OUT OF BOUNDS

print("CONFIRMED BUG:")
print("cluster_order.shape[0] = n_original_clusters")
print("cluster_order[last_original_cluster_idx] will fail if")
print("  last_original_cluster_idx >= n_original_clusters")
print()
print("This happens when:")
print("  n_original_timesteps % timesteps_per_cluster != 0")
EOF



Fix out-of-bounds access in last_original_cluster_idx calculation.

Line 1257 calculates the last original cluster index without bounds checking:

last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

When n_original_timesteps % timesteps_per_cluster != 0 (a partial cluster at the end), this yields an index equal to n_original_clusters, one past the last valid entry, causing an IndexError at lines 1269 and 1273 when accessing cluster_order[last_original_cluster_idx].

Clamp the index to valid bounds:

last_original_cluster_idx = min(
    (n_original_timesteps - 1) // timesteps_per_cluster,
    n_original_clusters - 1
)
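As a quick sanity check, the clamped formula can be exercised standalone (a hypothetical helper, not the accessor's actual code):

```python
def last_cluster_index(n_original_timesteps: int, timesteps_per_cluster: int,
                       n_original_clusters: int) -> int:
    """Index of the original cluster containing the last timestep, clamped so a
    trailing partial cluster cannot index past the end of cluster_order."""
    return min((n_original_timesteps - 1) // timesteps_per_cluster,
               n_original_clusters - 1)
```

With 169 timesteps and 24 timesteps per cluster, the unclamped formula gives 7 while cluster_order has only indices 0-6; the clamp maps this back to 6.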

FBumann added 12 commits January 4, 2026 19:45
  1. Cell 32 (API Reference table): Updated defaults to 'hierarchical', 'medoidRepresentation', and None
  2. Cell 16: Swapped the example to show k_means as the alternative (since hierarchical is now default)
  3. Cell 17: Updated variable names to match
  4. Cell 33 (Key Takeaways): Clarified that random_state is only needed for non-deterministic methods like 'k_means'

  The code review
…arning log and empty DataFrame fallback, plus handling empty DataFrames when building the metrics Dataset

  2. Random state to tsam - Replaced global np.random.seed() with passing seed parameter directly to tsam's TimeSeriesAggregation
  3. tsam_kwargs conflict validation - Added validation that raises ValueError if user tries to override explicit parameters via **tsam_kwargs (including seed)
  4. predef_cluster_order validation - Added dimension validation for DataArray inputs, checking they match the FlowSystem's period/scenario structure
  5. Out-of-bounds fix - Clamped last_original_cluster_idx to n_original_clusters - 1 to handle partial clusters at the end
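The kwargs-conflict check described in item 3 could look roughly like this; the helper name and parameter set are illustrative, the real implementation lives in flixopt/transform_accessor.py:

```python
def check_tsam_kwargs(tsam_kwargs: dict, managed: set) -> None:
    """Raise if **tsam_kwargs would silently override explicitly managed parameters."""
    conflicts = sorted(set(tsam_kwargs) & managed)
    if conflicts:
        raise ValueError(
            f'These parameters are managed by cluster() and cannot be passed '
            f'via tsam_kwargs: {conflicts}'
        )

# Non-conflicting extras pass through; overriding e.g. 'seed' raises
check_tsam_kwargs({'solver': 'glpk'}, managed={'noTypicalPeriods', 'hoursPerPeriod', 'seed'})
```

Failing fast here is preferable to letting tsam silently receive two values for the same parameter.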
…ey) or next(...) to explicit if metrics_df is None: check

  2. removed random state
# Conflicts:
#	docs/notebooks/08c-clustering.ipynb
#	flixopt/dataset_plot_accessor.py
# Conflicts:
#	docs/notebooks/08c-clustering.ipynb
#	flixopt/dataset_plot_accessor.py
# Conflicts:
#	.github/workflows/docs.yaml
#	mkdocs.yml
…mmary of the changes made:

  Changes Made

  1. flixopt/config.py

  - Replaced three separate config attributes (extra_dim_priority, dim_slot_priority, x_dim_priority) with a single unified dim_priority tuple
  - Updated CONFIG.Plotting class docstring and attribute definitions
  - Updated to_dict() method to use the new attribute
  - The new priority order: ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')

  2. flixopt/dataset_plot_accessor.py

  - Created new assign_slots() function that centralizes all dimension-to-slot assignment logic
  - Fixed slot fill order: x → color → facet_col → facet_row → animation_frame
  - Updated all plot methods (bar, stacked_bar, line, area, heatmap, scatter, pie) to use assign_slots()
  - Removed old _get_x_dim() and _resolve_auto_facets() functions
  - Updated docstrings to reference dim_priority instead of x_dim_priority

  3. flixopt/statistics_accessor.py

  - Updated _resolve_auto_facets() to use the new assign_slots() function internally
  - Added import for assign_slots from dataset_plot_accessor

  Key Design Decisions

  - Single priority list controls all auto-assignment
  - Slots are filled in fixed order based on availability
  - None means a slot is not available for that plot type
  - 'auto' triggers auto-assignment from priority list
  - Explicit string values override auto-assignment
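A simplified sketch of the slot-assignment logic described above (the real assign_slots() in flixopt/dataset_plot_accessor.py handles more cases; this only illustrates the priority-driven fill order):

```python
DIM_PRIORITY = ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')
SLOT_ORDER = ('x', 'color', 'facet_col', 'facet_row', 'animation_frame')

def assign_slots(dims: set, slots: dict) -> dict:
    """Fill plot slots in fixed order from dims ordered by DIM_PRIORITY.

    slots maps slot name -> None (unavailable), 'auto', or an explicit dim name.
    """
    remaining = [d for d in DIM_PRIORITY if d in dims]
    result = {}
    # Explicit assignments win and are removed from the auto pool
    for slot in SLOT_ORDER:
        spec = slots.get(slot)
        if spec not in (None, 'auto'):
            result[slot] = spec
            if spec in remaining:
                remaining.remove(spec)
    # Remaining dims fill 'auto' slots in fixed slot order
    for slot in SLOT_ORDER:
        if slots.get(slot) == 'auto':
            result[slot] = remaining.pop(0) if remaining else None
    return result

auto3 = {'x': 'auto', 'color': 'auto', 'facet_col': 'auto', 'facet_row': None, 'animation_frame': None}
assigned = assign_slots({'time', 'period', 'scenario'}, auto3)
```

Here 'time' lands on x, 'period' on color, and 'scenario' on facet_col, because the pool is ordered by DIM_PRIORITY and slots are filled x-first.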
@FBumann FBumann changed the title Feature/tsam params Clustering Parameter Refinements & Unified Slot Assignment Jan 5, 2026
@FBumann FBumann merged commit c94f6da into feature/aggregate-rework-v2 Jan 5, 2026
10 checks passed