Conversation

@FBumann FBumann commented Jan 4, 2026

Description

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Code refactoring

Related Issues

Closes #(issue number)

Testing

  • I have tested my changes
  • Existing tests still pass

Checklist

  • My code follows the project style
  • I have updated documentation if needed
  • I have added tests for new functionality (if applicable)

Summary by CodeRabbit

Release Notes

New Features

  • Expanded clustering capabilities with advanced configuration options (multiple clustering methods, reproducibility controls, and quality metrics).
  • New FXPlot accessor enables plotting custom xarray datasets beyond simulation results.
  • Inter-cluster storage linking support.

Breaking Changes

  • Clustering API terminology and weight normalization updated.
  • Deprecated v5.0 classes removed; consult migration guide.
  • Plotting method defaults adjusted for improved dimension handling.

Documentation

  • Comprehensive migration guides and advanced clustering examples added.
  • Updated documentation reflecting new API parameters and capabilities.


  New Parameters Added to cluster() Method

  | Parameter               | Type                          | Default              | Purpose                                                                                                            |
  |-------------------------|-------------------------------|----------------------|--------------------------------------------------------------------------------------------------------------------|
  | cluster_method          | Literal[...]                  | 'k_means'            | Clustering algorithm ('k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging')                            |
  | representation_method   | Literal[...]                  | 'meanRepresentation' | How clusters are represented ('meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation') |
  | extreme_period_method   | Literal[...]                  | 'new_cluster_center' | How peaks are integrated ('None', 'append', 'new_cluster_center', 'replace_cluster_center')                        |
  | rescale_cluster_periods | bool                          | True                 | Rescale clusters to match original means                                                                           |
  | random_state            | int | None                    | None                 | Random seed for reproducibility                                                                                    |
  | predef_cluster_order    | np.ndarray | list[int] | None | None                 | Manual clustering assignments                                                                                      |
  | **tsam_kwargs           | Any                           | -                    | Pass-through for any tsam parameter                                                                                |
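
As an illustration of the **tsam_kwargs pass-through described in the table, here is a minimal sketch of how explicit arguments and raw tsam kwargs could be merged (a hypothetical helper, not flixopt's actual implementation; with setdefault semantics, a raw tsam kwarg takes precedence over the explicit parameter):

```python
def build_tsam_kwargs(cluster_method='k_means',
                      rescale_cluster_periods=True,
                      **tsam_kwargs):
    """Sketch: merge explicit cluster() arguments with raw tsam kwargs.

    Explicit arguments are applied only if the caller did not already
    supply the corresponding tsam name via **tsam_kwargs.
    """
    tsam_kwargs.setdefault('clusterMethod', cluster_method)
    tsam_kwargs.setdefault('rescaleClusterPeriods', rescale_cluster_periods)
    return tsam_kwargs

# Explicit parameter is used when no raw kwarg is given:
print(build_tsam_kwargs(cluster_method='hierarchical'))
# A raw tsam kwarg wins over the explicit parameter:
print(build_tsam_kwargs(cluster_method='hierarchical', clusterMethod='k_means'))
```

Whether flixopt resolves such conflicts this way (or raises instead) is a design choice; the review comments below discuss it.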

  Clustering Quality Metrics

  Access via fs.clustering.metrics after clustering; returns a DataFrame with RMSE, MAE, and other accuracy indicators per time series.
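
As an illustration of what such accuracy indicators measure, the sketch below computes RMSE and MAE between an original profile and its clustered reconstruction (standalone NumPy, not flixopt's or tsam's actual implementation):

```python
import numpy as np

def accuracy_indicators(original: np.ndarray, reconstructed: np.ndarray) -> dict:
    """Per-series error metrics between original and cluster-reconstructed data."""
    err = original - reconstructed
    return {
        'RMSE': float(np.sqrt(np.mean(err ** 2))),
        'MAE': float(np.mean(np.abs(err))),
    }

original = np.array([1.0, 2.0, 3.0, 4.0])
reconstructed = np.array([1.5, 1.5, 3.5, 3.5])  # e.g. two clusters of two steps each
metrics = accuracy_indicators(original, reconstructed)
print(metrics)  # RMSE = 0.5, MAE = 0.5 for this example
```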

  Files Modified

  1. flixopt/transform_accessor.py - Updated cluster() signature and tsam call
  2. flixopt/clustering/base.py - Added metrics field to Clustering class
  3. tests/test_clustering/test_integration.py - Added tests for new parameters
  4. docs/user-guide/optimization/clustering.md - Updated documentation
  Property renamed: n_original_periods → n_original_clusters
…imestep that regular FlowSystems have.

  Root Cause: In expand_solution(), the solution was only indexed by original_timesteps (n elements) instead of original_timesteps_extra (n+1 elements).
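
The off-by-one matters because state variables live on time points while flows live on intervals: n flow values imply n + 1 charge states. A minimal NumPy sketch with made-up numbers (independent of flixopt):

```python
import numpy as np

n = 192                     # flow values, one per interval
flows = np.ones(n)          # charging 1 energy unit per step
initial_charge = 0.0

# Integrating n interval flows yields n + 1 state values: the initial
# state plus one state after each interval, hence 193 rather than 192.
charge_state = np.concatenate([[initial_charge], initial_charge + np.cumsum(flows)])
print(charge_state.shape)   # (193,)
```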

  Fix in flixopt/transform_accessor.py:

  1. Reindex solution to timesteps_extra (line 1296-1298):
    - Added expanded_fs._solution.reindex(time=original_timesteps_extra) for consistency with non-expanded FlowSystems
  2. Fill extra timestep for charge_state (lines 1300-1333):
    - Added special handling to properly fill the extra timestep for storage charge_state variables using the last cluster's extra timestep value
  3. Updated intercluster storage handling (lines 1340-1388):
    - Modified to work with original_timesteps_extra instead of just original_timesteps
    - The extra timestep now correctly gets the final SOC boundary value with proper decay applied

  Tests updated in tests/test_cluster_reduce_expand.py:
  - Updated 4 assertions that check solution time coordinates to expect 193 (192 + 1 extra) instead of 192
coderabbitai bot commented Jan 4, 2026

📝 Walkthrough

Walkthrough

This PR releases version 6.0.0 introducing major clustering enhancements, a new fxplot plotting accessor, refined plotting configuration with color as a first-class dimension, and terminology standardization (original_period → original_cluster). It removes deprecated v5.0 classes, extends the clustering API with new parameters (cluster_method, representation_method, random_state, rescale_cluster_periods, extreme_period_method, predef_cluster_order), adds clustering quality metrics, and updates documentation, notebooks, and tests accordingly.

Changes

  • Release & Configuration (CHANGELOG.md, flixopt/config.py):
    Added v6.0.0 release notes documenting new clustering features, the fxplot accessor, breaking changes, and deprecations. Updated the deprecation removal version from 6.0.0 to 7.0.0 and expanded plotting config priorities to include 'variable' and 'color' as first-class dimensions.
  • Clustering Core & Terminology (flixopt/clustering/base.py, flixopt/clustering/intercluster_helpers.py, flixopt/components.py):
    Renamed "original_period" terminology to "original_cluster" across ClusterStructure, Clustering, and component classes; added a public metrics attribute (xr.Dataset | None); renamed the n_original_periods property to n_original_clusters; adjusted inter-cluster linking logic and boundary coordinate calculations.
  • Clustering API & Execution (flixopt/transform_accessor.py):
    Extended the cluster() method signature with new parameters: cluster_method, representation_method, extreme_period_method, rescale_cluster_periods, random_state, predef_cluster_order, and **tsam_kwargs. Added multi-dimensional metrics aggregation, improved expand_solution() with extra-timestep and original_cluster handling, and per-period/scenario cluster configuration support.
  • Plotting Infrastructure & Color Handling (flixopt/dataset_plot_accessor.py):
    Introduced color as a first-class plotting dimension across bar, stacked_bar, line, area, and related methods via a new color parameter (str | 'auto' | None). Updated the _get_x_dim() and _resolve_auto_facets() signatures to handle color resolution independently; adjusted facet and color mapping logic.
  • Documentation & Notebooks (docs/user-guide/optimization/clustering.md, docs/user-guide/results-plotting.md, docs/notebooks/08c-clustering.ipynb):
    Added "Advanced Clustering Options" and "Clustering Quality Metrics" sections with code examples; documented the new cluster() parameters and their defaults. Added a plotting tip for custom xarray data via the fxplot accessor. Updated the notebook with new clustering parameters, a reproducible random state, and fxplot-based visualizations.
  • Documentation Navigation (mkdocs.yml):
    Restructured the Scaling section: renamed the "Clustering" (resampling) item to "Resampling"; added a new "Clustering" submenu with four entries (Basics, Storage Modes, Multi-Period, Internals).
  • Tests (tests/test_cluster_reduce_expand.py, tests/test_clustering/test_base.py, tests/test_clustering/test_integration.py):
    Updated assertions to expect the extra timestep (193 vs 192) in expanded solutions; renamed the 'original_period' dimension to 'original_cluster'; replaced n_original_periods references with n_original_clusters; added a comprehensive TestClusterAdvancedOptions suite validating the new clustering parameters, metrics, and reproducibility.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Description check: ⚠️ Warning. The PR description is a blank template with no actual content filled in; all sections are empty or unchecked, failing to describe the changes, type of change, related issues, testing status, or rationale. Resolution: fill in all required sections with substantive content describing the changes, type of change, related issues, and testing verification, and update the checklist to accurately reflect the work completed.

✅ Passed checks (2 passed)

  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.64%, which is sufficient; the required threshold is 80.00%.
  • Title check: ✅ Passed. The title addresses clustering parameter additions and plotting slot assignment, which are major themes in the PR, but it is somewhat generic and doesn't capture the full scope of changes, including terminology refactoring, storage expansion, and documentation updates.

…sts in the melted DataFrame from data_var names, not as a dimension)

  - When facet_row='variable' or facet_col='variable' is passed, it's passed through directly
  - In line(), when faceting by variable, it's not also used for color (avoids double encoding)
…at it consistently:

  flixopt/config.py:
  'extra_dim_priority': ('variable', 'cluster', 'period', 'scenario'),
  'x_dim_priority': ('time', 'duration', 'duration_pct', 'variable', 'period', 'scenario', 'cluster'),

  flixopt/dataset_plot_accessor.py:
  - _get_x_dim: Now takes n_data_vars parameter; 'variable' is available when > 1
  - _resolve_auto_facets: 'variable' is available when len(data_vars) > 1 and respects exclude_dims

  Behavior:
  - 'variable' is treated like any other dimension in the priority system
  - Only available when there are multiple data_vars
  - Properly excluded when already used (e.g., for x-axis)
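
The availability and priority behavior described in these bullets can be sketched as a small resolver (a simplified stand-in for _get_x_dim, assuming the x_dim_priority tuple shown above; not the actual flixopt code):

```python
X_DIM_PRIORITY = ('time', 'duration', 'duration_pct', 'variable', 'period', 'scenario', 'cluster')

def pick_x_dim(available_dims, n_data_vars=1, exclude=()):
    """Return the first priority entry that is actually available.

    'variable' only counts as available when the dataset has more than
    one data_var, since it is melted from data_var names, not a real dim.
    """
    candidates = set(available_dims) | ({'variable'} if n_data_vars > 1 else set())
    candidates -= set(exclude)
    for dim in X_DIM_PRIORITY:
        if dim in candidates:
            return dim
    return None

print(pick_x_dim({'scenario', 'period'}))                   # 'period'
print(pick_x_dim({'scenario'}, n_data_vars=3))              # 'variable'
print(pick_x_dim({'time', 'scenario'}, exclude=('time',)))  # 'scenario'
```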

FBumann commented Jan 4, 2026

@coderabbitai review

coderabbitai bot commented Jan 4, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
docs/notebooks/08c-clustering.ipynb (1)

549-590: Critical: Documentation defaults don't match implementation.

The API reference table shows incorrect default values for several parameters:

  1. Line 556: Documents cluster_method default as 'k_means', but the implementation (transform_accessor.py line 585) defaults to 'hierarchical'
  2. Line 557: Documents representation_method default as 'meanRepresentation', but the implementation (line 588) defaults to 'medoidRepresentation'
  3. Line 558: Documents extreme_period_method default as 'new_cluster_center', but the implementation (line 589) defaults to None
🔎 Proposed corrections for the API reference table
-| `cluster_method` | `str` | 'k_means' | Algorithm: 'k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging' |
+| `cluster_method` | `str` | 'hierarchical' | Algorithm: 'k_means', 'hierarchical', 'k_medoids', 'k_maxoids', 'averaging' |
-| `representation_method` | `str` | 'meanRepresentation' | 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' |
+| `representation_method` | `str` | 'medoidRepresentation' | 'meanRepresentation', 'medoidRepresentation', 'distributionAndMinMaxRepresentation' |
-| `extreme_period_method` | `str` | 'new_cluster_center' | How peaks are integrated: 'None', 'append', 'new_cluster_center', 'replace_cluster_center' |
+| `extreme_period_method` | `str` | None | How peaks are integrated: None, 'append', 'new_cluster_center', 'replace_cluster_center' |
flixopt/transform_accessor.py (1)

720-807: Add error handling for clustering metrics computation.

The code assumes tsam_agg.accuracyIndicators() always succeeds and returns a valid DataFrame. If tsam encounters issues computing metrics, this could raise an exception.

Consider adding defensive error handling:

try:
    clustering_metrics_all[key] = tsam_agg.accuracyIndicators()
except Exception as e:
    logger.warning(f'Failed to compute clustering metrics for {key}: {e}')
    clustering_metrics_all[key] = pd.DataFrame()  # Empty DataFrame as fallback

Then handle the case where all metrics are empty when building the Dataset.

🧹 Nitpick comments (2)
flixopt/transform_accessor.py (2)

715-717: Consider passing random_state to tsam instead of setting global state.

Setting np.random.seed() globally can affect other parts of the code that use numpy's random functions, potentially causing hard-to-debug issues.

If tsam's TimeSeriesAggregation accepts a random_state or seed parameter, consider passing it directly rather than modifying global state:

# Instead of:
if random_state is not None:
    np.random.seed(random_state)

# Consider:
if random_state is not None:
    tsam_kwargs.setdefault('randomSeed', random_state)

Verify tsam's API to see if this is supported. If not, the current approach is acceptable but should be documented.
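
For the isolation concern itself, numpy's Generator API gives reproducibility without mutating global state; a runnable illustration (general numpy usage, independent of tsam and flixopt):

```python
import numpy as np

# Global seeding mutates shared module-level state: any later
# np.random call anywhere in the process is affected.
np.random.seed(42)
a = np.random.random(3)

# A local Generator is reproducible and isolated:
rng1 = np.random.default_rng(123)
rng2 = np.random.default_rng(123)
assert np.allclose(rng1.random(5), rng2.random(5))  # same seed, same stream

# ...and consuming rng1 did not disturb the global stream:
np.random.seed(42)
b = np.random.random(3)
assert np.allclose(a, b)
```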


746-760: Document tsam_kwargs precedence to prevent conflicts.

Users could accidentally override explicit parameters via **tsam_kwargs, leading to confusing behavior. For example:

cluster(..., cluster_method='hierarchical', **{'clusterMethod': 'k_means'})

Consider adding a docstring note or validation:

# Ensure tsam_kwargs doesn't override explicit parameters
reserved_keys = {'clusterMethod', 'extremePeriodMethod', 'representationMethod', 
                 'rescaleClusterPeriods', 'predefClusterOrder'}
conflicts = reserved_keys & set(tsam_kwargs.keys())
if conflicts:
    raise ValueError(
        f'Cannot override explicit parameters via tsam_kwargs: {conflicts}'
    )

Alternatively, just document the precedence in the docstring.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf6962c and 6b0579f.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • docs/notebooks/08c-clustering.ipynb
  • docs/user-guide/optimization/clustering.md
  • docs/user-guide/results-plotting.md
  • flixopt/clustering/base.py
  • flixopt/clustering/intercluster_helpers.py
  • flixopt/components.py
  • flixopt/config.py
  • flixopt/dataset_plot_accessor.py
  • flixopt/transform_accessor.py
  • mkdocs.yml
  • tests/test_cluster_reduce_expand.py
  • tests/test_clustering/test_base.py
  • tests/test_clustering/test_integration.py
🧰 Additional context used
🧬 Code graph analysis (2)
flixopt/dataset_plot_accessor.py (2)
flixopt/config.py (2)
  • CONFIG (185-912)
  • Plotting (551-591)
flixopt/color_processing.py (1)
  • process_colors (165-233)
flixopt/components.py (2)
flixopt/clustering/base.py (3)
  • n_original_clusters (127-129)
  • n_original_clusters (1051-1055)
  • cluster_order (1029-1033)
flixopt/clustering/intercluster_helpers.py (1)
  • build_boundary_coords (134-178)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md

129-129: Code block style
Expected: indented; Actual: fenced

(MD046, code-block-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.14)
  • GitHub Check: Build documentation
🔇 Additional comments (39)
mkdocs.yml (1)

72-78: LGTM! Well-organized documentation restructuring.

The separation of "Resampling" and "Clustering" as distinct concepts is clear, and the new hierarchical structure for clustering documentation (Basics, Storage Modes, Multi-Period, Internals) provides logical progressive coverage.

docs/user-guide/results-plotting.md (1)

5-7: LGTM! Helpful cross-reference added.

The tip block effectively guides users to the generalized plotting pathway for arbitrary xarray data, complementing the flixopt-specific plotting documentation.

tests/test_cluster_reduce_expand.py (2)

452-454: LGTM! Terminology updated consistently.

The change from n_original_periods to n_original_clusters aligns with the broader refactoring across the codebase to use cluster-based terminology.


170-170: The extra timestep is correctly implemented and intentionally required for inter-cluster storage linking.

The 193-timestep structure (192 original + 1 extra) is fundamental to how storage constraints with SOC_boundary work. The calculate_timestep_duration() method correctly indexes to timesteps_extra[:-1], yielding 192 duration values that correspond to the 192 intervals between 193 time points. When flow_hours multiplies flow rates by these durations, xarray's broadcasting correctly produces NaN for the final timestep in non-storage variables while storage charge_state properly captures the boundary condition.

The expansion logic in expand_solution() (lines 1324–1382) correctly:

  1. Expands charge_state with the extra timestep for each cluster
  2. Combines it with SOC_boundary accounting for self-discharge decay using the formula: SOC(t) = SOC_boundary[d] * (1-loss)^t + ΔE(t)
  3. Reindexes all solution variables to timesteps_extra for consistency

The test at lines 176–210 confirms that total_effects and flow_hours match between clustered and expanded solutions, validating that all downstream statistics and operations handle the extra timestep structure correctly.
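
The decay formula cited above, SOC(t) = SOC_boundary[d] * (1-loss)^t + ΔE(t), can be checked numerically with a standalone NumPy sketch (made-up numbers, not flixopt code):

```python
import numpy as np

loss = 0.01                   # relative self-discharge per timestep
soc_boundary = 100.0          # SOC at the start of original cluster d
t = np.arange(5)              # timesteps within the cluster
delta_e = np.array([0.0, 2.0, 4.0, 6.0, 8.0])  # intra-cluster charge delta

# The boundary value decays with time into the cluster;
# the intra-cluster delta is added on top.
soc = soc_boundary * (1 - loss) ** t + delta_e
print(soc[0])  # 100.0: at t=0 the boundary value is taken unmodified
```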

flixopt/clustering/intercluster_helpers.py (1)

134-167: LGTM! API updated to use consistent cluster terminology.

The function signature and implementation have been updated from n_original_periods to n_original_clusters, aligning with the broader terminology standardization across the clustering module. The logic and calculation remain correct. All callers—including the call at flixopt/components.py:1341—have been updated to use the new parameter name, and no references to the old parameter remain in the codebase.

tests/test_clustering/test_integration.py (1)

173-269: LGTM! Comprehensive test coverage for advanced clustering options.

The new TestClusterAdvancedOptions test class provides thorough coverage of the newly added clustering parameters including cluster_method, random_state, representation_method, rescale_cluster_periods, and tsam kwargs passthrough. The fixture setup is clean and the tests are well-structured.

tests/test_clustering/test_base.py (1)

20-20: LGTM! Consistent terminology update.

The renaming from original_period to original_cluster is applied systematically across all test cases. The dimension names and property assertions are updated consistently, reflecting the broader terminology standardization in the PR.

Also applies to: 32-32, 45-45, 50-50, 139-139

CHANGELOG.md (6)

54-59: Excellent release documentation structure.

The v6.0.0 header clearly marks this as a major release with a concise summary highlighting the three main features: time-series clustering, fxplot accessor, and removal of deprecated classes. The warning callout for breaking changes with migration guidance is particularly helpful.


75-111: Well-documented advanced clustering configuration.

The new clustering parameters are clearly explained with:

  • Clean parameter descriptions in the initial table
  • Practical code example showing combined usage
  • Comprehensive algorithm comparison table
  • Guidance on using **kwargs for advanced tsam parameters

This provides excellent user guidance for the extended clustering API.


112-126: Clear documentation of clustering quality metrics.

The metrics section effectively demonstrates:

  • How to access the metrics xr.Dataset
  • Expected structure with dimensions
  • Specific metric extraction pattern

This matches the clustering API extensions referenced in the PR objectives.


127-163: Comprehensive fxplot accessor documentation.

The FXPlot section provides:

  • Multiple practical examples showing the API
  • Complete method reference table
  • Key features breakdown
  • Clear integration points with any xarray Dataset

The code examples are clean and demonstrate the universal nature of the accessor.

Note: The static analysis warning about "fenced" vs "indented" code blocks at line 129 is a false positive—fenced code blocks with syntax highlighting are the recommended Markdown style.


176-209: Thorough deprecation notices for v7.0.0.

The deprecation section is well-organized:

  • Clear removal version (v7.0.0) at the top
  • Categorized by type (Classes, FlowSystem methods, Parameters, Topology methods)
  • Includes migration mappings in table format
  • Notes about simplified topology method names

This gives users clear advance notice for the next major version.


210-245: Helpful migration guide with side-by-side comparison.

The ClusteredOptimization migration section:

  • Uses tabs to show old vs. new code clearly
  • Includes parameter mapping (nr_of_periods → n_clusters)
  • Demonstrates the new two-stage workflow
  • Shows how to access results in both approaches

This makes upgrading straightforward for existing users.

docs/user-guide/optimization/clustering.md (3)

55-58: Well-documented clustering parameter additions.

The table extension clearly documents the four new parameters:

  • cluster_method: Algorithm selection with examples
  • representation_method: Representation strategy
  • random_state: Reproducibility control
  • rescale_cluster_periods: Period rescaling option

These align with the advanced clustering features mentioned in the PR objectives.


75-111: Comprehensive advanced clustering options guide.

The new section provides:

  • Practical example combining multiple advanced parameters
  • Complete algorithm comparison table (k_means, hierarchical, k_medoids, k_maxoids, averaging)
  • Clear guidance on **kwargs for tsam parameters (sameMean, sortValues)

This gives users fine-grained control over clustering behavior with good examples.


112-126: Clear clustering quality metrics documentation.

The metrics section demonstrates:

  • How to access fs_clustered.clustering.metrics
  • Expected xr.Dataset structure
  • Specific metric extraction (RMSE example)

This exposes the clustering quality evaluation capabilities added in this PR.

flixopt/config.py (2)

33-33: Correct deprecation version progression.

Updating DEPRECATION_REMOVAL_VERSION from '6.0.0' to '7.0.0' is appropriate—this release is v6.0.0, so deprecations should target the next major version (v7.0.0). This aligns with the extensive deprecation notices in the CHANGELOG.


167-169: Plotting priority updates for fxplot accessor.

The configuration changes introduce new dimension priorities:

  • 'variable' added as first priority in extra_dim_priority
  • 'color' added as first slot in dim_slot_priority
  • 'variable' inserted in x_dim_priority

These changes support the new fxplot accessor feature documented in the CHANGELOG, where color is now a "first-class dimension" and variable is treated as a valid facet/color option. Since this is a major version release (6.0.0), these breaking changes in default behavior are acceptable and well-documented.

flixopt/components.py (3)

1198-1198: Consistent terminology update in InterclusterStorageModel.

The renaming from n_original_periods to n_original_clusters is systematically applied throughout the inter-cluster storage linking logic:

  • Variable names updated (line 1333)
  • Function call parameters updated (line 1341)
  • Index selections updated (lines 1370, 1380-1381)
  • Comments updated (line 1198)

The changes align with the broader PR terminology standardization and match the updates in related files (clustering/base.py, clustering/intercluster_helpers.py).

Also applies to: 1333-1333, 1341-1341, 1370-1370, 1380-1381


1363-1365: Method signatures correctly updated.

The function signatures for _add_linking_constraints and _add_combined_bound_constraints have been updated:

  • Parameter renamed from n_original_periods to n_original_clusters
  • All call sites updated to pass the correctly named variable
  • Docstring parameters updated to match

This maintains API consistency across the storage linking implementation.

Also applies to: 1395-1395, 1444-1445, 1461-1462, 1490-1491


1468-1471: Dimension renaming with proper coordinate reassignment.

The dimension renames from 'cluster_boundary' to 'original_cluster' are correctly implemented:

  • Lines 1468-1471: Rename dimension, then assign coordinates using np.arange(n_original_clusters)
  • Lines 1517-1518: Similar pattern for soc_d
  • Lines 1530-1535: Rename 'cluster' to 'original_cluster' after reordering via cluster_order

The coordinate reassignment after dimension renaming ensures proper alignment and avoids index/coordinate mismatches.

Also applies to: 1517-1518, 1530-1535

docs/notebooks/08c-clustering.ipynb (5)

32-32: LGTM! Modern xarray-based plotting approach.

The addition of xarray import and dataset-based visualization aligns well with the new fxplot API, providing cleaner and more maintainable plotting code.

Also applies to: 74-80


150-159: Excellent addition for reproducibility.

Setting random_state=42 ensures consistent, reproducible results in the tutorial, which is a best practice for documentation and examples.


189-241: Clear demonstration of clustering metadata API.

The progressive disclosure of clustering metadata (properties → metrics → visualization) provides an excellent learning path for users.


247-298: Good demonstration of algorithm comparison workflow.

The examples effectively show how to compare different clustering algorithms using both cluster assignments and quality metrics. The RMSE comparison is particularly useful for understanding clustering quality trade-offs.


305-342: Valuable workflow for design comparisons.

The manual cluster assignment example addresses a real analysis need: ensuring consistent cluster structure across design variants for fair comparison. Well explained and implemented.

flixopt/transform_accessor.py (3)

585-641: Well-designed API extension with comprehensive documentation.

The new clustering parameters provide valuable control over the clustering process while maintaining good defaults. The docstring is thorough and includes helpful examples. The choice of 'hierarchical' as the default is appropriate since it's deterministic.


941-977: LGTM! Consistent terminology update.

The renaming from original_period to original_cluster improves clarity by avoiding confusion with the model's "period" dimension (years/months). The change is consistently applied across all code paths.


1324-1390: Complex but correct inter-cluster storage handling.

The SOC_boundary integration logic properly handles the extra timestep and applies self-discharge decay based on time within each period. The vectorized operations with DataArray indexing are efficient.

The cleanup at lines 1384-1390 is good housekeeping, removing internal variables that shouldn't be exposed in the expanded solution.

flixopt/dataset_plot_accessor.py (5)

16-38: LGTM! Clean extension for multi-variable support.

Making 'variable' available as an x-axis option when there are multiple data variables is intuitive and maintains backward compatibility with the default n_data_vars=1.


41-96: Well-structured extension of auto-resolution logic.

The addition of color to the facet resolution system is clean and follows the existing pattern. The function signature change to return a 4-tuple is properly handled by all call sites in the file.


147-223: Robust color handling with proper defaults.

The color resolution logic correctly handles edge cases (single variable, no color dimension) and uses appropriate fallbacks. The defensive check for 'color' not in px_kwargs prevents accidental overrides.


225-469: Consistent color support across plotting methods.

All four methods (bar, stacked_bar, line, area) implement color handling consistently, maintaining the same API patterns and defensive coding practices.


471-684: Correct integration with color-unaware plot types.

Heatmap, scatter, and pie methods correctly pass None for the color parameter since these plot types don't use the color dimension for categorical grouping. The API remains clean and consistent.

flixopt/clustering/base.py (5)

34-130: Consistent and clear terminology update.

The renaming to original_cluster throughout ClusterStructure eliminates ambiguity with the model's "period" dimension and improves code clarity. The documentation and property names align well.


969-983: Valuable addition for clustering quality evaluation.

The metrics attribute provides users with quantitative measures of clustering quality (RMSE, MAE) which is essential for evaluating and comparing different clustering configurations. Making it optional (None) maintains flexibility for backends that don't compute metrics.


200-254: Modernized plotting with fxplot API.

The update to use fxplot.heatmap() provides better defaults and consistency with the rest of the plotting infrastructure. The visualization effectively shows cluster assignments.


578-699: Flexible and intuitive comparison visualization.

The addition of color and line_dash parameters provides users with multiple ways to distinguish between original and clustered data. The default of using line dash for representation (Original=dotted, Clustered=solid) is an effective visual distinction.


856-953: Excellent UX: cluster labels include occurrence counts.

Adding occurrence counts to cluster labels (e.g., "Cluster 0 (×45)") provides immediate insight into cluster weights without needing to reference separate data. This is a thoughtful UX improvement.
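
Labels of that form can be derived directly from the cluster order (a minimal sketch; the actual implementation in flixopt/clustering/base.py may differ):

```python
from collections import Counter

# Which cluster represents each original cluster (hypothetical assignment):
cluster_order = [0, 1, 0, 0, 2, 1, 0]
counts = Counter(cluster_order)

# Build "Cluster N (×count)" labels per unique cluster.
labels = {c: f'Cluster {c} (×{counts[c]})' for c in sorted(counts)}
print(labels)  # {0: 'Cluster 0 (×4)', 1: 'Cluster 1 (×2)', 2: 'Cluster 2 (×1)'}
```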

Comment on lines +733 to +741
    # Handle predef_cluster_order for multi-dimensional case
    predef_order_slice = None
    if predef_cluster_order is not None:
        if isinstance(predef_cluster_order, xr.DataArray):
            # Extract slice for this (period, scenario) combination
            predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values
        else:
            # Simple array/list - use directly
            predef_order_slice = predef_cluster_order

🛠️ Refactor suggestion | 🟠 Major

Add validation for predef_cluster_order dimensions.

If predef_cluster_order is a DataArray with unexpected dimensions (e.g., missing 'period' when the FlowSystem has periods), the .sel() call could fail or produce incorrect results.

Consider adding dimension validation:

if predef_cluster_order is not None:
    if isinstance(predef_cluster_order, xr.DataArray):
        # Validate dimensions match FlowSystem structure
        expected_dims = {'original_cluster'}
        if has_periods:
            expected_dims.add('period')
        if has_scenarios:
            expected_dims.add('scenario')
        if set(predef_cluster_order.dims) != expected_dims:
            raise ValueError(
                f'predef_cluster_order dims {set(predef_cluster_order.dims)} '
                f'do not match expected {expected_dims}'
            )
        predef_order_slice = predef_cluster_order.sel(**selector, drop=True).values
    else:
        predef_order_slice = predef_cluster_order

Comment on lines 1253 to 1280
n_original_clusters = cluster_structure.n_original_clusters

# For charge_state with cluster dim, also includes the extra timestep
last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

def expand_da(da: xr.DataArray, var_name: str = '') -> xr.DataArray:
    if 'time' not in da.dims:
        return da.copy()
    expanded = info.result.expand_data(da, original_time=original_timesteps)

    # For charge_state with cluster dim, append the extra timestep value
    if var_name.endswith('|charge_state') and 'cluster' in da.dims:
        # Get extra timestep from last cluster using vectorized selection
        cluster_order = cluster_structure.cluster_order  # (n_original_clusters,) or with period/scenario
        if cluster_order.ndim == 1:
            last_cluster = int(cluster_order[last_original_cluster_idx])
            extra_val = da.isel(cluster=last_cluster, time=-1)
        else:
            # Multi-dimensional: select last cluster for each period/scenario slice
            last_clusters = cluster_order.isel(original_cluster=last_original_cluster_idx)
            extra_val = da.isel(cluster=last_clusters, time=-1)
        # Drop 'cluster'/'time' coords created by isel (kept as non-dim coords)
        extra_val = extra_val.drop_vars(['cluster', 'time'], errors='ignore')
        extra_val = extra_val.expand_dims(time=[original_timesteps_extra[-1]])
        expanded = xr.concat([expanded, extra_val], dim='time')

    return expanded

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's understand the context around line 1257
cd flixopt && head -1280 transform_accessor.py | tail -100 | cat -n

Repository: flixOpt/flixopt

Length of output: 5859


🏁 Script executed:

# Search for timesteps_per_cluster usage and validation
rg -n "timesteps_per_cluster" --type py -B 3 -A 3



🏁 Script executed:

# Look for clustering initialization and validation
rg -n "n_original_timesteps" --type py -B 2 -A 2 | head -50



🏁 Script executed:

# Check how cluster_structure is created and validated
rg -n "class ClusterResult\|n_original_clusters" --type py -B 2 -A 5



🏁 Script executed:

# Get the exact usage of last_original_cluster_idx at lines 1268-1275
sed -n '1257,1280p' flixopt/transform_accessor.py | cat -n



🏁 Script executed:

# Check cluster_order indexing bounds - what is its size?
rg -n "cluster_order\[" --type py -B 2 -A 2 | head -40



🏁 Script executed:

# Look for validation that ensures n_original_timesteps is properly divisible
rg -n "n_original_timesteps.*%\|%.*timesteps_per_cluster\|divisib" --type py



🏁 Script executed:

# Check if there are edge case tests with partial clusters
rg -n "partial\|incomplete\|remainder" tests/ --type py | head -20



🏁 Script executed:

# Check if n_original_clusters is calculated or derived from n_original_timesteps
rg -n "n_original_clusters.*=" flixopt/clustering/base.py | head -10



🏁 Script executed:

# Look at the cluster() method to see if it validates perfect divisibility
sed -n '650,750p' flixopt/transform_accessor.py | grep -n "cluster\|timestep"



🏁 Script executed:

# Check what original_timesteps contains and how n_original_timesteps relates to it
rg -n "original_timesteps.*=" flixopt/transform_accessor.py | head -15



🏁 Script executed:

# Search for any bounds checking or clamping around cluster_order access
rg -n "cluster_order\[.*_idx\]\|isel.*original_cluster.*idx" --type py -B 1 -A 1



🏁 Script executed:

# Check how n_original_clusters is used and whether it can mismatch with actual cluster_order size
rg -n "n_original_clusters" flixopt/clustering/base.py | head -20



🏁 Script executed:

# See the full context of how n_original_clusters is determined during clustering
sed -n '1115,1165p' flixopt/clustering/base.py



🏁 Script executed:

# Check what create_cluster_structure_from_mapping does with incomplete timesteps
sed -n '1115,1170p' flixopt/clustering/base.py | cat -n



🏁 Script executed:

# Verify: does cluster_order always have exactly n_original_clusters entries?
python3 << 'EOF'
# Simulate the scenario from the review
n_original_timesteps = 169  # Not divisible by 24
timesteps_per_cluster = 24
last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster
n_original_clusters = n_original_timesteps // timesteps_per_cluster

print(f"n_original_timesteps: {n_original_timesteps}")
print(f"timesteps_per_cluster: {timesteps_per_cluster}")
print(f"n_original_clusters: {n_original_clusters}")
print(f"last_original_cluster_idx: {last_original_cluster_idx}")
print(f"Valid cluster_order indices: 0-{n_original_clusters - 1}")
print(f"Accessing cluster_order[{last_original_cluster_idx}] would be: ", end="")
if last_original_cluster_idx >= n_original_clusters:
    print(f"OUT OF BOUNDS! (index {last_original_cluster_idx} >= size {n_original_clusters})")
else:
    print("OK")
EOF



🏁 Script executed:

# Check if the FlowSystem or clustering validates divisibility
rg -n "n_original_timesteps.*%\|n_original_timesteps.*//\|assert.*divisible" flixopt/ --type py



🏁 Script executed:

# Check the cluster() method for any validation or handling of partial clusters
sed -n '600,950p' flixopt/transform_accessor.py | grep -n "n_original\|divisib\|partial\|remainder" | head -20



🏁 Script executed:

# Look specifically at how cluster_order is sized in cluster() vs expand_solution()
rg -n "cluster_order.*=" flixopt/transform_accessor.py -A 2 | head -40



🏁 Script executed:

# Check if there's any documentation or code comment about cluster boundary conditions
rg -n "last.*cluster\|partial.*period\|remainder" flixopt/ --type py -B 1 -A 1



🏁 Script executed:

# Check if n_original_clusters is calculated with any remainder handling
rg -n "n_original_clusters.*=" flixopt/transform_accessor.py -B 2 -A 2



🏁 Script executed:

# Look for any comment or code about handling partial/incomplete clusters
rg -n "incomplete\|partial\|remainder\|not.*divisible" flixopt/ --type py



🏁 Script executed:

# Check the full cluster() method to see if it enforces or validates divisibility
sed -n '575,950p' flixopt/transform_accessor.py | tail -200 | head -150



🏁 Script executed:

# Verify the actual size of cluster_order that's created in cluster()
python3 << 'EOF'
# From cluster() at lines 950-967, cluster_order is created from:
# cluster_orders[key] which comes from tsam_agg.clusterOrder
# The size depends on how many original_cluster periods there are

# In expand_solution() at line 1253:
# n_original_clusters = cluster_structure.n_original_clusters

# This is calculated from cluster_order's size
# If cluster_order has n_original_clusters entries (indices 0 to n_original_clusters-1)
# And we try to access cluster_order[last_original_cluster_idx]
# where last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

# Then if n_original_timesteps is not perfectly divisible:
# last_original_cluster_idx can equal n_original_clusters (out of bounds)

print("The issue occurs when:")
print("  n_original_timesteps % timesteps_per_cluster != 0")
print()
print("Example:")
print("  n_original_timesteps = 169")
print("  timesteps_per_cluster = 24")
print("  n_original_clusters = 169 // 24 = 7 (cluster_order size)")
print("  last_original_cluster_idx = (169-1) // 24 = 7 (INDEX OUT OF BOUNDS)")
EOF



🏁 Script executed:

# Check if there are any tests with non-divisible timesteps
rg -n "test.*expand\|test.*cluster" tests/ --type py | grep -i test



🏁 Script executed:

# Look for test fixtures that might use non-standard timesteps
rg -n "8_days\|7_days\|169\|partial" tests/ --type py | head -20



🏁 Script executed:

# Check the simple_system_8_days fixture to see timesteps
rg -n "simple_system_8_days" tests/ -A 10 | head -30



🏁 Script executed:

# Verify if there's protection/clamping at line 1273 (multi-dim case)
sed -n '1267,1280p' flixopt/transform_accessor.py | cat -n



🏁 Script executed:

# Check if cluster_order can have fewer entries than n_original_clusters suggests
python3 << 'EOF'
# Looking at the code flow:
# 1. cluster_orders[key] = tsam_agg.clusterOrder
#    This is the cluster order from TSAM - one value per original period
#
# 2. cluster_order_da created at line 952-953:
#    xr.DataArray(cluster_orders[key], dims=['original_cluster'], name='cluster_order')
#
# 3. In expand_solution() at line 1253:
#    n_original_clusters = cluster_structure.n_original_clusters
#
# 4. cluster_structure.n_original_clusters is a property that returns:
#    len(cluster_order) (the size of the original_cluster dimension)
#
# So if cluster_order has 7 entries, n_original_clusters = 7
# Valid indices: 0-6
# last_original_cluster_idx = (169-1) // 24 = 7 → OUT OF BOUNDS

print("CONFIRMED BUG:")
print("cluster_order.shape[0] = n_original_clusters")
print("cluster_order[last_original_cluster_idx] will fail if")
print("  last_original_cluster_idx >= n_original_clusters")
print()
print("This happens when:")
print("  n_original_timesteps % timesteps_per_cluster != 0")
EOF



Fix out-of-bounds access in last_original_cluster_idx calculation.

Line 1257 calculates the last original cluster index without bounds checking:

last_original_cluster_idx = (n_original_timesteps - 1) // timesteps_per_cluster

When n_original_timesteps % timesteps_per_cluster != 0 (a partial cluster at the end), this yields an index equal to n_original_clusters, one past the last valid entry, causing an IndexError at lines 1269 and 1273 when accessing cluster_order[last_original_cluster_idx].

Clamp the index to valid bounds:

last_original_cluster_idx = min(
    (n_original_timesteps - 1) // timesteps_per_cluster,
    n_original_clusters - 1
)
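As a quick sanity check, the clamped formula can be exercised standalone (a hypothetical helper, not the accessor's actual code):

```python
def last_cluster_index(n_original_timesteps: int, timesteps_per_cluster: int,
                       n_original_clusters: int) -> int:
    """Index of the original cluster containing the last timestep, clamped so a
    trailing partial cluster cannot index past the end of cluster_order."""
    return min((n_original_timesteps - 1) // timesteps_per_cluster,
               n_original_clusters - 1)
```

With 169 timesteps and 24 timesteps per cluster, the unclamped formula gives 7 while cluster_order has only indices 0-6; the clamp maps this back to 6.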

FBumann added 12 commits January 4, 2026 19:45
  1. Cell 32 (API Reference table): Updated defaults to 'hierarchical', 'medoidRepresentation', and None
  2. Cell 16: Swapped the example to show k_means as the alternative (since hierarchical is now default)
  3. Cell 17: Updated variable names to match
  4. Cell 33 (Key Takeaways): Clarified that random_state is only needed for non-deterministic methods like 'k_means'

  The code review
…arning log and empty DataFrame fallback, plus handling empty DataFrames when building the metrics Dataset

  2. Random state to tsam - Replaced global np.random.seed() with passing seed parameter directly to tsam's TimeSeriesAggregation
  3. tsam_kwargs conflict validation - Added validation that raises ValueError if user tries to override explicit parameters via **tsam_kwargs (including seed)
  4. predef_cluster_order validation - Added dimension validation for DataArray inputs, checking they match the FlowSystem's period/scenario structure
  5. Out-of-bounds fix - Clamped last_original_cluster_idx to n_original_clusters - 1 to handle partial clusters at the end
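The kwargs-conflict check described in item 3 could look roughly like this; the helper name and parameter set are illustrative, the real implementation lives in flixopt/transform_accessor.py:

```python
def check_tsam_kwargs(tsam_kwargs: dict, managed: set) -> None:
    """Raise if **tsam_kwargs would silently override explicitly managed parameters."""
    conflicts = sorted(set(tsam_kwargs) & managed)
    if conflicts:
        raise ValueError(
            f'These parameters are managed by cluster() and cannot be passed '
            f'via tsam_kwargs: {conflicts}'
        )

# Non-conflicting extras pass through; overriding e.g. 'seed' raises
check_tsam_kwargs({'solver': 'glpk'}, managed={'noTypicalPeriods', 'hoursPerPeriod', 'seed'})
```

Failing fast here is preferable to letting tsam silently receive two values for the same parameter.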
…ey) or next(...) to explicit if metrics_df is None: check

  2. removed random state
# Conflicts:
#	docs/notebooks/08c-clustering.ipynb
#	flixopt/dataset_plot_accessor.py
# Conflicts:
#	docs/notebooks/08c-clustering.ipynb
#	flixopt/dataset_plot_accessor.py
# Conflicts:
#	.github/workflows/docs.yaml
#	mkdocs.yml
…mmary of the changes made:

  Changes Made

  1. flixopt/config.py

  - Replaced three separate config attributes (extra_dim_priority, dim_slot_priority, x_dim_priority) with a single unified dim_priority tuple
  - Updated CONFIG.Plotting class docstring and attribute definitions
  - Updated to_dict() method to use the new attribute
  - The new priority order: ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')

  2. flixopt/dataset_plot_accessor.py

  - Created new assign_slots() function that centralizes all dimension-to-slot assignment logic
  - Fixed slot fill order: x → color → facet_col → facet_row → animation_frame
  - Updated all plot methods (bar, stacked_bar, line, area, heatmap, scatter, pie) to use assign_slots()
  - Removed old _get_x_dim() and _resolve_auto_facets() functions
  - Updated docstrings to reference dim_priority instead of x_dim_priority

  3. flixopt/statistics_accessor.py

  - Updated _resolve_auto_facets() to use the new assign_slots() function internally
  - Added import for assign_slots from dataset_plot_accessor

  Key Design Decisions

  - Single priority list controls all auto-assignment
  - Slots are filled in fixed order based on availability
  - None means a slot is not available for that plot type
  - 'auto' triggers auto-assignment from priority list
  - Explicit string values override auto-assignment
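A simplified sketch of the slot-assignment logic described above (the real assign_slots() in flixopt/dataset_plot_accessor.py handles more cases; this only illustrates the priority-driven fill order):

```python
DIM_PRIORITY = ('time', 'duration', 'duration_pct', 'variable', 'cluster', 'period', 'scenario')
SLOT_ORDER = ('x', 'color', 'facet_col', 'facet_row', 'animation_frame')

def assign_slots(dims: set, slots: dict) -> dict:
    """Fill plot slots in fixed order from dims ordered by DIM_PRIORITY.

    slots maps slot name -> None (unavailable), 'auto', or an explicit dim name.
    """
    remaining = [d for d in DIM_PRIORITY if d in dims]
    result = {}
    # Explicit assignments win and are removed from the auto pool
    for slot in SLOT_ORDER:
        spec = slots.get(slot)
        if spec not in (None, 'auto'):
            result[slot] = spec
            if spec in remaining:
                remaining.remove(spec)
    # Remaining dims fill 'auto' slots in fixed slot order
    for slot in SLOT_ORDER:
        if slots.get(slot) == 'auto':
            result[slot] = remaining.pop(0) if remaining else None
    return result

auto3 = {'x': 'auto', 'color': 'auto', 'facet_col': 'auto', 'facet_row': None, 'animation_frame': None}
assigned = assign_slots({'time', 'period', 'scenario'}, auto3)
```

Here 'time' lands on x, 'period' on color, and 'scenario' on facet_col, because the pool is ordered by DIM_PRIORITY and slots are filled x-first.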
@FBumann FBumann changed the title Feature/tsam params Clustering Parameter Refinements & Unified Slot Assignment Jan 5, 2026
@FBumann FBumann merged commit c94f6da into feature/aggregate-rework-v2 Jan 5, 2026
10 checks passed