-
Notifications
You must be signed in to change notification settings - Fork 9
Feature/clustering segmentation #559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Phase 1: RangeIndex Support in FlowSystem
- Modified _validate_timesteps() to accept both DatetimeIndex and RangeIndex
- Updated _create_timesteps_with_extra() and calculate_timestep_duration() for RangeIndex
- Added is_segmented property to detect segmented FlowSystems
Phase 2: Segmentation Metadata in ClusterStructure
- Added fields: is_segmented, n_segments, segment_timestep_counts
- Updated serialization for persistence
Phase 3: TSAM Integration
- Added parameters: segmentation, n_segments, segment_representation_method
- Integrated with TSAM's segmentation, noSegments parameters
- Extract segment durations from TSAM's segmentDurationDict
- Build variable timestep_duration DataArray for segments
Phase 4: Storage Model Updates
- Updated InterclusterStorageModel for segmented systems
- Fixed sample offsets to use actual time dimension size
Phase 5: Solution Expansion
- Updated expand_data() to use n_segments for indexing in segmented systems
- Updated log messages to show segment info
Phase 7: Tests (14 new tests)
- TestSegmentation: 10 tests covering basic segmentation, structure, optimization, expansion
- TestSegmentationWithStorage: 4 tests for intercluster storage with segmentation
API Usage
# Cluster with inner-period segmentation
fs_reduced = flow_system.transform.cluster(
n_clusters=8,
cluster_duration='1D',
segmentation=True,
n_segments=6,
)
# Result: 8 clusters × 6 segments = 48 representative points
# vs. 8 clusters × 24 hours = 192 points without segmentation
fs_reduced.optimize(solver)
fs_expanded = fs_reduced.transform.expand_solution()
….py:791-798):
- Check if timestep_duration is in the reference structure
- Only resolve as DataArray reference if it's a string starting with ":::"
- For non-segmented systems (where it's stored as a simple list), skip resolution and let the constructor calculate it
2. Pass timestep_duration to the constructor (flow_system.py:820):
- Added timestep_duration=timestep_duration parameter to the cls() constructor call
|
Warning Rate limit exceeded@FBumann has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 20 minutes and 44 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds inner-period segmentation to time-series clustering: cluster() gains Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Transform as transform_accessor.cluster()
participant TSAM as TimeSeriesAggregation (TSAM)
participant ClusterStruct as ClusterStructure
participant FlowSys as FlowSystem
participant Expander as expand()
User->>Transform: cluster(n_segments, segment_representation_method)
activate Transform
Transform->>TSAM: request aggregation + segmentation (noSegments, representation)
activate TSAM
TSAM-->>Transform: typical_periods_data, segmentDurationDict
deactivate TSAM
Transform->>ClusterStruct: build structure (is_segmented=True, n_segments, segment_timestep_counts)
activate ClusterStruct
ClusterStruct-->>Transform: ClusterStructure
deactivate ClusterStruct
Transform->>FlowSys: create reduced FlowSystem (RangeIndex timesteps, timestep_duration)
activate FlowSys
FlowSys-->>Transform: reduced FlowSystem (is_segmented=True)
deactivate FlowSys
User->>Expander: expand(reduced_solution)
activate Expander
Expander->>Expander: map (cluster, segment) -> original timesteps using segment_timestep_counts
Expander->>FlowSys: produce expanded FlowSystem (DatetimeIndex, propagate metadata)
FlowSys-->>Expander: expanded FlowSystem + solution
Expander-->>User: full-resolution solution
deactivate Expander
deactivate Transform
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flixopt/clustering/base.py (1)
881-929:clustering.plot.clusters()likely breaks for segmented clustering (time length mismatch)With segmentation enabled:
- Aggregated data variables are built with shape
(cluster, n_segments)andtimecoord lengthn_segments(segments).ClusterStructure.timesteps_per_clusterstill reflects original hourly steps per period (e.g. 24).In
ClusteringPlotAccessor.clusters()you do:n_clusters = int(cs.n_clusters) ... timesteps_per_cluster = cs.timesteps_per_cluster ... if has_cluster_dim: data_by_cluster = da.values # shape: (n_clusters, n_segments) ... data_vars[var] = xr.DataArray( data_by_cluster, dims=['cluster', 'time'], coords={'cluster': cluster_labels, 'time': range(timesteps_per_cluster)}, )For segmented runs this means:
data_by_cluster.shape[1] == n_segments, but- the
timecoordinate has lengthtimesteps_per_cluster(e.g. 24),which is inconsistent and will either error in xarray or silently mislabel the time axis.
You probably want to branch on segmentation here, e.g.:
- Use
time_dim_size = cs.n_segmentsandrange(time_dim_size)whencs.is_segmented, or- Even better, reuse the actual
da.coords['time']so plots reflect segment indices / durations.That keeps non-segmented behavior intact while making
clusters()usable for segmented clusterings.
🤖 Fix all issues with AI agents
In @CHANGELOG.md:
- Line 74: The examples call a nonexistent method expand_solution() (e.g.,
fs_expanded = fs_segmented.transform.expand_solution()); replace those calls
with the actual method name expand() so they read fs_expanded =
fs_segmented.transform.expand(), and update the other example occurrence
similarly to use .expand() (ensure any reference to expand_solution() is changed
to expand()).
In @docs/notebooks/08c3-clustering-comparison.ipynb:
- Around line 40-56: The notebook text claims a full year (8760 timesteps) but
the code constructs the system with
create_district_heating_system(duration='quarter'), causing inconsistency; fix
by either updating the markdown to reflect a quarter-year example (e.g.,
"quarter-year hourly data") or change the call to
create_district_heating_system(duration='year') (or remove the duration arg to
use the default full year) so the markdown and the code
(create_district_heating_system and its duration parameter) match.
- Around line 98-109: The comment/labels clash with the actual cluster count:
update the call to flow_system.transform.cluster or the labels so they
match—either change n_clusters=16 to n_clusters=8 and rename fs_seg.name and
results key to "8x6 segmented", or keep n_clusters=16 but set results['16x6
segmented']['timesteps'] = len(fs_seg.timesteps) (and update fs_seg.name/comment
to reflect 16 clusters); ensure you reference the cluster call
(flow_system.transform.cluster), the segment object name (fs_seg.name) and the
results entry (results['...']['timesteps']) when making the fix.
In @tests/test_cluster_reduce_expand.py:
- Around line 841-986: Tests call a non-existent method expand_solution() on the
TransformAccessor; update all calls to use the actual method name expand().
Replace occurrences like fs_segmented.transform.expand_solution() (seen in
TestSegmentation methods test_segmented_expand_solution_restores_full_timesteps
and test_segmented_expanded_statistics_match and any similar tests including
TestSegmentationWithStorage) with fs_segmented.transform.expand() so the
TransformAccessor.expand() is invoked and no AttributeError occurs.
In @tests/test_clustering_io.py:
- Around line 542-643: TransformAccessor is missing expand_solution(), so calls
like fs.transform.expand_solution() raise AttributeError; add a thin alias
method named expand_solution on the TransformAccessor class that simply
delegates to the existing expand(...) implementation (i.e., define def
expand_solution(self, *args, **kwargs): return self.expand(*args, **kwargs)),
placing it next to the existing expand method so both fs.transform.expand() and
fs.transform.expand_solution() work identically.
🧹 Nitpick comments (1)
flixopt/components.py (1)
1480-1488: Segmentation-aware decay uses cluster-averaged durations; consider tightening assumptionsThe segmentation branches in
_add_linking_constraintsand_add_combined_bound_constraintsrely on:
hours_per_cluster = self._model.timestep_duration.sum('time').mean('cluster')hours_offset = cumulative_hours.isel(time=offset).mean('cluster')This is dimensionally consistent and works as long as:
- All clusters have the same total duration (which they should, given fixed
cluster_duration), and- You accept that the sampled combined bounds are already approximate.
If segment patterns ever differ strongly by cluster, using a mean over
clusterforhours_offsetslightly decouples the decay factor from the actual per-cluster segment geometry.Two low-impact options to make this more robust without changing behavior today:
- Assert the invariants you rely on (e.g., total duration per cluster is constant) to catch unexpected segment patterns early.
- Or, compute decay in original-cluster space using
cluster_order(per-periodhours_per_cluster/hours_offset) instead of a global mean, so Eq. 5 / Eq. 9 are strictly cluster-consistent.Given current segmentation design this is more of a precision improvement than a correctness bug, but worth considering while the segmentation API is still fresh.
Also applies to: 1533-1572
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
CHANGELOG.mddocs/notebooks/02-heat-system.ipynbdocs/notebooks/08c1-clustering.ipynbdocs/notebooks/08c3-clustering-comparison.ipynbdocs/notebooks/data/generate_example_systems.pydocs/user-guide/optimization/clustering.mdflixopt/clustering/base.pyflixopt/components.pyflixopt/flow_system.pyflixopt/transform_accessor.pymkdocs.ymltests/test_cluster_reduce_expand.pytests/test_clustering_io.py
🧰 Additional context used
🧬 Code graph analysis (3)
flixopt/clustering/base.py (1)
flixopt/flow_system.py (1)
is_segmented(2119-2125)
flixopt/flow_system.py (1)
flixopt/structure.py (7)
timestep_duration(220-222)timestep_duration(1751-1752)hours_of_previous_timesteps(225-226)dims(229-231)_resolve_dataarray_reference(758-790)indexes(234-236)flow_system(439-457)
tests/test_clustering_io.py (1)
flixopt/flow_system.py (12)
transform(1673-1691)optimize(1643-1670)to_netcdf(887-912)FlowSystem(55-2596)from_netcdf(915-932)is_segmented(2119-2125)solution(1517-1534)solution(1537-1540)to_dataset(673-733)from_dataset(736-885)sel(2385-2411)dims(2013-2033)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
64-64: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
94-94: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (3.14)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: Build documentation
🔇 Additional comments (18)
CHANGELOG.md (2)
71-71: Clarify whethersegmentation=Trueparameter is required to enable segmentation.Line 71 uses only
n_segments=6without asegmentationflag, but line 105-106 includessegmentation=True, n_segments=4. This suggests two different API patterns. Confirm:
- Is
segmentation=Truerequired or optional?- Should both examples be consistent?
- Does providing
n_segmentsalone implicitly enable segmentation?Update examples to use a consistent API pattern for clarity.
Also applies to: 105-106
64-64: Markdown code block style (minor formatting issue).Static analysis flagged lines 64 and 94 as using fenced code blocks instead of indented blocks. Verify if this project's markdown style guide prefers indented blocks (e.g., per
markdownlintconfiguration). If fenced blocks are acceptable, these flags can be safely ignored as false positives.Also applies to: 94-94
flixopt/flow_system.py (7)
176-211: Segmented timestep_duration handling is well-structured.The three-way fallback logic correctly handles:
- Explicitly provided
timestep_duration(for segmented systems)- Computed duration from DatetimeIndex
Nonefor RangeIndex when not provided
274-290: Validation correctly extended for RangeIndex.The validation logic (length check, rename, monotonic check) works correctly for both
DatetimeIndexandRangeIndex.
331-367: Index-type-aware helper methods are correctly implemented.Both
_create_timesteps_with_extraandcalculate_timestep_durationproperly branch on index type, with RangeIndex returning simple extensions/None and DatetimeIndex using time-based calculations.
791-820: Dataset reconstruction correctly handles segmented systems.The logic properly:
- Resolves
timestep_durationDataArray references for segmented systems- Converts non-DatetimeIndex back to RangeIndex
- Passes both to the constructor
2118-2125: Cleanis_segmentedproperty implementation.The property correctly identifies segmented systems by checking for
RangeIndextimesteps.
1917-1933: Enhanced__repr__provides clear segmentation status.The conditional display correctly shows:
- Segmented systems: segment count with "(segmented)" indicator
- DatetimeIndex systems: count, frequency, and date range
- Clusters when present
2114-2116: Return type correctly updated for segmented support.docs/user-guide/optimization/clustering.md (1)
263-263: Documentation link updated to match new notebook name.docs/notebooks/08c1-clustering.ipynb (3)
80-80: Plotting argument change fromfacet_rowtocolor.The visualization approach changed from separate rows per variable to color-coded lines. This consolidates the visualization into a single panel.
606-624: Standard notebook metadata added.Kernelspec and language_info sections provide proper notebook execution context for MkDocs and Jupyter environments.
601-602: Notebook references verified and correct. Both 08c3-clustering-comparison.ipynb and 08d-clustering-multiperiod.ipynb exist in docs/notebooks/.docs/notebooks/02-heat-system.ipynb (1)
378-384: Notebook metadata populated with kernelspec.Standard housekeeping to ensure proper notebook execution context.
docs/notebooks/data/generate_example_systems.py (1)
293-316: Duration parameter adds flexibility while maintaining backward compatibility.The implementation:
- Defaults to
'month'(existing behavior)- Validates input with clear error message
- Uses explicit end dates for each duration option
tests/test_cluster_reduce_expand.py (1)
989-1032: Storage-segmentation integration tests are well-structured.The
TestSegmentationWithStorageclass validates:
- Intercluster storage with segmentation produces expected variables
- Expansion correctly handles SOC_boundary combination
- Expanded charge_state is non-negative (absolute SOC)
mkdocs.yml (1)
75-75: Navigation path correctly updated to existing notebook.The entry references an existing file that aligns with the PR's notebook restructuring for segmentation features.
flixopt/transform_accessor.py (1)
579-606: No changes needed: the implementation correctly relies on TSAM's inherited default behaviorThe docstring states: "If None, uses representation_method." According to TSAM's documentation,
segmentRepresentationMethodautomatically inherits the top-levelrepresentationMethodwhen not explicitly provided—which is exactly what the current implementation does.When
segment_representation_method is None, the code intentionally omitssegmentRepresentationMethodfrom the TSAM call, allowing TSAM's built-in inheritance mechanism to apply. This matches the documented behavior perfectly.The suggested fix of explicitly forwarding
representation_methodas a fallback would be redundant code that duplicates TSAM's own inheritance logic.
1. expand_solution() → expand() in:
- CHANGELOG.md (2 occurrences)
- tests/test_cluster_reduce_expand.py (3 occurrences)
- tests/test_clustering_io.py (2 occurrences)
- docs/notebooks/08c3-clustering-comparison.ipynb (1 occurrence in best practices)
2. Notebook description fix: Changed "full year of hourly data (8760 timesteps)" to "quarter of hourly data (2190 timesteps)" to match duration='quarter'
3. Segmentation naming fix:
- Comment: "8 clusters" → "16 clusters" (to match n_clusters=16)
- Timesteps calculation: 8 * 6 → 16 * 6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/notebooks/08c3-clustering-comparison.ipynb (1)
361-379: Consider making recommendations data-driven.The hardcoded values in the recommendations ("~7% cost gap" and "5x speedup") may not match actual notebook output if the system or solver changes. Consider either:
- Adding a caveat that these are approximate/illustrative values, or
- Referencing the summary table for actual values
This is a minor documentation consistency concern; the practical guidance is sound.
CHANGELOG.md (1)
92-113: Clarify thesegmentation=Trueparameter in the example.Line 106 shows
segmentation=True, but based on the test code and API summary, segmentation is enabled by providing then_segmentsparameter (line 107), not a separatesegmentationboolean. Ifsegmentation=Trueis not a valid parameter, consider removing it to avoid confusion, or add a comment clarifying it's illustrative.Otherwise, the storage example correctly demonstrates segmentation with
intercluster_cyclicstorage mode and uses the correct.expand()method (line 112).📝 Proposed clarification
# Cluster with segmentation - extreme reduction fs_segmented = flow_system.transform.cluster( n_clusters=12, cluster_duration='1D', - segmentation=True, - n_segments=4, # 12 clusters × 4 segments = 48 timesteps (vs 12 × 24 = 288) + n_segments=4, # Enables segmentation: 12 clusters × 4 segments = 48 timesteps (vs 12 × 24 = 288) )
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGELOG.mddocs/notebooks/08c3-clustering-comparison.ipynbmkdocs.ymltests/test_cluster_reduce_expand.pytests/test_clustering_io.py
🚧 Files skipped from review as they are similar to previous changes (2)
- mkdocs.yml
- tests/test_clustering_io.py
🧰 Additional context used
🧬 Code graph analysis (1)
tests/test_cluster_reduce_expand.py (2)
flixopt/flow_system.py (5)
transform(1673-1691)is_segmented(2119-2125)sel(2385-2411)solution(1517-1534)solution(1537-1540)flixopt/transform_accessor.py (3)
cluster(578-1229)sel(81-121)expand(1534-1686)
🪛 markdownlint-cli2 (0.18.1)
CHANGELOG.md
64-64: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
94-94: Code block style
Expected: indented; Actual: fenced
(MD046, code-block-style)
🔇 Additional comments (9)
docs/notebooks/08c3-clustering-comparison.ipynb (7)
1-35: LGTM!The introduction clearly explains the notebook's purpose, and the imports are appropriate. The note about the
tsamrequirement is helpful for users.
40-60: LGTM! Previous inconsistency has been addressed.The markdown text now correctly states "a quarter of hourly data (2190 timesteps)" which matches the
duration='quarter'parameter in the code.
98-108: LGTM! Previous naming/number mismatch has been addressed.The segmentation configuration is now internally consistent: 16 clusters × 6 segments with name
'16x6 segmented'and timesteps calculated as16 * 6.
123-153: LGTM!The summary table construction is well-structured. The
max(abs(baseline_cost), 1)safeguard against division by zero is good defensive coding. The formatting specifications are appropriate for each metric.
169-182: LGTM!Expanding clustered solutions back to full resolution before comparison is the correct approach for fair evaluation. The implementation is clean and straightforward.
196-262: LGTM!The comparison visualizations comprehensively cover component sizes, heat production, and storage operation. Using expanded solutions ensures fair time-series comparison across configurations.
282-299: LGTM!The metrics collection correctly focuses on non-segmented clustered systems for RMSE/MAE comparison. Segmented systems have different time structures that would make direct metric comparison less meaningful.
tests/test_cluster_reduce_expand.py (1)
838-1032: Excellent comprehensive test coverage for segmentation!The new
TestSegmentationandTestSegmentationWithStorageclasses provide thorough validation of the inner-period segmentation feature. Tests correctly verify:
- RangeIndex timesteps and ClusterStructure metadata (
is_segmented,n_segments,segment_timestep_counts)- Variable
timestep_durationwith proper summation per cluster- Optimization and expansion workflows with segmented systems
- Statistics consistency between clustered and expanded solutions
- Storage integration with segmentation (
intercluster_cyclicmode)All calls to
.expand()are correct (lines 957, 982, 1022), and the test structure follows established patterns in the file.CHANGELOG.md (1)
54-75: Clear and accurate documentation of segmentation feature.The introduction and basic example effectively demonstrate the problem-size reduction benefits:
- 8760h → 8 clusters × 24h = 192 timesteps (without segmentation)
- 8760h → 8 clusters × 6 segments = 48 timesteps (with segmentation)
The code example correctly uses
.expand()(line 74) and aligns with the test implementations.
…mesteps_per_cluster to use the correct time dimension size 2. Updated the reshape and coordinate creation to use time_dim_size instead of timesteps_per_cluster
Description
Brief description of the changes in this PR.
Type of Change
Related Issues
Closes #(issue number)
Testing
Checklist
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.