Conversation

@atlv24
Contributor

@atlv24 atlv24 commented Aug 19, 2025

Objective

  • Optimize clean compile time by 10%

Solution

  • Factor the non-generic code out of a frequently called generic function
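
A minimal sketch of the general technique, not the code from this PR: a generic function keeps only its type-dependent step and hands everything else to a non-generic helper, so the helper's body is compiled once instead of once per instantiation. The names `record` and `record_inner` are hypothetical and chosen only for illustration.

```rust
use std::fmt::Display;

/// Generic entry point: one copy of this thin shim is instantiated per `T`.
/// (Hypothetical example, not a Bevy API.)
pub fn record<T: Display>(log: &mut Vec<String>, value: T) {
    // Only the step that depends on `T` stays in the generic function.
    let rendered = value.to_string();
    record_inner(log, rendered);
}

/// Non-generic body: compiled once, no matter how many `T`s call `record`.
fn record_inner(log: &mut Vec<String>, rendered: String) {
    let index = log.len();
    log.push(format!("#{index}: {rendered}"));
}
```

Calling `record(&mut log, 1)` and `record(&mut log, "a")` instantiates the shim twice, but both calls share the single compiled copy of `record_inner`.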

Main: (92s total)
image

This PR: (82s total)
image

@atlv24 added the A-ECS, S-Needs-Benchmarking, D-Unsafe, and S-Needs-Review labels Aug 19, 2025
@alice-i-cecile added the C-Performance and S-Waiting-on-Author labels and removed the S-Needs-Benchmarking and S-Needs-Review labels Aug 19, 2025
@james7132 self-requested a review August 19, 2025 02:04
@james7132 added the S-Needs-Benchmarking label Aug 19, 2025
Member

@james7132 james7132 left a comment

I'd like to see some comparative benchmarks to at least show that this isn't regressing runtime performance.

  let new_archetype = &*new_archetype;
  // SAFETY: We have no outstanding mutable references to world as they were dropped
- let mut deferred_world = unsafe { self.world.into_deferred() };
+ let deferred_world = unsafe { self.world.into_deferred() };
Contributor

sparse_sets and table aren't dropped yet here. You probably need to put the above code into a block.

Contributor

This might be ok? The borrow checker might be smart enough to know that they aren't used after.

Contributor Author

Yeah, it's fine, but I'm going to make it explicit anyway.
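
A small sketch of the point under discussion, using safe references rather than the PR's actual code: with non-lexical lifetimes, a borrow ends at its last use, so both functions below compile; the explicit block only makes the end of the borrow visible to the reader. The function names are hypothetical. In `unsafe` code the borrow checker does not verify the SAFETY comment, which is one reason an explicit scope can still be worth having.

```rust
/// Explicit block: the mutable borrow `first` visibly ends at the closing brace.
pub fn bump_with_block(values: &mut Vec<u32>) -> usize {
    {
        let first = &mut values[0];
        *first += 1;
    }
    // No borrow of an element is alive here, so `values` can be mutated again.
    values.push(0);
    values.len()
}

/// No block: `first` is never used after the increment, so the borrow
/// checker treats the borrow as finished before `push` is called.
pub fn bump_without_block(values: &mut Vec<u32>) -> usize {
    let first = &mut values[0];
    *first += 1;
    values.push(0);
    values.len()
}
```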

@atlv24
Contributor Author

atlv24 commented Aug 21, 2025

The compile time win seems to vary per platform; it's not as much of a win on Windows as it is on Mac. Additional testing would be appreciated.

@hymm
Contributor

hymm commented Aug 24, 2025

On Windows the difference seems to be pretty much in the noise with the default number of jobs (this PR: 2m01s, 2m04s, 2m03s; main: 2m05s, 2m01s, 2m04s). With 1 job the gain is about 14s (main: 23m 20s; PR: 23m 06s). Probably bottlenecked by some other crate.

I seem to be getting runtime performance regressions:

group                                                  main2                                  pr2
-----                                                  -----                                  ---
ecs::bundles::insert_many::insert_many/all             1.00      2.0±0.05ms        ? ?/sec    1.13      2.3±0.06ms        ? ?/sec
ecs::bundles::insert_many::insert_many/only_last       1.00    296.9±7.62µs        ? ?/sec    1.07   316.4±18.17µs        ? ?/sec
insert_commands/insert                                 1.00   513.6±21.47µs        ? ?/sec    1.10   563.2±27.56µs        ? ?/sec
insert_commands/insert_batch                           1.00   187.3±13.32µs        ? ?/sec    1.14   214.2±15.79µs        ? ?/sec
insert_simple/base                                     1.00    229.5±9.53µs        ? ?/sec    1.02   234.3±11.00µs        ? ?/sec
insert_simple/unbatched                                1.04   621.3±30.81µs        ? ?/sec    1.00   598.3±19.64µs        ? ?/sec
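
One rough way to read the rows above, offered as a heuristic and not as what the benchmark tooling itself computes: treat each `mean±deviation` pair as an interval and check whether the two intervals overlap. Overlap suggests the difference may be noise; no overlap suggests a real change. The helper name `intervals_overlap` is hypothetical.

```rust
/// Returns true when [a_mean - a_dev, a_mean + a_dev] and
/// [b_mean - b_dev, b_mean + b_dev] share at least one point.
/// Both measurements must use the same unit.
pub fn intervals_overlap(a_mean: f64, a_dev: f64, b_mean: f64, b_dev: f64) -> bool {
    (a_mean - b_mean).abs() <= a_dev + b_dev
}
```

For `insert_many/all` (2.0±0.05ms against 2.3±0.06ms) the intervals do not overlap, which is consistent with a regression. For `insert_simple/base` (229.5±9.53µs against 234.3±11.00µs) they do, so that row is plausibly noise.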

@atlv24
Contributor Author

atlv24 commented Aug 24, 2025

Wonder if a cfg-attr profile=release inline(always) would fix it
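
A sketch of what such an attribute could look like. It assumes no built-in `profile` cfg is available, so `debug_assertions` is used as a stand-in: it is enabled in the default dev profile and disabled in the default release profile. The function names are hypothetical, and this is not necessarily what the PR ended up doing.

```rust
/// Non-generic helper split out of the generic caller.
/// With debug assertions off (the default for release builds) it carries
/// `#[inline(always)]`; in debug builds it is left as an ordinary function,
/// so the compile-time benefit of the split is kept there.
#[cfg_attr(not(debug_assertions), inline(always))]
fn insert_inner(slots: &mut Vec<u64>, value: u64) -> usize {
    slots.push(value);
    slots.len() - 1
}

/// Generic shim: only the conversion depends on `T`.
pub fn insert<T: Into<u64>>(slots: &mut Vec<u64>, value: T) -> usize {
    insert_inner(slots, value.into())
}
```

The trade-off is that release builds give back some of the compile-time saving in exchange for removing the extra call on the hot path.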

@hymm
Contributor

hymm commented Aug 24, 2025

group                                                  main                                   main2                                  main3                                  pr-inline                              pr-inline2                             pr-inline3
-----                                                  ----                                   -----                                  -----                                  ---------                              ----------                             ----------
ecs::bundles::insert_many::insert_many/all             1.04      2.0±0.05ms        ? ?/sec    1.05      2.1±0.05ms        ? ?/sec    1.04      2.1±0.07ms        ? ?/sec    1.77      3.5±0.55ms        ? ?/sec    1.00  1968.9±59.53µs        ? ?/sec    1.10      2.2±0.14ms        ? ?/sec
ecs::bundles::insert_many::insert_many/only_last       1.05   303.8±11.14µs        ? ?/sec    1.03    298.2±9.02µs        ? ?/sec    1.03   299.8±13.24µs        ? ?/sec    1.00    289.7±9.56µs        ? ?/sec    1.02   296.3±23.65µs        ? ?/sec    1.01   293.7±12.93µs        ? ?/sec
insert_commands/insert                                 1.05   513.7±26.48µs        ? ?/sec    1.05   514.1±27.42µs        ? ?/sec    1.06   518.1±28.62µs        ? ?/sec    1.00   487.9±28.14µs        ? ?/sec    1.01   494.5±29.56µs        ? ?/sec    1.01   491.0±35.96µs        ? ?/sec
insert_commands/insert_batch                           1.16   197.1±16.63µs        ? ?/sec    1.09   185.5±14.72µs        ? ?/sec    1.09   185.3±12.50µs        ? ?/sec    1.01   172.5±14.71µs        ? ?/sec    1.01   171.7±14.09µs        ? ?/sec    1.00    170.1±9.66µs        ? ?/sec
insert_simple/base                                     1.00   218.2±23.59µs        ? ?/sec    1.03   225.4±22.47µs        ? ?/sec    1.06    231.6±7.93µs        ? ?/sec    1.05    228.3±7.68µs        ? ?/sec    1.08   234.8±11.15µs        ? ?/sec    1.05    228.5±8.57µs        ? ?/sec
insert_simple/unbatched                                1.00   579.3±17.74µs        ? ?/sec    1.03   597.3±23.47µs        ? ?/sec    1.03   594.0±22.73µs        ? ?/sec    1.03   593.8±20.71µs        ? ?/sec    1.02   591.7±17.35µs        ? ?/sec    1.04   601.6±23.12µs        ? ?/sec

That seems to have helped. A few of the benches are actually faster now, and the others seem to be in the noise.

@hymm
Contributor

hymm commented Aug 24, 2025

Also ran cargo bench -p bench ecs:

group                                                                                                     main3                                    pr-inline
-----                                                                                                     -----                                    ---------
all_added_detection/50000_entities_ecs::change_detection::Sparse                                          1.00     48.4±1.26µs        ? ?/sec      1.01     48.8±1.38µs        ? ?/sec
all_added_detection/50000_entities_ecs::change_detection::Table                                           1.01     38.9±1.37µs        ? ?/sec      1.00     38.4±1.13µs        ? ?/sec
all_added_detection/5000_entities_ecs::change_detection::Sparse                                           1.01      4.9±0.10µs        ? ?/sec      1.00      4.8±0.15µs        ? ?/sec
all_added_detection/5000_entities_ecs::change_detection::Table                                            1.02      3.7±0.46µs        ? ?/sec      1.00      3.6±0.43µs        ? ?/sec
all_changed_detection/50000_entities_ecs::change_detection::Sparse                                        1.06     51.0±6.46µs        ? ?/sec      1.00     48.2±0.96µs        ? ?/sec
all_changed_detection/50000_entities_ecs::change_detection::Table                                         1.03     39.2±1.42µs        ? ?/sec      1.00     38.2±0.83µs        ? ?/sec
all_changed_detection/5000_entities_ecs::change_detection::Sparse                                         1.00      4.8±0.12µs        ? ?/sec      1.00      4.8±0.18µs        ? ?/sec
all_changed_detection/5000_entities_ecs::change_detection::Table                                          1.00      3.6±0.45µs        ? ?/sec      1.00      3.6±0.45µs        ? ?/sec
ecs::bundles::insert_many::insert_many/all                                                                1.02      2.1±0.04ms        ? ?/sec      1.00      2.0±0.06ms        ? ?/sec
ecs::bundles::insert_many::insert_many/only_last                                                          1.02    299.2±9.77µs        ? ?/sec      1.00   292.4±18.61µs        ? ?/sec
ecs::bundles::spawn_many::spawn_many/static                                                               1.00    138.6±5.58µs        ? ?/sec      1.00    138.9±4.42µs        ? ?/sec
ecs::bundles::spawn_many_zst::spawn_many_zst/static                                                       1.05    96.8±11.54µs        ? ?/sec      1.00     91.8±3.74µs        ? ?/sec
ecs::bundles::spawn_one_zst::spawn_one_zst/static                                                         1.02   258.6±15.83µs        ? ?/sec      1.00   253.3±14.36µs        ? ?/sec
ecs::entity_cloning::filter/opt_in_all                                                                    1.03    168.6±5.88ns  5.7 MElem/sec      1.00    163.9±3.71ns  5.8 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_all                                                           1.05    171.2±8.91ns  5.6 MElem/sec      1.00    163.8±6.66ns  5.8 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_all_without_required                                          1.00   155.8±11.90ns  6.1 MElem/sec      1.42    220.5±6.37ns  4.3 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_none                                                          1.04    170.3±5.22ns  5.6 MElem/sec      1.00    164.0±4.94ns  5.8 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_none_without_required                                         1.12   170.4±25.13ns  5.6 MElem/sec      1.00    151.6±5.87ns  6.3 MElem/sec
ecs::entity_cloning::filter/opt_in_all_without_required                                                   1.04    155.2±3.51ns  6.1 MElem/sec      1.00    149.5±3.46ns  6.4 MElem/sec
ecs::entity_cloning::filter/opt_in_none                                                                   1.15   161.3±35.81ns  5.9 MElem/sec      1.00    140.2±3.63ns  6.8 MElem/sec
ecs::entity_cloning::filter/opt_out_all                                                                   1.03    156.9±4.36ns  6.1 MElem/sec      1.00    152.8±4.79ns  6.2 MElem/sec
ecs::entity_cloning::filter/opt_out_none                                                                  1.03    157.3±6.20ns  6.1 MElem/sec      1.00    153.3±5.62ns  6.2 MElem/sec
ecs::entity_cloning::filter/opt_out_none_keep_all                                                         1.03    151.9±4.79ns  6.3 MElem/sec      1.00    147.6±5.46ns  6.5 MElem/sec
ecs::entity_cloning::filter/opt_out_none_keep_none                                                        1.03    160.5±5.68ns  5.9 MElem/sec      1.00    156.4±6.98ns  6.1 MElem/sec
ecs::entity_cloning::hierarchy_many/clone                                                                 1.10   247.5±44.18µs 1436.3 KElem/sec    1.00   225.8±21.98µs 1574.1 KElem/sec
ecs::entity_cloning::hierarchy_many/reflect                                                               1.00   615.5±56.64µs 577.5 KElem/sec     1.03   634.8±24.95µs 560.0 KElem/sec
ecs::entity_cloning::hierarchy_tall/clone                                                                 1.02     14.1±1.03µs  3.4 MElem/sec      1.00     13.9±0.78µs  3.5 MElem/sec
ecs::entity_cloning::hierarchy_tall/reflect                                                               1.05     19.4±1.36µs  2.5 MElem/sec      1.00     18.5±1.09µs  2.6 MElem/sec
ecs::entity_cloning::hierarchy_wide/clone                                                                 1.01     12.1±0.93µs  4.0 MElem/sec      1.00     12.0±0.61µs  4.0 MElem/sec
ecs::entity_cloning::hierarchy_wide/reflect                                                               1.03     16.9±0.78µs  2.9 MElem/sec      1.00     16.4±0.69µs  3.0 MElem/sec
ecs::entity_cloning::single/clone                                                                         1.24  752.1±169.07ns 1298.4 KElem/sec    1.00   608.1±50.83ns 1605.9 KElem/sec
ecs::entity_cloning::single/reflect                                                                       1.04  1724.1±129.73ns 566.4 KElem/sec    1.00  1661.1±79.94ns 587.9 KElem/sec
few_changed_detection/50000_entities_ecs::change_detection::Sparse                                        1.00     43.7±2.89µs        ? ?/sec      1.06     46.4±1.79µs        ? ?/sec
few_changed_detection/50000_entities_ecs::change_detection::Table                                         1.06     37.9±2.34µs        ? ?/sec      1.00     35.8±2.90µs        ? ?/sec
few_changed_detection/5000_entities_ecs::change_detection::Sparse                                         1.00      2.9±0.21µs        ? ?/sec      1.00      2.9±0.13µs        ? ?/sec
few_changed_detection/5000_entities_ecs::change_detection::Table                                          1.02      2.4±0.11µs        ? ?/sec      1.00      2.3±0.18µs        ? ?/sec
insert_commands/insert                                                                                    1.06   518.1±28.62µs        ? ?/sec      1.00   487.9±28.14µs        ? ?/sec
insert_commands/insert_batch                                                                              1.07   185.3±12.50µs        ? ?/sec      1.00   172.5±14.71µs        ? ?/sec
insert_simple/base                                                                                        1.01    231.6±7.93µs        ? ?/sec      1.00    228.3±7.68µs        ? ?/sec
insert_simple/unbatched                                                                                   1.00   594.0±22.73µs        ? ?/sec      1.00   593.8±20.71µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Sparse    1.01  1125.8±88.27µs        ? ?/sec      1.00  1120.0±63.07µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Table     1.00   335.5±42.81µs        ? ?/sec      1.01   337.3±38.14µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Sparse     1.00    66.9±14.02µs        ? ?/sec      1.23    82.3±31.49µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Table      1.05     32.8±6.19µs        ? ?/sec      1.00     31.4±3.54µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Sparse      1.00      7.8±0.28µs        ? ?/sec      1.00      7.8±0.43µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Table       1.02      4.6±0.23µs        ? ?/sec      1.00      4.5±0.13µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Sparse       1.00   893.5±40.33ns        ? ?/sec      1.10  985.1±180.27ns        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Table        1.00  844.2±240.68ns        ? ?/sec      1.03  868.5±221.30ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Sparse     1.01   145.1±15.37µs        ? ?/sec      1.00   144.3±14.58µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Table      1.16    65.2±12.15µs        ? ?/sec      1.00     56.1±3.07µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_1000_entities_ecs::change_detection::Sparse      1.03     11.2±1.72µs        ? ?/sec      1.00     10.8±1.12µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_1000_entities_ecs::change_detection::Table       1.00      5.7±0.20µs        ? ?/sec      1.00      5.7±0.21µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_100_entities_ecs::change_detection::Sparse       1.00  1245.8±25.99ns        ? ?/sec      1.15  1431.7±300.99ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_100_entities_ecs::change_detection::Table        1.00   799.1±18.45ns        ? ?/sec      1.04   829.4±70.39ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Sparse        1.00    177.6±5.74ns        ? ?/sec      1.02    182.0±9.31ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Table         1.00    144.2±4.96ns        ? ?/sec      1.16   167.0±30.02ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10000_entities_ecs::change_detection::Sparse      1.00     25.5±0.79µs        ? ?/sec      1.03     26.2±3.49µs        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10000_entities_ecs::change_detection::Table       1.00     13.0±0.43µs        ? ?/sec      1.03     13.4±0.99µs        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_1000_entities_ecs::change_detection::Sparse       1.00      2.6±0.08µs        ? ?/sec      1.05      2.8±0.37µs        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_1000_entities_ecs::change_detection::Table        1.00  1409.9±39.98ns        ? ?/sec      1.01  1430.3±132.65ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_100_entities_ecs::change_detection::Sparse        1.01   320.8±12.44ns        ? ?/sec      1.00    316.5±9.90ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_100_entities_ecs::change_detection::Table         1.00   216.0±15.83ns        ? ?/sec      1.26   272.0±57.25ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Sparse         1.02     56.2±3.51ns        ? ?/sec      1.00     55.1±2.30ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Table          1.00     44.4±3.31ns        ? ?/sec      1.10     49.0±2.84ns        ? ?/sec
none_changed_detection/50000_entities_ecs::change_detection::Sparse                                       1.01     25.0±0.75µs        ? ?/sec      1.00     24.9±0.85µs        ? ?/sec
none_changed_detection/50000_entities_ecs::change_detection::Table                                        1.00     13.3±0.78µs        ? ?/sec      1.00     13.3±1.17µs        ? ?/sec
none_changed_detection/5000_entities_ecs::change_detection::Sparse                                        1.00      2.6±0.05µs        ? ?/sec      1.00      2.5±0.06µs        ? ?/sec
none_changed_detection/5000_entities_ecs::change_detection::Table                                         1.01  1560.4±307.00ns        ? ?/sec     1.00  1542.2±285.64ns        ? ?/sec

@james7132
Member

> Wonder if a cfg-attr profile=release inline(always) would fix it

Are the compilation speed gains still present with that on? inline(always) is just a stronger hint; there's still a point where LLVM will override and force it to not inline.

@SkiFire13
Contributor

> inline(always) is just a stronger hint, there's still a point where LLVM will override and force it to not inline.

inline(always) will always inline except in cases where it's just not possible (e.g. recursive functions, where inlining it will just create another instance of the function call and so on).

Moreover, nowadays inlining is also performed at the MIR level, not just at the LLVM level.

@atlv24 added the S-Needs-Review label and removed the S-Waiting-on-Author label Sep 10, 2025
@cart added this to the 0.18 milestone Sep 17, 2025

Labels

  • A-ECS: Entities, components, systems, and events
  • C-Performance: A change motivated by improving speed, memory usage or compile times
  • D-Unsafe: Touches with unsafe code in some way
  • S-Needs-Benchmarking: This set of changes needs performance benchmarking to double-check that they help
  • S-Needs-Review: Needs reviewer attention (from anyone!) to move forward

7 participants