Optimize BundleInserter::insert compile time #20647

atlv24 · 2025-08-19T00:57:46Z

Objective

Reduce clean compile time by about 10%.

Solution

Factor the non-generic code out of a generic function that is called a lot, so that code is compiled once instead of once per instantiation.

Main: (92s total)

This PR: (82s total)
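For readers unfamiliar with the technique, here is a minimal sketch of factoring non-generic code out of a generic function. It is not the code from this PR: `insert_value` and `record_insert` are hypothetical names chosen for illustration. The idea is that the generic wrapper is compiled once per `T`, while the helper is compiled only once.

```rust
use std::fmt::Debug;

/// Generic entry point (hypothetical): one copy of this body is compiled
/// per concrete `T`, so it is kept as thin as possible.
pub fn insert_value<T: Debug>(log: &mut Vec<String>, value: T) {
    // Only the part that truly depends on `T` stays in the generic body.
    let rendered = format!("{value:?}");
    // Everything else is delegated to a non-generic helper.
    record_insert(log, rendered);
}

/// Non-generic helper (hypothetical): compiled once, no matter how many
/// different `T`s call `insert_value`.
fn record_insert(log: &mut Vec<String>, rendered: String) {
    let index = log.len();
    log.push(format!("#{index}: {rendered}"));
}
```

Calling `insert_value` with a `u8` and then a `&str` produces two instantiations of the wrapper but still only one copy of `record_insert`.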

crates/bevy_ecs/src/bundle/insert.rs

james7132

I'd like to see some comparative benchmarks to at least show that this isn't regressing runtime performance.

crates/bevy_ecs/src/bundle/insert.rs

hymm · 2025-08-19T04:41:02Z

crates/bevy_ecs/src/bundle/insert.rs

-        let new_archetype = &*new_archetype;
        // SAFETY: We have no outstanding mutable references to world as they were dropped
-        let mut deferred_world = unsafe { self.world.into_deferred() };
+        let deferred_world = unsafe { self.world.into_deferred() };


sparse_sets and table aren't dropped yet here. You probably need to put the above code into a block

This might be ok? The borrow checker might be smart enough to know that they aren't used after.

Yeah, it's fine, but I'm going to make it explicit anyway.
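A minimal safe-Rust sketch of what "put the above code into a block" means. The `World` struct and the `insert` and `finish` functions here are hypothetical stand-ins, not the types in `insert.rs`. In safe code like this the borrow checker would accept the function without the inner block, because the borrows are not used afterwards; the block only makes the end of their lifetime explicit.

```rust
/// Hypothetical stand-in for the real world type.
struct World {
    table: Vec<u32>,
    sparse_sets: Vec<u32>,
}

fn insert(world: &mut World, value: u32) -> usize {
    {
        // These borrows of parts of `world` live only inside this block.
        let table = &mut world.table;
        let sparse_sets = &mut world.sparse_sets;
        table.push(value);
        sparse_sets.push(value * 2);
    } // `table` and `sparse_sets` go out of scope here.

    // A fresh exclusive use of the whole `world` is now visibly fine.
    finish(world)
}

fn finish(world: &mut World) -> usize {
    world.table.len() + world.sparse_sets.len()
}
```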

crates/bevy_ecs/src/bundle/insert.rs

atlv24 · 2025-08-21T15:48:58Z

The compile time win seems to vary per platform: it's not as much of a win on Windows as it is on Mac, it seems. Additional testing would be appreciated.

hymm · 2025-08-24T05:24:20Z

Running on Windows seems to be pretty much in the noise with the default number of jobs (this PR: 2m01s, 2m04s, 2m03s vs. main: 2m05s, 2m01s, 2m04s). With 1 job the gain is about 14s: main 23m20s, PR 23m06s. Probably bottlenecked by some other crate.

I seem to be getting regressions in runtime perf:

group                                                  main2                                  pr2
-----                                                  -----                                  ---
ecs::bundles::insert_many::insert_many/all             1.00      2.0±0.05ms        ? ?/sec    1.13      2.3±0.06ms        ? ?/sec
ecs::bundles::insert_many::insert_many/only_last       1.00    296.9±7.62µs        ? ?/sec    1.07   316.4±18.17µs        ? ?/sec
insert_commands/insert                                 1.00   513.6±21.47µs        ? ?/sec    1.10   563.2±27.56µs        ? ?/sec
insert_commands/insert_batch                           1.00   187.3±13.32µs        ? ?/sec    1.14   214.2±15.79µs        ? ?/sec
insert_simple/base                                     1.00    229.5±9.53µs        ? ?/sec    1.02   234.3±11.00µs        ? ?/sec
insert_simple/unbatched                                1.04   621.3±30.81µs        ? ?/sec    1.00   598.3±19.64µs        ? ?/sec
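A note on reading the table above, assuming it is critcmp-style output: the leading number in each column appears to be that column's time divided by the fastest time in the row, so the fastest column reads 1.00. The sketch below reproduces that ratio; `relative_to_fastest` is a hypothetical helper, not part of any benchmarking tool.

```rust
/// Divide every time by the smallest one in the row, so the fastest
/// column comes out as 1.0 (hypothetical helper for illustration).
fn relative_to_fastest(times: &[f64]) -> Vec<f64> {
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}
```

For the `insert_many/only_last` row, the inputs 296.9 and 316.4 (µs) give ratios that format to 1.00 and 1.07, matching the table.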

atlv24 · 2025-08-24T16:08:12Z

Wonder if a cfg-attr profile=release inline(always) would fix it
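A sketch of what a conditional `inline(always)` could look like. The exact attribute used in the PR is not shown in this thread, so this is an assumption: `cfg(debug_assertions)` is used here as a common proxy for "not a release build", and `bump_counter` and `insert_many` are hypothetical functions.

```rust
/// Hypothetical non-generic helper split out of a hot generic function.
/// It is force-inlined only when debug assertions are off (typically
/// release builds), so debug builds keep the single shared copy.
#[cfg_attr(not(debug_assertions), inline(always))]
fn bump_counter(counter: &mut u64, by: u64) {
    *counter = counter.wrapping_add(by);
}

/// Hypothetical generic caller that stays thin and delegates to the helper.
pub fn insert_many<T>(items: &[T], counter: &mut u64) {
    for _ in items {
        bump_counter(counter, 1);
    }
}
```

The design trade-off is that release builds pay the extra codegen cost of inlining to recover runtime performance, while debug builds keep the compile-time benefit.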

hymm · 2025-08-24T20:38:51Z

group                                                  main                                   main2                                  main3                                  pr-inline                              pr-inline2                             pr-inline3
-----                                                  ----                                   -----                                  -----                                  ---------                              ----------                             ----------
ecs::bundles::insert_many::insert_many/all             1.04      2.0±0.05ms        ? ?/sec    1.05      2.1±0.05ms        ? ?/sec    1.04      2.1±0.07ms        ? ?/sec    1.77      3.5±0.55ms        ? ?/sec    1.00  1968.9±59.53µs        ? ?/sec    1.10      2.2±0.14ms        ? ?/sec
ecs::bundles::insert_many::insert_many/only_last       1.05   303.8±11.14µs        ? ?/sec    1.03    298.2±9.02µs        ? ?/sec    1.03   299.8±13.24µs        ? ?/sec    1.00    289.7±9.56µs        ? ?/sec    1.02   296.3±23.65µs        ? ?/sec    1.01   293.7±12.93µs        ? ?/sec
insert_commands/insert                                 1.05   513.7±26.48µs        ? ?/sec    1.05   514.1±27.42µs        ? ?/sec    1.06   518.1±28.62µs        ? ?/sec    1.00   487.9±28.14µs        ? ?/sec    1.01   494.5±29.56µs        ? ?/sec    1.01   491.0±35.96µs        ? ?/sec
insert_commands/insert_batch                           1.16   197.1±16.63µs        ? ?/sec    1.09   185.5±14.72µs        ? ?/sec    1.09   185.3±12.50µs        ? ?/sec    1.01   172.5±14.71µs        ? ?/sec    1.01   171.7±14.09µs        ? ?/sec    1.00    170.1±9.66µs        ? ?/sec
insert_simple/base                                     1.00   218.2±23.59µs        ? ?/sec    1.03   225.4±22.47µs        ? ?/sec    1.06    231.6±7.93µs        ? ?/sec    1.05    228.3±7.68µs        ? ?/sec    1.08   234.8±11.15µs        ? ?/sec    1.05    228.5±8.57µs        ? ?/sec
insert_simple/unbatched                                1.00   579.3±17.74µs        ? ?/sec    1.03   597.3±23.47µs        ? ?/sec    1.03   594.0±22.73µs        ? ?/sec    1.03   593.8±20.71µs        ? ?/sec    1.02   591.7±17.35µs        ? ?/sec    1.04   601.6±23.12µs        ? ?/sec

Seems to have helped. A few of the benches are actually faster now and the others seem to be in the noise.

hymm · 2025-08-24T21:11:19Z

also ran cargo bench -p bench ecs

group                                                                                                     main3                                    pr-inline
-----                                                                                                     -----                                    ---------
all_added_detection/50000_entities_ecs::change_detection::Sparse                                          1.00     48.4±1.26µs        ? ?/sec      1.01     48.8±1.38µs        ? ?/sec
all_added_detection/50000_entities_ecs::change_detection::Table                                           1.01     38.9±1.37µs        ? ?/sec      1.00     38.4±1.13µs        ? ?/sec
all_added_detection/5000_entities_ecs::change_detection::Sparse                                           1.01      4.9±0.10µs        ? ?/sec      1.00      4.8±0.15µs        ? ?/sec
all_added_detection/5000_entities_ecs::change_detection::Table                                            1.02      3.7±0.46µs        ? ?/sec      1.00      3.6±0.43µs        ? ?/sec
all_changed_detection/50000_entities_ecs::change_detection::Sparse                                        1.06     51.0±6.46µs        ? ?/sec      1.00     48.2±0.96µs        ? ?/sec
all_changed_detection/50000_entities_ecs::change_detection::Table                                         1.03     39.2±1.42µs        ? ?/sec      1.00     38.2±0.83µs        ? ?/sec
all_changed_detection/5000_entities_ecs::change_detection::Sparse                                         1.00      4.8±0.12µs        ? ?/sec      1.00      4.8±0.18µs        ? ?/sec
all_changed_detection/5000_entities_ecs::change_detection::Table                                          1.00      3.6±0.45µs        ? ?/sec      1.00      3.6±0.45µs        ? ?/sec
ecs::bundles::insert_many::insert_many/all                                                                1.02      2.1±0.04ms        ? ?/sec      1.00      2.0±0.06ms        ? ?/sec
ecs::bundles::insert_many::insert_many/only_last                                                          1.02    299.2±9.77µs        ? ?/sec      1.00   292.4±18.61µs        ? ?/sec
ecs::bundles::spawn_many::spawn_many/static                                                               1.00    138.6±5.58µs        ? ?/sec      1.00    138.9±4.42µs        ? ?/sec
ecs::bundles::spawn_many_zst::spawn_many_zst/static                                                       1.05    96.8±11.54µs        ? ?/sec      1.00     91.8±3.74µs        ? ?/sec
ecs::bundles::spawn_one_zst::spawn_one_zst/static                                                         1.02   258.6±15.83µs        ? ?/sec      1.00   253.3±14.36µs        ? ?/sec
ecs::entity_cloning::filter/opt_in_all                                                                    1.03    168.6±5.88ns  5.7 MElem/sec      1.00    163.9±3.71ns  5.8 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_all                                                           1.05    171.2±8.91ns  5.6 MElem/sec      1.00    163.8±6.66ns  5.8 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_all_without_required                                          1.00   155.8±11.90ns  6.1 MElem/sec      1.42    220.5±6.37ns  4.3 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_none                                                          1.04    170.3±5.22ns  5.6 MElem/sec      1.00    164.0±4.94ns  5.8 MElem/sec
ecs::entity_cloning::filter/opt_in_all_keep_none_without_required                                         1.12   170.4±25.13ns  5.6 MElem/sec      1.00    151.6±5.87ns  6.3 MElem/sec
ecs::entity_cloning::filter/opt_in_all_without_required                                                   1.04    155.2±3.51ns  6.1 MElem/sec      1.00    149.5±3.46ns  6.4 MElem/sec
ecs::entity_cloning::filter/opt_in_none                                                                   1.15   161.3±35.81ns  5.9 MElem/sec      1.00    140.2±3.63ns  6.8 MElem/sec
ecs::entity_cloning::filter/opt_out_all                                                                   1.03    156.9±4.36ns  6.1 MElem/sec      1.00    152.8±4.79ns  6.2 MElem/sec
ecs::entity_cloning::filter/opt_out_none                                                                  1.03    157.3±6.20ns  6.1 MElem/sec      1.00    153.3±5.62ns  6.2 MElem/sec
ecs::entity_cloning::filter/opt_out_none_keep_all                                                         1.03    151.9±4.79ns  6.3 MElem/sec      1.00    147.6±5.46ns  6.5 MElem/sec
ecs::entity_cloning::filter/opt_out_none_keep_none                                                        1.03    160.5±5.68ns  5.9 MElem/sec      1.00    156.4±6.98ns  6.1 MElem/sec
ecs::entity_cloning::hierarchy_many/clone                                                                 1.10   247.5±44.18µs 1436.3 KElem/sec    1.00   225.8±21.98µs 1574.1 KElem/sec
ecs::entity_cloning::hierarchy_many/reflect                                                               1.00   615.5±56.64µs 577.5 KElem/sec     1.03   634.8±24.95µs 560.0 KElem/sec
ecs::entity_cloning::hierarchy_tall/clone                                                                 1.02     14.1±1.03µs  3.4 MElem/sec      1.00     13.9±0.78µs  3.5 MElem/sec
ecs::entity_cloning::hierarchy_tall/reflect                                                               1.05     19.4±1.36µs  2.5 MElem/sec      1.00     18.5±1.09µs  2.6 MElem/sec
ecs::entity_cloning::hierarchy_wide/clone                                                                 1.01     12.1±0.93µs  4.0 MElem/sec      1.00     12.0±0.61µs  4.0 MElem/sec
ecs::entity_cloning::hierarchy_wide/reflect                                                               1.03     16.9±0.78µs  2.9 MElem/sec      1.00     16.4±0.69µs  3.0 MElem/sec
ecs::entity_cloning::single/clone                                                                         1.24  752.1±169.07ns 1298.4 KElem/sec    1.00   608.1±50.83ns 1605.9 KElem/sec
ecs::entity_cloning::single/reflect                                                                       1.04  1724.1±129.73ns 566.4 KElem/sec    1.00  1661.1±79.94ns 587.9 KElem/sec
few_changed_detection/50000_entities_ecs::change_detection::Sparse                                        1.00     43.7±2.89µs        ? ?/sec      1.06     46.4±1.79µs        ? ?/sec
few_changed_detection/50000_entities_ecs::change_detection::Table                                         1.06     37.9±2.34µs        ? ?/sec      1.00     35.8±2.90µs        ? ?/sec
few_changed_detection/5000_entities_ecs::change_detection::Sparse                                         1.00      2.9±0.21µs        ? ?/sec      1.00      2.9±0.13µs        ? ?/sec
few_changed_detection/5000_entities_ecs::change_detection::Table                                          1.02      2.4±0.11µs        ? ?/sec      1.00      2.3±0.18µs        ? ?/sec
insert_commands/insert                                                                                    1.06   518.1±28.62µs        ? ?/sec      1.00   487.9±28.14µs        ? ?/sec
insert_commands/insert_batch                                                                              1.07   185.3±12.50µs        ? ?/sec      1.00   172.5±14.71µs        ? ?/sec
insert_simple/base                                                                                        1.01    231.6±7.93µs        ? ?/sec      1.00    228.3±7.68µs        ? ?/sec
insert_simple/unbatched                                                                                   1.00   594.0±22.73µs        ? ?/sec      1.00   593.8±20.71µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Sparse    1.01  1125.8±88.27µs        ? ?/sec      1.00  1120.0±63.07µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10000_entities_ecs::change_detection::Table     1.00   335.5±42.81µs        ? ?/sec      1.01   337.3±38.14µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Sparse     1.00    66.9±14.02µs        ? ?/sec      1.23    82.3±31.49µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_1000_entities_ecs::change_detection::Table      1.05     32.8±6.19µs        ? ?/sec      1.00     31.4±3.54µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Sparse      1.00      7.8±0.28µs        ? ?/sec      1.00      7.8±0.43µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_100_entities_ecs::change_detection::Table       1.02      4.6±0.23µs        ? ?/sec      1.00      4.5±0.13µs        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Sparse       1.00   893.5±40.33ns        ? ?/sec      1.10  985.1±180.27ns        ? ?/sec
multiple_archetypes_none_changed_detection/100_archetypes_10_entities_ecs::change_detection::Table        1.00  844.2±240.68ns        ? ?/sec      1.03  868.5±221.30ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Sparse     1.01   145.1±15.37µs        ? ?/sec      1.00   144.3±14.58µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10000_entities_ecs::change_detection::Table      1.16    65.2±12.15µs        ? ?/sec      1.00     56.1±3.07µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_1000_entities_ecs::change_detection::Sparse      1.03     11.2±1.72µs        ? ?/sec      1.00     10.8±1.12µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_1000_entities_ecs::change_detection::Table       1.00      5.7±0.20µs        ? ?/sec      1.00      5.7±0.21µs        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_100_entities_ecs::change_detection::Sparse       1.00  1245.8±25.99ns        ? ?/sec      1.15  1431.7±300.99ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_100_entities_ecs::change_detection::Table        1.00   799.1±18.45ns        ? ?/sec      1.04   829.4±70.39ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Sparse        1.00    177.6±5.74ns        ? ?/sec      1.02    182.0±9.31ns        ? ?/sec
multiple_archetypes_none_changed_detection/20_archetypes_10_entities_ecs::change_detection::Table         1.00    144.2±4.96ns        ? ?/sec      1.16   167.0±30.02ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10000_entities_ecs::change_detection::Sparse      1.00     25.5±0.79µs        ? ?/sec      1.03     26.2±3.49µs        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10000_entities_ecs::change_detection::Table       1.00     13.0±0.43µs        ? ?/sec      1.03     13.4±0.99µs        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_1000_entities_ecs::change_detection::Sparse       1.00      2.6±0.08µs        ? ?/sec      1.05      2.8±0.37µs        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_1000_entities_ecs::change_detection::Table        1.00  1409.9±39.98ns        ? ?/sec      1.01  1430.3±132.65ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_100_entities_ecs::change_detection::Sparse        1.01   320.8±12.44ns        ? ?/sec      1.00    316.5±9.90ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_100_entities_ecs::change_detection::Table         1.00   216.0±15.83ns        ? ?/sec      1.26   272.0±57.25ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Sparse         1.02     56.2±3.51ns        ? ?/sec      1.00     55.1±2.30ns        ? ?/sec
multiple_archetypes_none_changed_detection/5_archetypes_10_entities_ecs::change_detection::Table          1.00     44.4±3.31ns        ? ?/sec      1.10     49.0±2.84ns        ? ?/sec
none_changed_detection/50000_entities_ecs::change_detection::Sparse                                       1.01     25.0±0.75µs        ? ?/sec      1.00     24.9±0.85µs        ? ?/sec
none_changed_detection/50000_entities_ecs::change_detection::Table                                        1.00     13.3±0.78µs        ? ?/sec      1.00     13.3±1.17µs        ? ?/sec
none_changed_detection/5000_entities_ecs::change_detection::Sparse                                        1.00      2.6±0.05µs        ? ?/sec      1.00      2.5±0.06µs        ? ?/sec
none_changed_detection/5000_entities_ecs::change_detection::Table                                         1.01  1560.4±307.00ns        ? ?/sec     1.00  1542.2±285.64ns        ? ?/sec

james7132 · 2025-08-25T00:26:57Z

> Wonder if a cfg-attr profile=release inline(always) would fix it

Are the compilation speed gains still present with that on? inline(always) is just a stronger hint, there's still a point where LLVM will override and force it to not inline.

SkiFire13 · 2025-08-31T07:53:39Z

> inline(always) is just a stronger hint, there's still a point where LLVM will override and force it to not inline.

inline(always) will always inline except in cases where it's just not possible (e.g. recursive functions, where inlining it will just create another instance of the function call and so on).

Moreover nowadays inlining is also performed at the MIR level, not just at the LLVM level.
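To illustrate the recursive case mentioned above, here is a small sketch with hypothetical functions `depth_sum` and `double`. Both carry `#[inline(always)]` and both compile, but only the non-recursive one can be expanded away completely at its call sites; what the compiler actually does with the recursive one is not asserted here.

```rust
/// Recursive: inlining the body just exposes another call to itself,
/// so the call cannot be eliminated entirely.
#[inline(always)]
fn depth_sum(n: u64) -> u64 {
    if n == 0 { 0 } else { n + depth_sum(n - 1) }
}

/// Non-recursive leaf: the kind of function where `inline(always)`
/// can simply paste the body into the caller.
#[inline(always)]
fn double(n: u64) -> u64 {
    n * 2
}
```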

atlv24 added 2 commits August 18, 2025 20:50

pull out duplicate invocation

47bb752

factor out before and after insert

111dd16

atlv24 added the A-ECS (Entities, components, systems, and events), S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help), D-Unsafe (Touches with unsafe code in some way), and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels Aug 19, 2025

Victoronz reviewed Aug 19, 2025


james7132 self-requested a review August 19, 2025 02:04

james7132 added the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help) label Aug 19, 2025

revert world mut change

b45f8d5

james7132 reviewed Aug 19, 2025


Victoronz reviewed Aug 19, 2025


hymm reviewed Aug 19, 2025


atlv24 added 2 commits August 20, 2025 23:03

fix ci

9de3f38

docs

1cd93ef

Merge branch 'main' into ad/optimize_insert

0983957

conditional inline always

eefb69a

atlv24 added the S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) label and removed the S-Waiting-on-Author (The author needs to make changes or address concerns before this can be merged) label Sep 10, 2025

cart added this to the 0.18 milestone Sep 17, 2025
