@venhelhardt
Contributor

Transparent and transmissive phases previously used the instance translation from GlobalTransform as the sort position. This breaks down when mesh geometry is authored in "world-like" coordinates and the instance transform is identity or near-identity (common in building/CAD-style content). In such cases multiple transparent instances end up with the same translation and produce incorrect draw order.

This change introduces sorting based on the world-space center of the mesh bounds instead of the raw translation. The local bounds center is stored per mesh/instance and transformed by the instance’s world transform when building sort keys. This adds a small amount of per-mesh/instance data but produces much more correct transparent and transmissive rendering in real-world scenes.

Objective

Currently, transparent and transmissive render phases in Bevy sort instances using the translation from GlobalTransform. This works only if the mesh origin is a good proxy for the geometry position. In many real-world cases (especially CAD/architecture-like content), the mesh data is authored in "world-like" coordinates and the instance Transform is identity. In such setups, sorting by translation produces incorrect draw order for transparent/transmissive objects.

I propose switching the sorting key from GlobalTransform.translation to the world-space center of the mesh bounds for each instance.

Solution

Instead of using GlobalTransform.translation as the sort position for transparent/transmissive phases, use the world-space center of the mesh bounds:

  1. Store the local-space bounds center for each render mesh (e.g. in something like RenderMeshInstanceShared as center: Vec3 derived from the mesh Aabb).
  2. For each instance, compute the world-space center by applying the instance transform.
  3. Use this world-space center as the position for distance / depth computation in view space when building sort keys for transparent and transmissive phases.
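The three steps above can be sketched in plain Rust. This is only an illustration under stated assumptions: `Vec3`, `Affine`, `aabb_center` and `sort_depth` are hypothetical stand-ins written for this example (not Bevy's math or render types), and the view is assumed to be right-handed, looking down -Z.

```rust
// All types below are minimal, hypothetical stand-ins so the sketch compiles
// on its own; they are not Bevy's actual math or render types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32,
}

impl Vec3 {
    const fn new(x: f32, y: f32, z: f32) -> Self {
        Self { x, y, z }
    }

    fn dot(self, other: Vec3) -> f32 {
        self.x * other.x + self.y * other.y + self.z * other.z
    }
}

/// Affine transform: a 3x3 matrix stored as rows, plus a translation.
#[derive(Clone, Copy, Debug)]
struct Affine {
    rows: [Vec3; 3],
    translation: Vec3,
}

impl Affine {
    const IDENTITY: Affine = Affine {
        rows: [
            Vec3::new(1.0, 0.0, 0.0),
            Vec3::new(0.0, 1.0, 0.0),
            Vec3::new(0.0, 0.0, 1.0),
        ],
        translation: Vec3::new(0.0, 0.0, 0.0),
    };

    fn transform_point(&self, p: Vec3) -> Vec3 {
        Vec3::new(
            self.rows[0].dot(p) + self.translation.x,
            self.rows[1].dot(p) + self.translation.y,
            self.rows[2].dot(p) + self.translation.z,
        )
    }
}

/// Step 1: local-space center of an axis-aligned bounding box.
fn aabb_center(min: Vec3, max: Vec3) -> Vec3 {
    Vec3::new(
        (min.x + max.x) * 0.5,
        (min.y + max.y) * 0.5,
        (min.z + max.z) * 0.5,
    )
}

/// Steps 2 and 3: move the local center into world space, then into view
/// space, and use the distance along the camera's forward axis as the depth.
/// Assumes a right-handed view looking down -Z, so larger means farther away.
fn sort_depth(local_center: Vec3, world_from_local: &Affine, view_from_world: &Affine) -> f32 {
    let world_center = world_from_local.transform_point(local_center);
    -view_from_world.transform_point(world_center).z
}
```

With identity instance transforms, two meshes whose vertices are baked at different world positions both have a translation of zero, yet `sort_depth` still yields distinct depths because it starts from each mesh's own bounds center.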

This way:

  • Sorting respects the actual spatial position of the geometry
  • Instances with baked-in “world-like” coordinates inside the mesh are handled correctly
  • Draw order for transparent objects becomes much more stable and visually correct in real scenes

The main trade-offs:

  • Adding a Vec3 center in RenderMeshInstanceShared (typically +12 or +16 bytes depending on alignment),
  • For each instance, we need to transform the local bounds center into world space to compute the sort key.

Alternative approach and its drawbacks

In theory, this could be fixed by baking meshes so that:

  • The mesh is recentered around its local bounding box center, and
  • The instance Transform is adjusted to move it back into place.

However, this has several drawbacks:

  • Requires modifying vertex data for each mesh (expensive and error-prone)
  • Requires either duplicating meshes or introducing one-off edits, which is bad for instancing and memory
  • Complicates asset workflows (tools, exporters, pipelines)
  • Still does not address dynamic or procedurally generated content

In practice, this is not a scalable or convenient solution.

Secondary issue: unstable ordering when depth is equal

There is another related problem with the current sorting: when two transparent/transmissive instances end up with the same view-space depth (for example, their centers project onto the same depth plane), the resulting draw order becomes unstable. This leads to visible flickering, because the internal order of RenderEntity items is not guaranteed to be stable between frames.

In practice this happens quite easily, especially when multiple transparent instances share the same or very similar sort depth, and their relative order in the extracted render list can change frame to frame.

To address this, I suggest extending the sort key with a deterministic tie-breaker, for example the entity's main index. Conceptually, the sort key would become:

  • primary: view-space depth (or distance),
  • secondary: stable per-entity index

This ensures that instances with the same depth keep a consistent draw order across frames, removing flickering while preserving the intended depth-based sorting behavior.
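A minimal sketch of such a two-level key, using the standard library's comparison sort rather than the radsort used in the PR. `SortItem` and `sort_with_tie_breaker` are hypothetical names for illustration, and the ascending direction is an assumption (a real phase may sort back-to-front instead).

```rust
/// Hypothetical stand-in for an item in a sorted render phase.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SortItem {
    /// Primary key: view-space depth (or distance).
    depth: f32,
    /// Secondary key: stable per-entity index.
    entity_index: u32,
}

/// Sorts by depth first and falls back to the entity index when depths are
/// equal, so the result does not depend on the incoming order of the items.
fn sort_with_tie_breaker(items: &mut [SortItem]) {
    items.sort_unstable_by(|a, b| {
        a.depth
            .total_cmp(&b.depth)
            .then(a.entity_index.cmp(&b.entity_index))
    });
}
```

Because every comparison is resolved by the entity index in the worst case, two frames that extract the same items in a different order still end up with the same draw order.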

Testing

  • Did you test these changes? If so, how?
cargo run -p ci -- test
cargo run -p ci -- doc
cargo run -p ci -- compile
  • Are there any parts that need more testing? Not sure
  • How can other people (reviewers) test your changes? Is there anything specific they need to know?
    Run this "example"
use bevy::{
    camera_controller::free_camera::{FreeCamera, FreeCameraPlugin},
    prelude::*,
};

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_plugins(FreeCameraPlugin)
        .add_systems(Startup, setup)
        .add_systems(Update, view_orient)
        .run();
}

fn setup(
    mut commands: Commands,
    mut meshes: ResMut<Assets<Mesh>>,
    mut materials: ResMut<Assets<StandardMaterial>>,
) {
    let material = materials.add(StandardMaterial {
        base_color: Color::srgb_u8(150, 250, 150).with_alpha(0.7),
        alpha_mode: AlphaMode::Blend,
        ..default()
    });
    let mesh = Cuboid::new(3., 3., 1.)
        .mesh()
        .build()
        .translated_by(Vec3::new(1.5, 1.5, 0.5));

    // Grids of cuboids
    for k in -1..=0 {
        let z_offset = k as f32 * 3.;

        for i in 0..3 {
            let x_offset = i as f32 * 3.25;

            for j in 0..3 {
                let y_offset = j as f32 * 3.25;

                commands.spawn((
                    Mesh3d(
                        meshes.add(
                            mesh.clone()
                                .translated_by(Vec3::new(x_offset, y_offset, z_offset)),
                        ),
                    ),
                    MeshMaterial3d(material.clone()),
                ));
            }
        }
    }

    // Cuboids at the center share the same position and are equidistant from the camera
    {
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(material.clone()),
        ));
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::srgb_u8(150, 150, 250).with_alpha(0.6),
                alpha_mode: AlphaMode::Blend,
                ..default()
            })),
        ));
        commands.spawn((
            Mesh3d(meshes.add(mesh.clone().translated_by(Vec3::new(3.25, 3.25, 3.)))),
            MeshMaterial3d(materials.add(StandardMaterial {
                base_color: Color::srgb_u8(250, 150, 150).with_alpha(0.5),
                alpha_mode: AlphaMode::Blend,
                ..default()
            })),
        ));
    }

    commands.spawn((PointLight::default(), Transform::from_xyz(-3., 10., 4.5)));
    commands.spawn((
        Camera3d::default(),
        Transform::from_xyz(-3., 12., 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y),
        FreeCamera::default(),
    ));
    commands.spawn((
        Node {
            position_type: PositionType::Absolute,
            padding: UiRect::all(px(10)),
            ..default()
        },
        GlobalZIndex(i32::MAX),
        children![(
            Text::default(),
            children![
                (TextSpan::new("1 - 3D view\n")),
                (TextSpan::new("2 - Front view\n")),
                (TextSpan::new("3 - Top view\n")),
                (TextSpan::new("4 - Right view\n")),
            ]
        )],
    ));
}

fn view_orient(
    input: Res<ButtonInput<KeyCode>>,
    mut camera_xform: Single<&mut Transform, With<Camera>>,
) {
    let xform = if input.just_pressed(KeyCode::Digit1) {
        Some(Transform::from_xyz(-3., 12., 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y))
    } else if input.just_pressed(KeyCode::Digit2) {
        Some(Transform::from_xyz(4.75, 4.75, 15.).looking_at(Vec3::new(4.75, 4.75, 0.), Vec3::Y))
    } else if input.just_pressed(KeyCode::Digit3) {
        Some(Transform::from_xyz(4.75, 18., -1.).looking_at(Vec3::new(4.75, 0., -1.), Vec3::NEG_Z))
    } else if input.just_pressed(KeyCode::Digit4) {
        Some(Transform::from_xyz(-15., 4.75, -1.).looking_at(Vec3::new(0., 4.75, -1.), Vec3::Y))
    } else {
        None
    };

    if let Some(xform) = xform {
        camera_xform.set_if_neq(xform);
    }
}
  • If relevant, what platforms did you test these changes on, and are there any important ones you can't test? macOS

Showcase

In my tests with building models (windows, glass, etc.), switching from translation-based sorting to bounds-center-based sorting noticeably improves the visual result. Transparent surfaces that were previously fighting or blending incorrectly now render in a much more expected order.

Current:

https://youtu.be/WjDjPAoKK6w

Sort by aabb center:

https://youtu.be/-Sl4GOXp_vQ

Sort by aabb center + tie breaker:

https://youtu.be/0aQhkSKxECo

@IceSentry
Contributor

I haven't reviewed the code yet, and I'm not opposed to the idea, but I would say that for an app that cares a lot about correct transparency, the solution should be some form of order-independent transparency (OIT). We have support for it in Bevy. There's still some work to be done on it, but it can definitely be used for CAD apps since it's already being used in production CAD apps.

@IceSentry
Contributor

Okay, I looked at the code and everything seems to make sense to me. The only thing I would like to see is some kind of benchmark that shows that it isn't introducing a big performance regression. And if possible it would be nice to have numbers comparing with and without the tie breaker.

@IceSentry added the A-Rendering (Drawing game state to the screen) and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels on Dec 6, 2025
@IceSentry added the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help) and D-Modest (A "normal" level of difficulty; suitable for simple features or challenging fixes) labels on Dec 6, 2025
@venhelhardt
Contributor Author

venhelhardt commented Dec 8, 2025

> Okay, I looked at the code and everything seems to make sense to me. The only thing I would like to see is some kind of benchmark that shows that it isn't introducing a big performance regression. And if possible it would be nice to have numbers comparing with and without the tie breaker.

Thanks for looking at the code and for the suggestion!

What part of the change would you like to see benchmarked? From my side there are two main areas:

  1. AABB center computation: getting the AABB is very cheap. Transforming the AABB center per instance is just a few muls/adds (ideally in vector form), so I do not expect it to be a measurable contributor in the pipeline.
  2. Sorting with the tie breaker: right now we sort by a tuple (distance, entity_index) using radsort. Because radsort is stable, it effectively sorts twice, so the cost is about 2x compared to sorting by distance only.

Even with that overhead it is still roughly 2-3x faster than the standard library's sort / sort_unstable in my local tests. We can improve this by packing the distance (as a lexicographically sortable u32) and the main entity index into a single u64. Then radsort would sort once instead of twice, so the regression should drop from ~100% to ~50% (the remaining overhead comes from doubling the number of key bits).

I held off on implementing the packed key because it requires a proper f32-to-lex-u32 conversion utility (similar to FloatOrd) along with tests. If you believe it's worth adding, I am more than willing to implement it.
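For reference, a common bit trick for this kind of conversion looks roughly like the sketch below. The function names are hypothetical and this is not code from the PR; the resulting order matches `f32::total_cmp`, so -0.0 sorts just below +0.0 and NaNs land at the extremes.

```rust
/// Maps an f32 to a u32 whose unsigned ordering matches the float's numeric
/// ordering (the same total order as `f32::total_cmp`).
fn f32_to_sortable_u32(value: f32) -> u32 {
    let bits = value.to_bits();
    if bits & 0x8000_0000 != 0 {
        // Negative: flip every bit so that more negative values sort lower.
        !bits
    } else {
        // Non-negative: set the sign bit so these sort above all negatives.
        bits | 0x8000_0000
    }
}

/// Packs the sortable distance into the high 32 bits and the entity index
/// into the low 32 bits, so a single u64 comparison orders by distance first
/// and by entity index second.
fn pack_sort_key(distance: f32, entity_index: u32) -> u64 {
    (u64::from(f32_to_sortable_u32(distance)) << 32) | u64::from(entity_index)
}
```

A single-pass sort over the packed `u64` keys then gives the same order as sorting by the `(distance, entity_index)` tuple.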

I am also not sure how large the sorting cost is in the overall blending pipeline. For example, saving ~1 ms on sorting 100k instances might be outweighed significantly by the cost of issuing 100k draw calls.

If this approach makes sense, I can:

  • add sort benchmarks to benches/bevy_render, and/or
  • implement the packed (distance, entity_index) key.

Please let me know which option you would prefer, and I will update the PR accordingly.

@IceSentry
Contributor

IceSentry commented Dec 9, 2025

> What part of the change would you like to see benchmarked?

A bit of both. I mostly want to make sure this PR isn't a regression. I doubt that using the AABB center would have a high impact but it's still a pretty hot path so I would prefer to have at least some numbers to confirm it. I'd also like to see how much of an impact using the tie breaker makes. I assume it will be fairly small relative to everything else and is worth it for the stability gain but I always prefer having real numbers instead of assuming.

As for how to test, you don't need to add new benches. Just try to run a few complex scenes with a lot of transparent meshes and compare the frametimes using tracy. Like, maybe just spawn a 50x50x50 grid of transparent cubes and see if you see any performance impact.

Oh and don't bother about packing unless you confirm that the impact of sorting with the tie breaker is high enough that it matters. We can always do it later if necessary but I prefer having a baseline that's easier to understand.

@IceSentry
Contributor

I should specify, I would even be happy with just a tracy comparison of a scene with a lot of meshes of main vs this PR. Comparing with vs without the tie breaker would be nice but not necessary at all.

@venhelhardt
Contributor Author

> As for how to test, you don't need to add new benches. Just try to run a few complex scenes with a lot of transparent meshes and compare the frametimes using tracy. Like, maybe just spawn a 50x50x50 grid of transparent cubes and see if you see any performance impact.

Thanks for the guidance! I tried it quickly without the tie breaker and already see about a 10% regression. I suspect this comes from the baseline using a no-op sort, so I'll need to set up the same test using instanced meshes where the distances are non-zero.

@venhelhardt
Contributor Author

venhelhardt commented Dec 9, 2025

> I should specify, I would even be happy with just a tracy comparison of a scene with a lot of meshes of main vs this PR. Comparing with vs without the tie breaker would be nice but not necessary at all.

Results for a scene with a 50x50x50 grid of transparent cubes (125k instances). For the baseline, I had to add a Transform component to all instances so that their distances are non-zero and actually take part in the sorting. Each run captured around 800 frames.

run            frame (ms)   sort_phase (ms)
baseline       106.05       2.89
without-tie    106.66       2.86
with-tie       111.74       4.86
with-tie-u64   107.04       4.17

Time is the median value in milliseconds.

Summary:

  • Collecting meshes (where the transform is applied) did not show any significant difference - around 0.2 µs.
  • The tie breaker adds a noticeable cost: about 70% overhead with the tuple key and 44% with the packed u64 key.
  • Overall frame time impact ranges from 0.3% to 5%.

In single-threaded mode (WASM or native with multithreading disabled), the effect becomes more noticeable: sorting adds from 1.28 ms to 2 ms to total frame time, which corresponds to roughly 1-2% in the worst case.
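The relative numbers can be recomputed from the medians in the table with a few lines of arithmetic. The sketch below only restates the table (`RUNS`, `overhead_percent` and `sort_phase_overheads` are hypothetical names); for example, the tuple-key sort phase comes out at roughly 68% over baseline and the packed u64 key at roughly 44%.

```rust
/// Relative overhead of `value` over `baseline`, in percent.
fn overhead_percent(baseline: f64, value: f64) -> f64 {
    (value - baseline) / baseline * 100.0
}

/// Median times in milliseconds, copied from the table:
/// (run, frame, sort_phase).
const RUNS: [(&str, f64, f64); 4] = [
    ("baseline", 106.05, 2.89),
    ("without-tie", 106.66, 2.86),
    ("with-tie", 111.74, 4.86),
    ("with-tie-u64", 107.04, 4.17),
];

/// Returns (run, frame overhead %, sort_phase overhead %) for every
/// non-baseline run, relative to the first entry in `RUNS`.
fn sort_phase_overheads() -> Vec<(&'static str, f64, f64)> {
    let (_, base_frame, base_sort) = RUNS[0];
    RUNS[1..]
        .iter()
        .map(|&(name, frame, sort)| {
            (
                name,
                overhead_percent(base_frame, frame),
                overhead_percent(base_sort, sort),
            )
        })
        .collect()
}
```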

[Tracy comparison screenshots, three per pair: Base vs. without tie; Base vs. with tuple tie; Base vs. with u64 tie]

@IceSentry
Contributor

Alright, thank you for the benchmarks! With these numbers in mind, what I would suggest is to remove the tie breaker from this PR then open a separate PR (that depends on this PR) that adds the tie breaker but try to make it optional. I'm not entirely sure yet how to design the api to make it optional so that's why I think it should be a separate PR so we can merge the AABB center part faster.

@venhelhardt
Contributor Author

venhelhardt commented Dec 9, 2025

I have removed the tie breaker from this PR.

That said, I still believe that determinism for rendering semi-transparent objects is worth the cost. Even in the extreme benchmark scenario we tested, the regression was only about 1-2%. And this benchmark is truly an extreme case: it contains an enormous number of semi-transparent objects, far more than any typical scene would include. In practice, the number of transparent objects is usually much smaller, and most of the frame cost is dominated by opaque geometry, shading, and other pipeline stages. Because of that, the real-world overhead would likely be an order of magnitude lower than what we see in this artificial stress test.

From my perspective, paying 1-2% in the worst possible case for stable sorting is very reasonable, especially considering that without determinism we get visible flickering, which is a much worse user experience.

Regarding making the tie breaker optional, I don't yet have a clean idea of what the best API would look like. A cargo feature feels too heavy for such a small piece of functionality, and exposing it through a resource also seems like a disproportionately large infrastructural change.

If you have thoughts on a clean and lightweight way to make this optional, I would be happy to hear your suggestions.

@IceSentry
Contributor

> That said, I still believe that determinism for rendering semi-transparent objects is worth the cost

I tend to agree, but we generally try to be as flexible as possible and offer as many knobs as possible to users. Also, like I said originally, if correct transparency is important at the cost of performance then I believe OIT is probably a better solution since it will handle a lot more edge cases that simply can't be handled by alpha blending on its own.

Also, it's 1-2% of the global frame time, but that's assuming the systems can run in parallel and other work can be done. If a user decided to disable multithreading or introduced some kind of bottleneck on the sorting phase, it would have a larger impact on the total frame time. If we only look at the system itself, it's essentially doubling the runtime, which isn't ideal.

As for how to make it optional. I think just having a global resource is fine? We have a few of those already.

It's possible we end up deciding not to make it optional, but I prefer keeping that decision in a separate PR.
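As a rough idea of what a global toggle could look like, here is a plain-Rust sketch. `TransparentSortSettings`, `PhaseItem` and `sort_phase` are hypothetical names; in Bevy the settings struct would presumably be registered as a resource, and the actual sort would go through radsort rather than the standard library sort used here.

```rust
/// Hypothetical global setting controlling the transparent sort.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct TransparentSortSettings {
    /// When true, equal depths are ordered by entity index (deterministic,
    /// but more expensive). Defaults to false.
    stable_tie_breaker: bool,
}

/// Hypothetical stand-in for an item in a sorted render phase.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PhaseItem {
    depth: f32,
    entity_index: u32,
}

/// Sorts a phase by depth, optionally breaking ties by entity index.
fn sort_phase(items: &mut [PhaseItem], settings: TransparentSortSettings) {
    if settings.stable_tie_breaker {
        items.sort_unstable_by(|a, b| {
            a.depth
                .total_cmp(&b.depth)
                .then(a.entity_index.cmp(&b.entity_index))
        });
    } else {
        // Cheaper path: items with equal depth may end up in any order.
        items.sort_unstable_by(|a, b| a.depth.total_cmp(&b.depth));
    }
}
```

Users who care about flicker-free output would opt in to the tie breaker, while everyone else keeps the cheaper depth-only sort.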

@IceSentry removed the S-Needs-Benchmarking (This set of changes needs performance benchmarking to double-check that they help) label on Dec 10, 2025
Contributor

@IceSentry IceSentry left a comment

Once the tie breaker is removed I'll be comfortable with adding my approval because everything else in the code LGTM.

Like I mentioned in my previous comment. I'm not sure yet if we want to make it optional or not but I'd rather make that decision in a separate PR.

@venhelhardt
Contributor Author

> Once the tie breaker is removed I'll be comfortable with adding my approval because everything else in the code LGTM.
>
> Like I mentioned in my previous comment. I'm not sure yet if we want to make it optional or not but I'd rather make that decision in a separate PR.

Thanks for the review!

The tie breaker was removed from this PR (I mentioned this in the last comment).

@IceSentry
Contributor

> The tie breaker was removed from this PR (I mentioned this in the last comment).

Ah, sorry, I completely forgot about that. Let me update my review.

@venhelhardt
Contributor Author

> Ah, sorry, I completely forgot about that. Let me update my review.

Got it, no worries. Do you need anything from me to get this merged?

Labels

A-Rendering Drawing game state to the screen D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes S-Needs-Review Needs reviewer attention (from anyone!) to move forward
