Use mesh bounds center for transparent/transmissive sorting #22041
Transparent and transmissive phases previously used the instance translation from GlobalTransform as the sort position. This breaks down when mesh geometry is authored in "world-like" coordinates and the instance transform is identity or near-identity (common in building/CAD-style content). In such cases multiple transparent instances end up with the same translation and produce incorrect draw order. This change introduces sorting based on the world-space center of the mesh bounds instead of the raw translation. The local bounds center is stored per mesh/instance and transformed by the instance’s world transform when building sort keys. This adds a small amount of per-mesh/instance data but produces much more correct transparent and transmissive rendering in real-world scenes.
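As a rough illustration of the idea, the sketch below transforms a local bounds center by an instance's world transform and measures its depth along the camera's forward axis. The `V3` and `Affine` types and all function names are stand-ins invented for this example; they are not Bevy's types or the code in this PR.

```rust
/// Minimal 3D vector; a stand-in for a math-library type such as `Vec3`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct V3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

impl V3 {
    pub const fn new(x: f32, y: f32, z: f32) -> Self {
        V3 { x, y, z }
    }

    pub fn dot(self, o: V3) -> f32 {
        self.x * o.x + self.y * o.y + self.z * o.z
    }

    pub fn sub(self, o: V3) -> V3 {
        V3::new(self.x - o.x, self.y - o.y, self.z - o.z)
    }
}

/// Affine transform: three basis columns plus a translation (hypothetical type).
#[derive(Clone, Copy, Debug)]
pub struct Affine {
    pub cols: [V3; 3],
    pub translation: V3,
}

impl Affine {
    pub const IDENTITY: Affine = Affine {
        cols: [
            V3::new(1.0, 0.0, 0.0),
            V3::new(0.0, 1.0, 0.0),
            V3::new(0.0, 0.0, 1.0),
        ],
        translation: V3::new(0.0, 0.0, 0.0),
    };

    /// Applies the linear part (rotation/scale) and then the translation.
    pub fn transform_point(&self, p: V3) -> V3 {
        let [cx, cy, cz] = self.cols;
        V3::new(
            cx.x * p.x + cy.x * p.y + cz.x * p.z + self.translation.x,
            cx.y * p.x + cy.y * p.y + cz.y * p.z + self.translation.y,
            cx.z * p.x + cy.z * p.y + cz.z * p.z + self.translation.z,
        )
    }
}

/// Signed distance of a world-space point along the camera's forward axis.
pub fn depth_along_view(camera_pos: V3, camera_forward: V3, world_point: V3) -> f32 {
    world_point.sub(camera_pos).dot(camera_forward)
}

/// Sort depth based on the world-space center of the mesh bounds.
pub fn center_sort_depth(
    world_from_local: &Affine,
    local_center: V3,
    camera_pos: V3,
    camera_forward: V3,
) -> f32 {
    let world_center = world_from_local.transform_point(local_center);
    depth_along_view(camera_pos, camera_forward, world_center)
}
```

With two instances that both use an identity transform, the translation-based depth is identical for both, while the center-based depth differs whenever their bounds centers differ.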
I haven't reviewed the code yet, and I'm not opposed to the idea, but for an app that cares a lot about correct transparency, the solution should be some form of order-independent transparency (OIT). We have support for it in Bevy; there's still some work to be done on it, but it can definitely be used for CAD apps, since it's already being used in production CAD apps.
Okay, I looked at the code and everything seems to make sense to me. The only thing I would like to see is some kind of benchmark showing that it isn't introducing a big performance regression. If possible, it would also be nice to have numbers comparing with and without the tie breaker.
Thanks for looking at the code and for the suggestion! What part of the change would you like to see benchmarked? From my side there are two main areas:
Even with that overhead it is still roughly 2-3x faster than
I held off on implementing the packed key because it requires a proper f32-to-lex-u32 conversion utility (similar to
I am also not sure how large the sorting cost is in the overall blending pipeline. For example, saving ~1 ms on sorting 100k instances might be outweighed significantly by the cost of issuing 100k draw calls.
If this approach makes sense, I can:
Please let me know which option you would prefer, and I will update the PR accordingly.
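For readers unfamiliar with the f32-to-lex-u32 conversion mentioned in this comment, one common formulation is sketched below. It maps a float's bit pattern to an unsigned integer whose ordering matches the float ordering, so that depth and a tie-breaker index can be packed into a single `u64`. The function names and the key layout are assumptions made for illustration, not code from this PR.

```rust
/// Maps an `f32` to a `u32` whose unsigned ordering matches the float ordering.
/// Negative values have all bits flipped; non-negative values only get the
/// sign bit set. NaNs land at the extremes and should be filtered beforehand
/// if that matters. Note that -0.0 sorts just below +0.0 under this mapping.
pub fn f32_to_lex_u32(value: f32) -> u32 {
    let bits = value.to_bits();
    if bits & 0x8000_0000 != 0 {
        !bits
    } else {
        bits | 0x8000_0000
    }
}

/// Hypothetical packed key: depth in the high 32 bits, tie-breaker index in
/// the low 32 bits. Comparing two keys as plain `u64` compares depth first
/// and falls back to the index when depths are equal.
pub fn packed_sort_key(depth: f32, index: u32) -> u64 {
    ((f32_to_lex_u32(depth) as u64) << 32) | index as u64
}
```

A slice of such keys can then be ordered with a plain integer sort, for example `keys.sort_unstable()`.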
A bit of both. I mostly want to make sure this PR isn't a regression. I doubt that using the AABB center would have a high impact, but it's still a pretty hot path, so I would prefer to have at least some numbers to confirm it. I'd also like to see how much of an impact the tie breaker makes. I assume it will be fairly small relative to everything else and is worth it for the stability gain, but I always prefer having real numbers instead of assuming. As for how to test, you don't need to add new benches. Just try to run a few complex scenes with a lot of transparent meshes and compare the frame times using Tracy. For example, spawn a 50x50x50 grid of transparent cubes and check whether you see any performance impact. Oh, and don't bother with packing unless you confirm that the impact of sorting with the tie breaker is high enough that it matters. We can always do it later if necessary, but I prefer having a baseline that's easier to understand.
I should specify: I would even be happy with just a Tracy comparison of main vs this PR on a scene with a lot of meshes. Comparing with vs without the tie breaker would be nice but is not necessary at all.
Thanks for the guidance! I tried it quickly without the tie breaker and already see about a 10% regression. I suspect this comes from the baseline using a no-op sort, so I'll need to set up the same test using instanced meshes where the distances are non-zero.
Summary for a model with 50x50x50 (125k instances) transparent cubes. For the baseline, I have to add a Transform component to all instances, so their distances are included in the sorting. Each run captured around 800 frames.
Time is the median value in milliseconds. Summary:
In single-threaded mode (WASM or native with multithreading disabled), the effect becomes more noticeable: sorting adds from 1.28 ms to 2 ms to total frame time, which corresponds to roughly 1-2% in the worst case.
Alright, thank you for the benchmarks! With these numbers in mind, I would suggest removing the tie breaker from this PR, then opening a separate PR (that depends on this one) that adds the tie breaker but tries to make it optional. I'm not entirely sure yet how to design the API to make it optional, which is why I think it should be a separate PR, so we can merge the AABB center part faster.
I have removed the tie breaker from this PR. That said, I still believe that determinism for rendering semi-transparent objects is worth the cost. Even in the extreme benchmark scenario we tested, the regression was only about 1-2%. And this benchmark is truly an extreme case: it contains an enormous number of semi-transparent objects, far more than any typical scene would include. In practice, the number of transparent objects is usually much smaller, and most of the frame cost is dominated by opaque geometry, shading, and other pipeline stages. Because of that, the real-world overhead would likely be an order of magnitude lower than what we see in this artificial stress test. From my perspective, paying 1-2% in the worst possible case for stable sorting is very reasonable, especially considering that without determinism we get visible flickering, which is a much worse user experience. Regarding making the tie breaker optional, I don't yet have a clean idea of what the best API would look like. A cargo feature feels too heavy for such a small piece of functionality, and exposing it through a resource also seems like a disproportionately large infrastructural change. If you have thoughts on a clean and lightweight way to make this optional, I would be happy to hear your suggestions.
I tend to agree, but we generally try to be as flexible as possible and offer as many knobs as possible to users. Also, like I said originally, if correct transparency is important at the cost of performance, then I believe OIT is probably a better solution, since it will handle a lot more edge cases that simply can't be handled by alpha blending on its own. Also, it's 1-2% of the global frame time, but that assumes the systems can run in parallel and other work can be done. If a user decides to disable multithreading or introduces some kind of bottleneck on the sorting phase, the impact on the total frame time would be larger. If we only look at the system itself, it's essentially doubling the runtime, which isn't ideal. As for how to make it optional, I think just having a global resource is fine? We have a few of those already. It's possible we end up deciding not to make it optional, but I prefer keeping that decision in a separate PR.
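To make the "global resource" suggestion a little more concrete, here is a minimal sketch of a settings value that switches the tie breaker on or off. The `TransparentSortSettings` name, its field, and `sort_phase` are hypothetical and written in plain Rust; in Bevy the struct would presumably be registered as a resource, and nothing here reflects an agreed-upon API.

```rust
/// Hypothetical global setting controlling how transparent items are sorted.
#[derive(Clone, Copy, Debug, Default)]
pub struct TransparentSortSettings {
    /// When true, items with equal depth are ordered by their index.
    pub stable_tie_break: bool,
}

/// Sorts `(depth, index)` keys by ascending depth, optionally breaking ties
/// by index. Sort direction is an assumption made for this sketch.
pub fn sort_phase(keys: &mut [(f32, u32)], settings: TransparentSortSettings) {
    if settings.stable_tie_break {
        keys.sort_unstable_by(|a, b| a.0.total_cmp(&b.0).then(a.1.cmp(&b.1)));
    } else {
        // Cheaper comparison; the order of equal-depth items is unspecified.
        keys.sort_unstable_by(|a, b| a.0.total_cmp(&b.0));
    }
}
```

Defaulting the flag to `false` keeps the cheaper path as the baseline, which matches the preference expressed above for deciding on the default separately.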
Once the tie breaker is removed, I'll be comfortable adding my approval, because everything else in the code LGTM.
Like I mentioned in my previous comment, I'm not sure yet whether we want to make it optional, but I'd rather make that decision in a separate PR.
Thanks for the review! The tie breaker was removed from this PR (I mentioned this in the last comment).
Ah, sorry, I completely forgot about that. Let me update my review.
Got it, no worries. Do you need anything from me to get this merged?









Objective
Currently, transparent and transmissive render phases in Bevy sort instances using the translation from GlobalTransform. This works only if the mesh origin is a good proxy for the geometry position. In many real-world cases (especially CAD/architecture-like content), the mesh data is authored in "world-like" coordinates and the instance Transform is identity. In such setups, sorting by translation produces incorrect draw order for transparent/transmissive objects.
I propose switching the sorting key from GlobalTransform.translation to the world-space center of the mesh bounds for each instance.
Solution
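Instead of using GlobalTransform.translation as the sort position for transparent/transmissive phases, use the world-space center of the mesh bounds:
The local bounds center is stored per mesh/instance (in RenderMeshInstanceShared as center: Vec3 derived from the mesh Aabb).
The local center is transformed by the instance's world transform when building sort keys.
This way:
The main trade-offs:
A small amount of extra per-mesh/instance data in RenderMeshInstanceShared (typically +12 or +16 bytes depending on alignment),
Alternative approach and its drawbacks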
In theory, this could be fixed by baking meshes so that:
Transform is adjusted to move it back into place.
However, this has several drawbacks:
In practice, this is not a scalable or convenient solution.
Secondary issue: unstable ordering when depth is equal
There is another related problem with the current sorting: when two transparent/transmissive instances end up with the same view-space depth (for example, their centers project onto the same depth plane), the resulting draw order becomes unstable. This leads to visible flickering, because the internal order of RenderEntity items is not guaranteed to be stable between frames.
In practice this happens quite easily, especially when multiple transparent instances share the same or very similar sort depth, and their relative order in the extracted render list can change frame to frame.
To address this, I suggest extending the sort key with a deterministic tie-breaker, for example the entity's main index. Conceptually, the sort key would become:
This ensures that instances with the same depth keep a consistent draw order across frames, removing flickering while preserving the intended depth-based sorting behavior.
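A minimal sketch of such a tie-broken comparison is shown below, assuming a back-to-front order (larger depth drawn first) and a per-entity `u32` index. The `TransparentItem` type and `sort_back_to_front` function are illustrative names only and are not part of Bevy or of this PR.

```rust
/// Illustrative phase item: a view-space depth plus a stable entity index.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TransparentItem {
    pub depth: f32,
    pub entity_index: u32,
}

/// Sorts back to front (assumption: larger depth is farther from the camera)
/// and falls back to the entity index when two depths compare equal, so the
/// result does not depend on the input order.
pub fn sort_back_to_front(items: &mut [TransparentItem]) {
    items.sort_unstable_by(|a, b| {
        b.depth
            .total_cmp(&a.depth)
            .then(a.entity_index.cmp(&b.entity_index))
    });
}
```

Because the comparison is a total order over `(depth, entity_index)`, two frames that extract the same items in a different order still produce the same draw order.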
Testing
cargo run -p ci -- test
cargo run -p ci -- doc
cargo run -p ci -- compile
Run this "example"
Showcase
In my tests with building models (windows, glass, etc.), switching from translation-based sorting to bounds-center-based sorting noticeably improves the visual result. Transparent surfaces that were previously fighting or blending incorrectly now render in a much more expected order.
Current:
https://youtu.be/WjDjPAoKK6w
Sort by aabb center:
https://youtu.be/-Sl4GOXp_vQ
Sort by aabb center + tie breaker:
https://youtu.be/0aQhkSKxECo