[Controller] POC: overall speedup#573
Conversation
alxbilger
left a comment
There was a problem hiding this comment.
Is the cache system also relevant for other trampoline classes?
To my meager knowledge, I would say yes. But in this case it is especially useful because there were(are) many lookups to call implicitly all the *event() every step. |
57debb4 to
040d3e6
Compare
|
Nice cache mechanism. Now I wonder what would happend if someone changed dynamically the method after it has already been cached : MyController.onAnimateBeginEvent() # Cache is now active for this method
MyController.onAnimateBeginEvent = randomMethod # Cache is now invalid
MyController.onAnimateBeginEvent() # now what's called ? Maybe we could add a setattr overloead invalidating the cache when it is called on a method already cached ? |
no idea, I even did know you could change class method dynamically like that 🙃 |
Summary of Modifications
The changes in this branch (speedup_controller) optimize the Controller_Trampoline class in the Python bindings by adding a caching mechanism for Python method lookups:
Key Changes:
1. New caching infrastructure (in Binding_Controller.h):
- Added member variables to cache:
- m_pySelf - cached Python self reference (avoids repeated py::cast(this))
- m_methodCache - unordered_map storing Python method objects by name
- m_onEventMethod - cached fallback "onEvent" method
- m_hasOnEvent / m_cacheInitialized - state flags
2. New methods (in Binding_Controller.cpp):
- initializePythonCache() - initializes the cache on first use
- getCachedMethod() - retrieves methods from cache (or looks them up once and caches)
- callCachedMethod() - calls a cached Python method with an event
- Constructor and destructor to properly manage the cached Python objects with GIL
3. Optimized handleEvent():
- Previously: every event caused py::cast(this), py::hasattr(), and attr() lookups
- Now: uses cached method references, avoiding repeated Python attribute lookups
4. Optimized getClassName():
- Uses the cached m_pySelf when available instead of casting each time
Purpose:
This is a performance optimization that reduces overhead when handling frequent events (like AnimateBeginEvent, AnimateEndEvent), which can be called many times per simulation step. The caching eliminates repeated Python/C++ boundary crossings for method lookups.
040d3e6 to
63b1a8d
Compare
I did detect this issue some time ago, about having a controller in a scene was slowing down the simulation so much, especially on macOS. Even if the controller do nothing. And it gets slower and slower more there are Controllers.
DISCLAIMER: this was mostly the work of Claude, which detected/suggested the issues (lookups were slowing down the simulation) and generated the solution.
I just did the benches/tests to make sure it works, but as for everything with sofapython3 and/or pybind11, I cannot prove everything is okay/well-done. So deep review from experts would be appreciated 🫠
In any case, the modifications lead to a dramatic speed up:
(refer to the scene with this PR, which creates an empy scene with a certain number of Controller doing nothing)
Ubuntu 22.04 (gcc12, i7 13700k)
--> with 10controllers, 150x faster... 😮
Windows (MSVC2026, i7 11800h)
--> with 10controllers, 200x faster... 😲
macOS (xcode26, M3 max)
--> with 10controllers, 837x faster... 🤪