[Type] Mat: better cache locality for operator*(Mat) #5921
base: master
alxbilger
left a comment
You must initialize the result before calling the operator +=.
Force-pushed: 0ca315f to eb57d55
Done, and re-ran the benchmarks (no change).
{
    Mat<L,P,real> r(NOINIT);
    for (Size i = 0; i<L; i++)
    Mat<L,P,real> r;
Suggested change:
- Mat<L,P,real> r;
+ Mat<L,P,real> r(0.0);
The ctor without arguments doesn't set the initial values to 0.
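The point about initializing before `+=` can be illustrated with a minimal sketch. It assumes a hypothetical `TinyVec` type whose default constructor leaves its storage uninitialized, as the comment above describes for `Mat`; `TinyVec`, `NoInitTag`, `sumTerms` and `demoZeroInit` are illustrative names, not SOFA API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct NoInitTag {}; // hypothetical analogue of a NOINIT tag

// Hypothetical fixed-size vector used only for illustration (not SOFA's type).
template <std::size_t N, typename Real>
struct TinyVec
{
    std::array<Real, N> v;

    TinyVec() {}                                  // assumption: leaves v uninitialized
    explicit TinyVec(NoInitTag) {}                // same behavior, intent stated explicitly
    explicit TinyVec(Real fill) { v.fill(fill); } // every element set to `fill`
};

// Accumulates `count` copies of x into r with +=.
template <std::size_t N, typename Real>
TinyVec<N, Real> sumTerms(const TinyVec<N, Real>& x, std::size_t count)
{
    // Explicit zero fill. With `TinyVec<N, Real> r;` the first += would read
    // indeterminate values, which is undefined behavior.
    TinyVec<N, Real> r(static_cast<Real>(0));
    for (std::size_t t = 0; t < count; ++t)
        for (std::size_t i = 0; i < N; ++i)
            r.v[i] += x.v[i];
    return r;
}

// Usage: four terms of 1.5 accumulate to exactly 6.0 in every slot.
inline void demoZeroInit()
{
    const TinyVec<3, double> x(1.5);
    const TinyVec<3, double> r = sumTerms(x, 4);
    for (std::size_t i = 0; i < 3; ++i)
        assert(r.v[i] == 6.0);
}
```

A tag constructor is still useful when every element is overwritten with `=` before being read; it only becomes a problem once the first access is a read-modify-write such as `+=`.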
Well, I misclicked, I didn't want to approve X)
Hmm this is a "recent" change. Before 44ad519, the default constructor initialized the values...

Changing accesses for better cache locality (suggested by AI)
TL;DR:
The Mat<3,3> version does not change because it has its own optimized specialization.
The bigger the matrices, the bigger the gain (Mat24x24: 400% speedup with floats!).
macOS has a weird quirk for Mat6x6 with doubles, which is 50% slower 🤔, maybe due to failed vectorization or something.
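For readers unfamiliar with the idea, here is a minimal sketch of one common way to change the access order of a matrix product for better cache locality: swapping the two inner loops (i-k-j instead of i-j-k) so the innermost loop walks contiguous rows. This is an assumption about the kind of change involved, not the actual diff of this PR; `SimpleMat`, `multiplyNaive`, `multiplyRowWise` and `checkSameProduct` are hypothetical names, and row-major storage is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical row-major fixed-size matrix; a stand-in, not SOFA's Mat.
template <std::size_t L, std::size_t C, typename Real>
struct SimpleMat
{
    std::array<std::array<Real, C>, L> rows{}; // zero-initialized storage

    Real& operator()(std::size_t i, std::size_t j) { return rows[i][j]; }
    const Real& operator()(std::size_t i, std::size_t j) const { return rows[i][j]; }
};

// Textbook i-j-k order: the inner loop steps down a column of b,
// so consecutive reads of b are P elements apart in memory.
template <std::size_t L, std::size_t C, std::size_t P, typename Real>
SimpleMat<L, P, Real> multiplyNaive(const SimpleMat<L, C, Real>& a,
                                    const SimpleMat<C, P, Real>& b)
{
    SimpleMat<L, P, Real> r; // starts at zero (see rows{} above)
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t j = 0; j < P; ++j)
            for (std::size_t k = 0; k < C; ++k)
                r(i, j) += a(i, k) * b(k, j);
    return r;
}

// i-k-j order: the inner loop reads row k of b and updates row i of r,
// both contiguous in memory.
template <std::size_t L, std::size_t C, std::size_t P, typename Real>
SimpleMat<L, P, Real> multiplyRowWise(const SimpleMat<L, C, Real>& a,
                                      const SimpleMat<C, P, Real>& b)
{
    SimpleMat<L, P, Real> r; // must start at zero because we accumulate with +=
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t k = 0; k < C; ++k)
        {
            const Real aik = a(i, k); // reused across the whole inner loop
            for (std::size_t j = 0; j < P; ++j)
                r(i, j) += aik * b(k, j);
        }
    return r;
}

// Usage: both orders compute the same product on a small integer-valued input.
inline void checkSameProduct()
{
    SimpleMat<2, 3, double> a;
    SimpleMat<3, 2, double> b;
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t k = 0; k < 3; ++k)
        {
            a(i, k) = static_cast<double>(i + k + 1);
            b(k, i) = static_cast<double>(2 * k + i);
        }
    const auto n = multiplyNaive(a, b);
    const auto r = multiplyRowWise(a, b);
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            assert(n(i, j) == r(i, j));
}
```

For each output element the terms are still added in increasing k, so the two versions perform the same additions in the same order; only the memory traversal differs. Whether the reordering pays off depends on matrix size, element type, and how well the compiler vectorizes the inner loop.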
Timings:
Ubuntu 22.04, gcc12, lto, O3
Windows VS2026, release, lto
macOS, xcode 26, lto