
Conversation

@b-pass
Collaborator

@b-pass b-pass commented Dec 26, 2025

Description

Change both the Python->C++ and the C++->Python function calls to use the "vectorcall" calling protocol.

This makes function calls faster, at the minor cost of having to handle the arguments slightly differently.
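For context, here is a minimal sketch (not this PR's code; the helper names are invented) contrasting the classic tuple-based protocol with vectorcall in the CPython C API:

```cpp
#include <Python.h>

// Classic protocol: positional arguments must be packed into a freshly
// allocated tuple before the call.
PyObject *call_classic(PyObject *callable, PyObject *arg) {
    PyObject *args = PyTuple_Pack(1, arg);  // allocates a temporary tuple
    if (args == nullptr) {
        return nullptr;
    }
    PyObject *result = PyObject_Call(callable, args, nullptr);
    Py_DECREF(args);
    return result;
}

// Vectorcall protocol (public API since CPython 3.9): arguments are passed as
// a plain C array of PyObject*, so no intermediate tuple is allocated.
PyObject *call_vectorcall(PyObject *callable, PyObject *arg) {
    PyObject *argv[] = {arg};
    return PyObject_Vectorcall(callable, argv, 1, /*kwnames=*/nullptr);
}
```

Keyword arguments travel in the same flat array, with their names in a separate kwnames tuple, which is the "slightly different" argument handling mentioned above.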

Suggested changelog entry:

  • Improved performance of function calls between Python and C++ by switching to the "vectorcall" calling protocol.

@b-pass b-pass marked this pull request as draft December 26, 2025 13:25
@b-pass
Collaborator Author

b-pass commented Dec 26, 2025

Small benchmark results for short function calls: this PR provides a modest improvement over current master... perf

@b-pass b-pass requested a review from oremanj December 26, 2025 16:52
@b-pass b-pass marked this pull request as ready for review December 26, 2025 16:52
@rwgk
Collaborator

rwgk commented Dec 27, 2025

@b-pass I merged master, which had no conflicts.

My idea was that maybe Cursor can figure out the root cause for these two CI failures:

2025-12-26T23:29:10.4917170Z  E       ModuleNotFoundError: No module named 'mod_per_interpreter_gil'
...
2025-12-26T23:29:12.4779770Z  E       ModuleNotFoundError: No module named 'mod_shared_interpreter_gil'

I'll wait for this CI run to finish, to see if these errors appear again.

When testing locally with Python 3.14.2, default and freethreaded, I saw this error (with both builds):

>       m.array_reshape2(b)
E       ValueError: array_reshape2: input array total size is not a squared integer

EDIT: Please see #5948 (comment) for the resolution.

I'm attaching the full build & test logs, JIC.

When I switched back to master, everything else being equal, this error didn't appear. So it's definitely somehow connected to this PR.

pybind11_gcc_v3.14_df793163d58_default_tests_log_2025-12-26+193154.txt

pybind11_gcc_v3.14_df793163d58_freethreaded_log_2025-12-26+193608.txt

The `mod_per_interpreter_gil`, `mod_shared_interpreter_gil`, and
`mod_per_interpreter_gil_with_singleton` modules were being built
but not installed into the wheel when using scikit-build-core
(SKBUILD=true). This caused iOS (and potentially Android) CIBW
tests to fail with ModuleNotFoundError.

Root cause analysis:
- The main test targets have install() commands (line 531)
- The PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES were missing
  equivalent install() commands
- For regular CMake builds, this wasn't a problem because
  LIBRARY_OUTPUT_DIRECTORY places the modules next to pybind11_tests
- For wheel builds, only targets with explicit install() commands
  are included in the wheel

This issue was latent until commit fee2527 changed the test imports
from `pytest.importorskip()` (graceful skip) to direct `import`
statements (hard failure), which exposed the missing modules.

Failing tests:
- test_multiple_interpreters.py::test_independent_subinterpreters
- test_multiple_interpreters.py::test_dependent_subinterpreters

Error: ModuleNotFoundError: No module named 'mod_per_interpreter_gil'
@rwgk
Collaborator

rwgk commented Dec 27, 2025

Quick update: I think Cursor found the root cause for the CIBW / iOS failures, and a fix.

I'm still looking into why we're not seeing the m.array_reshape2(b) errors here.

Add numpy==2.4.0 requirement for Python 3.14 (both default and
free-threaded builds). NumPy 2.4.0 is the first version to provide
official PyPI wheels for Python 3.14:

- numpy-2.4.0-cp314-cp314-manylinux_2_27_x86_64...whl (default)
- numpy-2.4.0-cp314-cp314t-manylinux_2_27_x86_64...whl (free-threaded)

Previously, CI was skipping all numpy-dependent tests for Python 3.14
because PIP_ONLY_BINARY was set and no wheels were available:

  SKIPPED [...] test_numpy_array.py:8: could not import 'numpy':
  No module named 'numpy'

With this change, the full numpy test suite will run on Python 3.14,
providing better test coverage for the newest Python version.

Note: Using exact pin (==2.4.0) rather than compatible release (~=2.4.0)
to ensure reproducible CI results with the first known-working version.
@rwgk rwgk requested a review from henryiii as a code owner December 27, 2025 05:00
@rwgk
Collaborator

rwgk commented Dec 27, 2025

I pushed commit ca72d6c here because we only have very limited GHA resources. Depending on how it goes, it might be best to merge that commit in a separate PR. Let's see.

Ideally commit 6ed6d5a should also be in a separate PR, but same issue with the GHA resources.

@rwgk
Collaborator

rwgk commented Dec 27, 2025

NumPy 2.3.5 vs 2.4.0 Observations

The Issue

When testing this PR locally with Python 3.14 + NumPy 2.3.5, I observed a test failure:

>       m.array_reshape2(b)
E       ValueError: array_reshape2: input array total size is not a squared integer

test_numpy_array.py::test_array_resize FAILED

This occurred when calling array_reshape2 on a transposed view of a resized 64-element array (which should have worked since 64 = 8²).

What We Know For Sure

  1. The error appeared with NumPy 2.3.5 (released 2025-11-16) on Python 3.14
  2. The error is gone with NumPy 2.4.0 (released 2025-12-20) on Python 3.14
  3. This is NOT a bug in the vectorcall PR — the same test passes on master with NumPy 2.3.5, but that's because CI was skipping numpy tests for Python 3.14 entirely (no numpy entry in tests/requirements.txt for 3.14)
  4. NumPy 2.4.0 wheels for Python 3.14 are now available on PyPI (both default cp314 and free-threaded cp314t)

What's Undetermined

The exact bug fix in NumPy between 2.3.5 and 2.4.0 that resolved this issue. I searched the NumPy 2.4.0 release notes but couldn't find a specific fix that clearly matches our observation. The release notes mention "many expired deprecations and bug fixes" without listing all of them individually.

Resolution

Added numpy==2.4.0 to tests/requirements.txt for Python 3.14+ (commit ca72d6c). This:

  • Enables numpy tests on Python 3.14 in CI (previously skipped)
  • Avoids the NumPy 2.3.5 bug

Add `-v` to the pytest command in tests/pyproject.toml to help
diagnose hanging tests in CIBW jobs (particularly iOS).

This will show each test name as it runs, making it easier to
identify which specific test is hanging.
@Skylion007
Collaborator

Random design question, why does the collector store PyObject pointers directly vs py::handle?

It's a good bit faster at the cost of this one scary reinterpret_cast.
With 6 inline elements, the struct is 144 bytes (not 128 bytes as the comment said).
@b-pass
Collaborator Author

b-pass commented Dec 28, 2025

> The push_back impl isn't quite right.

Fixed.

> Needs a static assert in the heapvector? And maybe iarray?

Not sure what you mean... elaborate?

> Random design question, why does the collector store PyObject pointers directly vs py::handle?

It was to avoid having to reinterpret_cast the C++ type to the C type in the vectorcall call... But I've consolidated both the PyObject* vector and the temp object vector into one and cast it in this latest version.
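For illustration only, here is a rough sketch (not the PR's code; `call_with_object_array` and its surroundings are made up) of what such a consolidated approach can look like, with the layout static_assert asked about above acting as the guard for the cast:

```cpp
#include <pybind11/pybind11.h>

#include <vector>

namespace py = pybind11;

// The cast below is only defensible if py::object is nothing more than a single
// PyObject* under the hood; this static_assert is the kind of guard discussed above.
static_assert(sizeof(py::object) == sizeof(PyObject *),
              "py::object must be pointer-sized for the vectorcall cast");

PyObject *call_with_object_array(py::handle callable, std::vector<py::object> &args) {
    // Reuse the py::object storage directly as the vectorcall argument array,
    // avoiding a second vector of raw PyObject* (public API since CPython 3.9).
    auto *argv = reinterpret_cast<PyObject *const *>(args.data());
    return PyObject_Vectorcall(callable.ptr(), argv, args.size(), /*kwnames=*/nullptr);
}
```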

@Skylion007
Collaborator

> Random design question, why does the collector store PyObject pointers directly vs py::handle?
>
> It was to avoid having to reinterpret_cast the C++ type to the C type in the vectorcall call... But I've consolidated both the PyObject* vector and the temp object vector into one and cast it in this latest version.

Avoiding the reinterpret_cast is a valid reason, just curious.

Collaborator

@Skylion007 Skylion007 left a comment


I'm a bit confused why a bunch of list and tuple objects are being stored as generic objects. We don't do anything crazy to them, and they all inherit from py::object. The only difference is maybe the implicit boolean checks, but those can easily be worked around by unwrapping the ptr at any time and checking against nullptr.

if (PyList_Append(names_list.ptr(), name.release().ptr()) < 0) {
    throw error_already_set();
}
m_args.emplace_back(a.value);
Collaborator


Suggested change:
- m_args.emplace_back(a.value);
+ m_args.emplace_back(std::move(a.value));

@b-pass
Collaborator Author

b-pass commented Dec 28, 2025

> I'm a bit confused why a bunch of list and tuple objects are being stored as generic objects. We don't do anything crazy to them, and they all inherit from py::object. The only difference is maybe the implicit boolean checks, but those can easily be worked around by unwrapping the ptr at any time and checking against nullptr.

tuple and list don't have a constructor that allows them to be constructed as nullptr, whereas object's default constructor is a nullptr. Typing these properly would require an extra allocation. IMO we can't afford the extra allocation here (in probably the hottest code path in all of pybind11). I also don't think it would be a good idea (or at least it needs a lot of debate) to change pybind11's tuple/list for this case.

It's just a bit inconvenient in this class ... I can certainly add some explanation in a comment about it.

EDIT: Turns out there is a way: `auto null_list = reinterpret_steal<list>(handle());`
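For illustration, a tiny sketch of that trick (not the PR's code; `null_list_demo` is a made-up name):

```cpp
#include <pybind11/pybind11.h>

namespace py = pybind11;

void null_list_demo() {
    // py::list() would allocate an empty Python list; stealing from an empty
    // handle instead yields a typed but null list without touching the interpreter.
    auto null_list = py::reinterpret_steal<py::list>(py::handle());
    if (!null_list) {
        // Lazily create the real list only when it is first needed.
        null_list = py::list();
    }
    null_list.append(42);
}
```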

b-pass and others added 4 commits December 28, 2025 18:04
They can be null if you "steal" a null handle.
…icit ownership

Introduce `ref_small_vector` to manage PyObject* references in `unpacking_collector`,
replacing the previous `small_vector<object>` approach.

Primary goals:

1. **Maintainability**: The previous implementation relied on
   `sizeof(object) == sizeof(PyObject*)` and used a reinterpret_cast to
   pass the object array to vectorcall. This coupling to py::object's
   internal layout could break if someone adds a debug field or other
   member to py::handle/py::object in the future.

2. **Readability**: The new `push_back_steal()` vs `push_back_borrow()`
   API makes reference counting intent explicit at each call site,
   rather than relying on implicit py::object semantics.

3. **Intuitive code**: Storing `PyObject*` directly and passing it to
   `_PyObject_Vectorcall` without casts is straightforward and matches
   what the C API expects. No "scary" reinterpret_cast needed.

Additional benefits:
- `PyObject*` is trivially copyable, simplifying vector operations
- Batch decref in destructor (tight loop vs N individual object destructors)
- Self-documenting ownership semantics

Design consideration: We considered folding the ref-counting functionality
directly into `small_vector` via template specialization for `PyObject*`.
We decided against this because:
- It would give `small_vector<PyObject*, N>` a different interface than the
  generic `small_vector<T, N>` (steal/borrow vs push_back)
- Someone might want a non-ref-counting `small_vector<PyObject*, N>`
- The specialization behavior could surprise users expecting uniform semantics

A separate `ref_small_vector` type makes the ref-counting behavior explicit
and self-documenting, while keeping `small_vector` generic and predictable.
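To make the steal/borrow API concrete, here is a condensed, hypothetical sketch (not the actual `ref_small_vector`; the inline small-buffer storage is omitted for brevity):

```cpp
#include <Python.h>

#include <cstddef>
#include <vector>

// Condensed illustration of the steal/borrow API described above; the real
// ref_small_vector also has inline (small-buffer) storage, which is omitted here.
class ref_vector_sketch {
public:
    ref_vector_sketch() = default;
    ref_vector_sketch(const ref_vector_sketch &) = delete;
    ref_vector_sketch &operator=(const ref_vector_sketch &) = delete;

    // Take ownership of an existing reference: no incref now, decref on destruction.
    void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }

    // Keep a borrowed (non-null) reference alive: incref now, decref on destruction.
    void push_back_borrow(PyObject *p) {
        Py_INCREF(p);
        m_ptrs.push_back(p);
    }

    PyObject *const *data() const { return m_ptrs.data(); }
    std::size_t size() const { return m_ptrs.size(); }

    // Batch decref in one tight loop instead of N individual object destructors.
    ~ref_vector_sketch() {
        for (PyObject *p : m_ptrs) {
            Py_XDECREF(p);  // X variant tolerates any nullptr entries
        }
    }

private:
    std::vector<PyObject *> m_ptrs;  // trivially copyable elements, no casts needed
};
```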
@rwgk
Collaborator

rwgk commented Dec 29, 2025

@b-pass commit d828e7d is meant to be a suggestion, although already fully developed. Please let me know what you think.

@b-pass
Collaborator Author

b-pass commented Dec 29, 2025

> Please let me know what you think.

I like it. It's less code than I thought it would be.

if (m_names) {
    nargs -= m_names.size();
}
PyObject *result = _PyObject_Vectorcall(
Collaborator


Slight nit: this could have been a py::handle.

/// destructor, and avoids the need for reinterpret_cast when passing to vectorcall.
template <std::size_t InlineSize>
class ref_small_vector {
public:
Collaborator


Ugh, I don't like this, but I don't see a better way of doing it due to layout compatibility issues with py::object. Every other way of doing this requires a copy.

Collaborator

@Skylion007 Skylion007 left a comment


Not sure if you thought of this, but if you make push_back only take in objects, you can handle the reinterpret_steal and reinterpret_borrow using only move and release mechanics.

}

/// Add a pointer, taking ownership (no incref, will decref on destruction)
void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
Collaborator


Suggested change:
- void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
+ void push_back_steal(object &&o) { m_ptrs.push_back(o.release().ptr()); }

detail::make_caster<T>::cast(std::forward<T>(x), policy, {}));
if (!o) {
void process(list & /*names_list*/, T &&x) {
handle h = detail::make_caster<T>::cast(std::forward<T>(x), policy, {});
Collaborator


Converting this to a handle here makes me a bit uncomfortable. You could keep it as an object and just use the move-from ctor; the only downside is you'll have one extra call to Py_XDECREF on a nullptr.

@Skylion007
Collaborator

I like how you have it done now, I just wonder if it could be cleaned up any more with .release().ptr() semantics.

}
m_kwargs[a.name] = std::move(a.value);
names_list.append(std::move(name));
m_args.push_back_borrow(a.value.ptr());
Collaborator

@Skylion007 Skylion007 Dec 30, 2025


Yeah, both of these could just be a push_back(&&o) with release-ptr syntax, and then you don't have to duplicate the logic; even though the implementation stores PyObject*, the input is still a py::object. The one downside is the extra xdecref call on the nullptr left in the moved-from py::object, but I doubt that null check will be a bottleneck.

Suggested change:
- m_args.push_back_borrow(a.value.ptr());
+ m_args.push_back_steal(reinterpret_borrow<object>(a.value));

redblackbst added a commit to redblackbst/pybind11 that referenced this pull request Jan 5, 2026
@rwgk
Collaborator

rwgk commented Jan 6, 2026

@b-pass I'm in favor of merging based on @Skylion007 and my reviews.

We could then work on the v3.0.2 release. (I could jump-start that.)

@Skylion007
Collaborator

Might be enough for a minor version bump to 3.1, no?

@b-pass b-pass merged commit 10f8708 into pybind:master Jan 6, 2026
151 of 152 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Jan 6, 2026
@rwgk
Collaborator

rwgk commented Jan 6, 2026

> Might be enough for a minor version bump to 3.1, no?

Not sure, maybe let's look at the changelog PR first?

There are two PRs in a holding pattern; for those we need to bump the minor version for sure (IMO):
