Change function calls to use vectorcall #5948

Conversation
This reverts commit 418a034.
It cannot reuse the incoming tuple as before, because the arguments no longer arrive as a tuple at all. So a new tuple must be created, which then holds a reference for each member.
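For context, a sketch of what that entails under the vectorcall convention (a hypothetical helper using the plain CPython C API; not the PR's actual code):

```cpp
// Hypothetical helper: under vectorcall the arguments arrive as a plain C
// array, so a tuple must be materialized and given its own reference to
// each member.
#include <Python.h>

PyObject *args_array_to_tuple(PyObject *const *args, Py_ssize_t nargs) {
    PyObject *tup = PyTuple_New(nargs); // new reference, nullptr on failure
    if (tup == nullptr) {
        return nullptr;
    }
    for (Py_ssize_t i = 0; i < nargs; ++i) {
        Py_INCREF(args[i]);                // the tuple owns one reference per member
        PyTuple_SET_ITEM(tup, i, args[i]); // steals the reference just added
    }
    return tup;
}
```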
This would be easier with `if constexpr`
@b-pass I merged master, which had no conflicts. My idea was that maybe Cursor can figure out the root cause for these two CI failures:
I'll wait for this CI run to finish to see if these errors appear again. When testing locally with Python 3.14.2, default and free-threaded, I saw this error (with both builds):

EDIT: Please see #5948 (comment) for the resolution.

I'm attaching the full build & test logs, just in case. I switched back to master, everything else being equal, and this error didn't appear, so it's definitely connected to this PR somehow.

pybind11_gcc_v3.14_df793163d58_default_tests_log_2025-12-26+193154.txt
pybind11_gcc_v3.14_df793163d58_freethreaded_log_2025-12-26+193608.txt
The `mod_per_interpreter_gil`, `mod_shared_interpreter_gil`, and `mod_per_interpreter_gil_with_singleton` modules were being built but not installed into the wheel when using scikit-build-core (SKBUILD=true). This caused iOS (and potentially Android) CIBW tests to fail with ModuleNotFoundError.

Root cause analysis:
- The main test targets have install() commands (line 531)
- The PYBIND11_MULTIPLE_INTERPRETERS_TEST_MODULES were missing equivalent install() commands
- For regular CMake builds, this wasn't a problem because LIBRARY_OUTPUT_DIRECTORY places the modules next to pybind11_tests
- For wheel builds, only targets with explicit install() commands are included in the wheel

This issue was latent until commit fee2527 changed the test imports from `pytest.importorskip()` (graceful skip) to direct `import` statements (hard failure), which exposed the missing modules.

Failing tests:
- test_multiple_interpreters.py::test_independent_subinterpreters
- test_multiple_interpreters.py::test_dependent_subinterpreters

Error: ModuleNotFoundError: No module named 'mod_per_interpreter_gil'
Quick update: I think Cursor found the root cause for the CIBW / iOS failures, and a fix. I'm still looking into why we're not seeing the
Add numpy==2.4.0 requirement for Python 3.14 (both default and free-threaded builds). NumPy 2.4.0 is the first version to provide official PyPI wheels for Python 3.14:

- numpy-2.4.0-cp314-cp314-manylinux_2_27_x86_64...whl (default)
- numpy-2.4.0-cp314-cp314t-manylinux_2_27_x86_64...whl (free-threaded)

Previously, CI was skipping all numpy-dependent tests for Python 3.14 because PIP_ONLY_BINARY was set and no wheels were available:

SKIPPED [...] test_numpy_array.py:8: could not import 'numpy': No module named 'numpy'

With this change, the full numpy test suite will run on Python 3.14, providing better test coverage for the newest Python version.

Note: Using the exact pin (==2.4.0) rather than a compatible release (~=2.4.0) to ensure reproducible CI results with the first known-working version.
NumPy 2.3.5 vs 2.4.0 Observations

The Issue

When testing this PR locally with Python 3.14 + NumPy 2.3.5, I observed a test failure. This occurred when calling

What We Know For Sure
What's Undetermined

The exact bug fix in NumPy between 2.3.5 and 2.4.0 that resolved this issue. I searched the NumPy 2.4.0 release notes but couldn't find a specific fix that clearly matches our observation. The release notes mention "many expired deprecations and bug fixes" without listing all of them individually.

Resolution

Added
Add `-v` to the pytest command in tests/pyproject.toml to help diagnose hanging tests in CIBW jobs (particularly iOS). This will show each test name as it runs, making it easier to identify which specific test is hanging.
Random design question: why does the collector store PyObject pointers directly instead of py::handle?
It's a good bit faster at the cost of this one scary reinterpret_cast.
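For readers following the thread, a simplified sketch of the cast in question (`fn` and `args` are illustrative; `PyObject_Vectorcall` is the public Python 3.9+ name for the call being discussed):

```cpp
// Simplified illustration of the trick: treat a contiguous array of
// py::object as an array of PyObject*. This works only because py::object
// holds a single PyObject* member; that layout coupling is the "scary" part.
#include <pybind11/pybind11.h>
#include <vector>
namespace py = pybind11;

PyObject *call_with_object_array(py::handle fn, std::vector<py::object> &args) {
    static_assert(sizeof(py::object) == sizeof(PyObject *),
                  "relies on py::object's internal layout");
    return PyObject_Vectorcall(fn.ptr(),
                               reinterpret_cast<PyObject *const *>(args.data()),
                               args.size(),
                               /*kwnames=*/nullptr);
}
```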
At 6, the struct is 144 bytes (not 128 bytes as the comment said).
Fixed.
Not sure what you mean... elaborate?
It was to avoid having to reinterpret_cast the C++ type to the C type in the Vectorcall call... But I've consolidated both the
Avoiding the reinterpret_cast is a valid reason, just curious.
Skylion007 left a comment
I'm a bit confused why a bunch of list and tuple objects are being stored as generic objects? We don't do anything crazy to them, and they all inherit from py::object. The only difference is the implicit boolean checks, maybe, but those can easily be worked around by unwrapping the ptr at any time and checking against nullptr.
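A small sketch of that workaround (names illustrative, not the PR's code):

```cpp
// Keep the concrete py::list type and replace the implicit boolean check
// with an explicit pointer comparison, as suggested above.
#include <pybind11/pybind11.h>
namespace py = pybind11;

py::list make_names_list() {
    py::list names = py::reinterpret_steal<py::list>(PyList_New(0));
    if (names.ptr() == nullptr) { // instead of `if (!names)`
        throw py::error_already_set();
    }
    return names;
}
```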
include/pybind11/cast.h
Outdated
```cpp
if (PyList_Append(names_list.ptr(), name.release().ptr()) < 0) {
    throw error_already_set();
}
m_args.emplace_back(a.value);
```
Suggested change:
```diff
- m_args.emplace_back(a.value);
+ m_args.emplace_back(std::move(a.value));
```
It's just a bit inconvenient in this class ... I can certainly add some explanation in a comment about it. EDIT: Turns out there is a way:
Co-authored-by: Aaron Gokaslan <[email protected]>
They can be null if you "steal" a null handle.
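For illustration (standard pybind11 API):

```cpp
// Stealing a default (null) handle yields a py::object whose ptr() is
// nullptr; this is the case the Py_XDECREF-style handling covers.
#include <pybind11/pybind11.h>
#include <cassert>
namespace py = pybind11;

void null_steal_demo() {
    py::object o = py::reinterpret_steal<py::object>(py::handle());
    assert(o.ptr() == nullptr); // valid state; destruction is a no-op
}
```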
…icit ownership

Introduce `ref_small_vector` to manage PyObject* references in `unpacking_collector`, replacing the previous `small_vector<object>` approach.

Primary goals:

1. **Maintainability**: The previous implementation relied on `sizeof(object) == sizeof(PyObject*)` and used a reinterpret_cast to pass the object array to vectorcall. This coupling to py::object's internal layout could break if someone adds a debug field or other member to py::handle/py::object in the future.
2. **Readability**: The new `push_back_steal()` vs `push_back_borrow()` API makes reference counting intent explicit at each call site, rather than relying on implicit py::object semantics.
3. **Intuitive code**: Storing `PyObject*` directly and passing it to `_PyObject_Vectorcall` without casts is straightforward and matches what the C API expects. No "scary" reinterpret_cast needed.

Additional benefits:

- `PyObject*` is trivially copyable, simplifying vector operations
- Batch decref in destructor (tight loop vs N individual object destructors)
- Self-documenting ownership semantics

Design consideration: We considered folding the ref-counting functionality directly into `small_vector` via template specialization for `PyObject*`. We decided against this because:

- It would give `small_vector<PyObject*, N>` a different interface than the generic `small_vector<T, N>` (steal/borrow vs push_back)
- Someone might want a non-ref-counting `small_vector<PyObject*, N>`
- The specialization behavior could surprise users expecting uniform semantics

A separate `ref_small_vector` type makes the ref-counting behavior explicit and self-documenting, while keeping `small_vector` generic and predictable.
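A minimal sketch of the idea described in the commit message, assuming a plain std::vector backing store in place of the real inline buffer; the steal/borrow method names follow the message above:

```cpp
#include <Python.h>
#include <cstddef>
#include <vector>

// Sketch only: the real ref_small_vector stores the first InlineSize
// elements inline; a std::vector stands in for that here.
class ref_small_vector_sketch {
public:
    ref_small_vector_sketch() = default;
    ref_small_vector_sketch(const ref_small_vector_sketch &) = delete;
    ref_small_vector_sketch &operator=(const ref_small_vector_sketch &) = delete;

    ~ref_small_vector_sketch() {
        // Batch decref: one tight loop instead of N py::object destructors.
        for (PyObject *p : m_ptrs) {
            Py_XDECREF(p);
        }
    }

    /// Take ownership of p: no incref now, decref on destruction.
    void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }

    /// Share ownership of p: incref now, decref on destruction.
    void push_back_borrow(PyObject *p) {
        Py_XINCREF(p);
        m_ptrs.push_back(p);
    }

    /// The raw array can be handed straight to vectorcall, no reinterpret_cast.
    PyObject *const *data() const { return m_ptrs.data(); }
    std::size_t size() const { return m_ptrs.size(); }

private:
    std::vector<PyObject *> m_ptrs;
};
```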
I like it. It's less code than I thought it would be.
```cpp
if (m_names) {
    nargs -= m_names.size();
}
PyObject *result = _PyObject_Vectorcall(
```
Slight nit: this could have been a py::handle.
```cpp
/// destructor, and avoids the need for reinterpret_cast when passing to vectorcall.
template <std::size_t InlineSize>
class ref_small_vector {
public:
```
Ugh, I don't like this, but I don't see a better way of doing it, due to layout compatibility issues with py::object. Every other method of doing this requires a copy.
Not sure if you thought of this, but if you make push_back only take in objects, you can handle the reinterpret steal and reinterpret borrow using only move and release mechanics.
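A sketch of that suggestion (the collector type and member names here are illustrative, not the PR's actual code):

```cpp
// One push_back that accepts only rvalue py::object, so steal vs borrow is
// chosen at the call site via reinterpret_steal / reinterpret_borrow.
#include <pybind11/pybind11.h>
#include <vector>
namespace py = pybind11;

struct ptr_collector {
    std::vector<PyObject *> m_ptrs;

    void push_back(py::object &&o) {
        m_ptrs.push_back(o.release().ptr()); // ownership moves in, no extra incref
    }
};

// Call sites express the semantics:
//   c.push_back(py::reinterpret_steal<py::object>(raw));  // take ownership
//   c.push_back(py::reinterpret_borrow<py::object>(raw)); // incref first, then own
```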
| } | ||
|
|
||
| /// Add a pointer, taking ownership (no incref, will decref on destruction) | ||
| void push_back_steal(PyObject *p) { m_ptrs.push_back(p); } |
Suggested change:
```diff
- void push_back_steal(PyObject *p) { m_ptrs.push_back(p); }
+ void push_back_steal(object &&o) { m_ptrs.push_back(o.release().ptr()); }
```
```diff
-         detail::make_caster<T>::cast(std::forward<T>(x), policy, {}));
-     if (!o) {
+ void process(list & /*names_list*/, T &&x) {
+     handle h = detail::make_caster<T>::cast(std::forward<T>(x), policy, {});
```
Converting this to a handle here makes me a bit uncomfortable. You could keep it in an object and just use the move constructor; the only downside is you'll have one extra call to Py_XDECREF on a nullptr.
I like how you have it done now; I just wonder if it could be cleaned up any more with .release().ptr() semantics.
| } | ||
| m_kwargs[a.name] = std::move(a.value); | ||
| names_list.append(std::move(name)); | ||
| m_args.push_back_borrow(a.value.ptr()); |
Yeah, both of these could just be a push_back(&&o) with release-ptr syntax, and then you don't have to duplicate the logic; while the implementation stores PyObject*, the input is still py::object. The one downside is the extra xdecref call on the nullptr stored in the py::object, but I doubt that null check will be a bottleneck?
Suggested change:
```diff
- m_args.push_back_borrow(a.value.ptr());
+ m_args.push_back_steal(reinterpret_borrow<object>(a.value));
```
@b-pass I'm in favor of merging based on @Skylion007's and my reviews. We could then work on the v3.0.2 release. (I could jump-start that.)

Might be enough for a minor version bump to 3.1, no?

Not sure, maybe let's look at the changelog PR first? There are two PRs in a holding pattern; for those we need to bump the minor version for sure (IMO):

Description
Change both the Python->C++ and the C++->Python function calls to use the faster "vector call" calling protocol.
This makes function calls faster, at the minor expense of having to handle the arguments slightly differently.
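For readers unfamiliar with the protocol, a minimal sketch (plain CPython C API, Python 3.9+ public names; not the PR's code) contrasting the two call styles:

```cpp
#include <Python.h>

// Classic protocol: arguments must be packed into a freshly allocated tuple.
PyObject *call_classic(PyObject *f, PyObject *a, PyObject *b) {
    PyObject *args = PyTuple_Pack(2, a, b); // allocates, increfs a and b
    if (args == nullptr) {
        return nullptr;
    }
    PyObject *result = PyObject_Call(f, args, /*kwargs=*/nullptr);
    Py_DECREF(args);
    return result;
}

// Vectorcall protocol: arguments are passed as a plain C array, no tuple.
PyObject *call_vector(PyObject *f, PyObject *a, PyObject *b) {
    PyObject *argv[] = {a, b};
    return PyObject_Vectorcall(f, argv, 2, /*kwnames=*/nullptr);
}
```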
Suggested changelog entry: