JIT: Inline ZEND_BW_NOT#22560
Conversation
Under the tracing JIT, ~ (ZEND_BW_NOT) was not inlined and fell back to the ZEND_BW_NOT_SPEC helper, whose result could be allocated to a stack slot aliasing a spilled loop-carried CV, clobbering it and producing a wrong result. Inline ZEND_BW_NOT for the LONG case (x ^ -1) like the other bitwise ops, removing the helper call and its temporary slot.
|
Confirmed the miscompile on PHP-8.4 with JIT on ( Test is misnumbered though: the file and The root cause is more general than The inline's worth taking on its own. The open question is whether to model the helper's EX_VAR write at the source and close the whole class; the anti-dependency case is already stubbed in |
Awesome - glad we can both see the problem.
Agreed - my bad! i blame tired fingers! correction in the latest commit.
Interesting! I hadn't spotted that the EX_VAR write itself is unmodeled; I'd been looking at it purely as a spill-slot collision and stopped at "inlining gives the result a real def." Your framing is clearly the correct one: the bare ir_CALL(handler, FP) only exposing register clobbers, so the RA shares a slot across SSA versions of a live temp with no interference check, is the actual root cause, and you're right that BW_NOT is just the first to hit it because it's the only un-inlined bitwise op. Any other un-inlined opcode reusing a live temp slot is exposed to the same aliasing. So I'm in full agreement that there's a wider fix here, and that the real close is modeling the helper's EX_VAR write at the source (or resolving the anti-dependency that's already stubbed in zend_jit_spilling_may_cause_conflict - the "should be anti-dependent with the following stores ???" note is pointing right at it). My intent with this PR is deliberately to keep it minimal: a targeted, low-risk fix that closes the reported miscompile and mirrors the existing BW_OR/AND/XOR inlining, rather than reaching into the RA's aliasing model in the same change. I'd suggest taking this as an interim bugfix while the broader issue is addressed separately. I'm very happy to dig into the wider fix if that'd be useful - take a proper look at whether modeling the EX_VAR write / the stubbed anti-dependency is the cleaner path, and scope what it'd take. Glad to open a separate issue to track it so it doesn't get lost behind this one. Just let me know which direction you'd prefer.
Chased this down through zend_jit_def_reg and I think I see exactly what you mean. The binding to the CV's home slot happens via ir_bind(ctx, -EX_NUM_TO_VAR(var), val), gated on zend_jit_spilling_may_cause_conflict() returning false — and the commented-out pinned-LOAD branch (1314–1323) with the should be anti-dependent with the following stores ??? TODO is sitting right on the hazard. Since the helper's write to the result EX_VAR is never emitted into IR, there's no store node for the binding logic to be anti-dependent against, so from IR's point of view the slot looks free to reuse for a value that's still live across the call. The register clobbers are modeled, the memory write isn't, and that asymmetry is the whole bug - inlining just happens to convert that phantom write into a real IR def that binding and RA can both see. As far as I can tell there are two ways to actually close the class:
My instinct is 1 is the principled fix - restore what the machine actually does to the IR and stop special-casing the symptom - but it's exactly the kind of change I'd want maintainer's steer on before running at it, since it touches the IR memory model rather than just opcode codegen. For this PR I'd like to keep the scope to the inline: it's minimal, low-risk, mirrors BW_OR/AND/XOR, and closes the reported miscompile. I'm very happy to take a proper run at the wider fix. I'd lean toward prototyping (1) as a separate issue/PR, with this as the interim bugfix in the meantime. Glad to open that follow-up and start digging if it'd be useful; just let me know whether you'd rather see the modeled-write approach or the predicate approach explored first (or both - i'm happy to do both and let you choose, or any other suggestions) |
JIT: inline ZEND_BW_NOT (fixes tracing-JIT miscompilation of
~in a masked expression)Fixes #22559.
Summary
Under the tracing JIT, the bitwise-NOT operator
~is not inlined - it is emitted as a call to theZEND_BW_NOT_SPECVM helper. In a side trace, the helper's result is written to a stack slot that can alias a spilled, loop-carried CV, silently clobbering that CV. The result is a wrong value for code as ordinary as a flags computation, e.g. an integer that is provably a byte comes out negative.This PR inlines
ZEND_BW_NOTfor the LONG case (asx ^ -1), the same wayZEND_BW_OR/AND/XORare already inlined. That removes the helper call and its temporary result slot, so the collision can't happen - and it's a small perfwin. A regression test is included.
Interpreter and function-JIT were always correct; only the tracing JIT was wrong.
1. Symptom
Every operand of the
|chain is byte-sized (the~…&0x80term is0or0x04), so$fis provably in0..0xBD. Under the tracing JIT it becomes negative - the correct low byte (0x97) with all upper bits set, i.e. a full-width~value.2. Research — pinning the affected versions
Built each branch from git and tested with the JIT verified active
(
opcache_get_status()['jit']['on'] === true):So this is an IR-JIT regression, introduced with the IR JIT in 8.4 - 8.3's DynASM JIT is fine. (Two methodology notes that cost us time and are worth recording: a
--disable-allbuild omits opcache on optional-opcache versions, so the JIT never runs and the repro falsely passes — always verifyjit.on; and the bug needsopcache.jit_hot_side_exitat its default (8), because the harness default of1compiles side traces so eagerly the buggy one never forms.)This is not #22115 (loop-PHI CV register dropped from a side-exit SNAPSHOT). Building that fix resolves its reproducer but leaves this one failing - same subsystem, different defect.
3. Proof — from the disassembly
The buggy side trace (
opcache.jit_debug+ capstone), annotated:The register allocator assigned the same frame slot
0xa0to (a) the spilled loop-carried CV holding$fand (b) the result of theZEND_BW_NOThelper call. The helper result clobbers the spilled$f; the reload then loads~($a^$value)RLOAD … {%r13} # BIND(0xa0)shares slot0xa0with theBW_NOToperand/result.)This is
~-specific because~is the only bitwise op the JIT does not inline;AND/OR/XORare inlined, so(x ^ 0x80)in place of~xavoids the helper and works - which is the one-operator source-level workaround.4. The fix
Inline
ZEND_BW_NOTfor the LONG case, mirroring the other bitwise ops:ext/opcache/jit/zend_jit_ir.c- newzend_jit_bw_not(): emitsres = ir_XOR_L(op1, -1)(~x == x ^ -1), stores the LONG result. No helper, no temp slot.ext/opcache/jit/zend_jit.c- function-JIT dispatch:case ZEND_BW_NOT(definitely-LONG fast path; otherwise falls through to the VM handler as before).ext/opcache/jit/zend_jit_trace.c- tracing-JIT codegen dispatchcase ZEND_BW_NOT(withCHECK_OP1_TRACE_TYPE()), plusZEND_BW_NOTadded to theADD_OP1_TRACE_GUARD()group so the trace guards op1's type.55 insertions, no deletions. The inlined path is taken only when op1 is definitely
LONG; any other type keeps the previous helper-based behaviour.5. Why this fix
BW_NOTis the only bitwise operator not inlined by the JIT (BW_OR/AND/XORalready are). Inlining it removes the helper call entirely - so there is no result slot to alias a spilled CV - and it's a small performance win, not just a correctness patch.ir_XOR_L), touches only the JIT, and gates on a known-LONGoperand; anything else is unchanged.zend_jit_trace_allocate_registers). That is more general but deeper and riskier; inliningBW_NOTfixes the observedbug directly and is the more upstream-friendly change. (Maintainers may of course prefer to also harden the allocator.)
Testing
ext/opcache/tests/jit/gh22558.phpt(overridesopcache.jit_hot_side_exitback to8so the bug reproduces underrun-tests).ok; the~result is byte-identical between the tracing JIT and the interpreter across-300..300and edge values (PHP_INT_MIN/MAX).ext/opcache/testssuite on 8.4: 883 tests, 0 failed (853 passed, 29 skipped).