Skip to content

JIT: Inline ZEND_BW_NOT#22560

Open
millerphp wants to merge 2 commits into
php:PHP-8.4from
millerphp:gh-22559-jit-inline-bw-not
Open

JIT: Inline ZEND_BW_NOT#22560
millerphp wants to merge 2 commits into
php:PHP-8.4from
millerphp:gh-22559-jit-inline-bw-not

Conversation

@millerphp

Copy link
Copy Markdown

JIT: inline ZEND_BW_NOT (fixes tracing-JIT miscompilation of ~ in a masked expression)

Fixes #22559.

Summary

Under the tracing JIT, the bitwise-NOT operator ~ is not inlined - it is emitted as a call to the ZEND_BW_NOT_SPEC VM helper. In a side trace, the helper's result is written to a stack slot that can alias a spilled, loop-carried CV, silently clobbering that CV. The result is a wrong value for code as ordinary as a flags computation, e.g. an integer that is provably a byte comes out negative.

This PR inlines ZEND_BW_NOT for the LONG case (as x ^ -1), the same way ZEND_BW_OR/AND/XOR are already inlined. That removes the helper call and its temporary result slot, so the collision can't happen - and it's a small perf
win. A regression test is included.

Interpreter and function-JIT were always correct; only the tracing JIT was wrong.

1. Symptom

<?php
final class C {
    /** @var int[] */ public array $t = [];
    public int $a = 0;
    public int $f = 0;
    public function __construct() { for ($i = 0; $i < 256; $i++) $this->t[$i] = $i & 0xA8; }
    public function add8(int $value, int $carry): void {
        $a = $this->a;
        $total = $a + $value + $carry;
        $result = $total & 0xFF;
        // Every OR term is masked to a byte, so $this->f is provably in 0..255.
        $this->f = $this->t[$result]
            | (($total & 0x100) ? 0x01 : 0)
            | (((($a & 0x0F) + ($value & 0x0F) + $carry) & 0x10) ? 0x10 : 0)
            | ((~($a ^ $value) & ($a ^ $result) & 0x80) ? 0x04 : 0); // <- masked to bit 7
        $this->a = $result;
    }
}
$c = new C();
for ($i = 0; $i < 100000; $i++) {
    $c->a = $i & 0xFF;
    $c->add8(($i >> 8) & 0xFF, 0);
    if (($c->f & ~0xFF) !== 0) {                 // $f must never have bits above the low byte
        printf("MISCOMPILED at i=%d: \$f = %d (0x%X)\n", $i, $c->f, $c->f);
        break;
    }
}
echo "done\n";
$ php -d opcache.jit=tracing  …   MISCOMPILED at i=638: $f = -105 (0x…FF97)
$ php -d opcache.jit=function …   done      # correct
$ php -d opcache.enable_cli=0  …   done      # correct

Every operand of the | chain is byte-sized (the ~…&0x80 term is 0 or 0x04), so $f is provably in 0..0xBD. Under the tracing JIT it becomes negative - the correct low byte (0x97) with all upper bits set, i.e. a full-width ~ value.

2. Research — pinning the affected versions

Built each branch from git and tested with the JIT verified active
(opcache_get_status()['jit']['on'] === true):

PHP JIT backend tracing JIT
8.3 DynASM PASS (not affected)
8.4 IR FAIL
8.5 IR FAIL
8.6 IR FAIL

So this is an IR-JIT regression, introduced with the IR JIT in 8.4 - 8.3's DynASM JIT is fine. (Two methodology notes that cost us time and are worth recording: a --disable-all build omits opcache on optional-opcache versions, so the JIT never runs and the repro falsely passes — always verify jit.on; and the bug needs opcache.jit_hot_side_exit at its default (8), because the harness default of 1 compiles side traces so eagerly the buggy one never forms.)

This is not #22115 (loop-PHI CV register dropped from a side-exit SNAPSHOT). Building that fix resolves its reproducer but leaves this one failing - same subsystem, different defect.

3. Proof — from the disassembly

The buggy side trace (opcache.jit_debug + capstone), annotated:

movq  %r13, 0xa0(%r14)      ; spill the loop-carried $f accumulator -> frame slot 0xa0
orq   $0x10, %r13
movq  %rbx, %rax
xorq  0x50(%r14), %rax      ; ($a ^ $value)
movq  %rax, 0xc0(%r14)      ; stage it as the '~' operand
callq ZEND_BW_NOT_SPEC_TMPVARCV_HANDLER   ; '~' is a HELPER CALL; result -> slot 0xa0
xorq  %r12, %rbx            ; ($a ^ $result)
leaq  0xa0(%r14), %rdx
andq  (%rdx), %rbx          ; & the '~' result
testq $0x80, %rbx
je    jit$$trace_exit_0
movq  0xa0(%r14), %r13      ; reload $f FROM 0xa0 — but 0xa0 now holds ~($a^$value)!
orq   $0x14, %r13
movq  %r13, (%rdx)          ; store the corrupted $f

The register allocator assigned the same frame slot 0xa0 to (a) the spilled loop-carried CV holding $f and (b) the result of the ZEND_BW_NOT helper call. The helper result clobbers the spilled $f; the reload then loads ~($a^$value)

  • exactly the observed negative value. (IR confirms it: the CV RLOAD … {%r13} # BIND(0xa0) shares slot 0xa0 with the BW_NOT operand/result.)

This is ~-specific because ~ is the only bitwise op the JIT does not inline; AND/OR/XOR are inlined, so (x ^ 0x80) in place of ~x avoids the helper and works - which is the one-operator source-level workaround.

4. The fix

Inline ZEND_BW_NOT for the LONG case, mirroring the other bitwise ops:

  • ext/opcache/jit/zend_jit_ir.c - new zend_jit_bw_not(): emits res = ir_XOR_L(op1, -1) (~x == x ^ -1), stores the LONG result. No helper, no temp slot.
  • ext/opcache/jit/zend_jit.c - function-JIT dispatch: case ZEND_BW_NOT (definitely-LONG fast path; otherwise falls through to the VM handler as before).
  • ext/opcache/jit/zend_jit_trace.c - tracing-JIT codegen dispatch case ZEND_BW_NOT (with CHECK_OP1_TRACE_TYPE()), plus ZEND_BW_NOT added to the ADD_OP1_TRACE_GUARD() group so the trace guards op1's type.

55 insertions, no deletions. The inlined path is taken only when op1 is definitely LONG; any other type keeps the previous helper-based behaviour.

5. Why this fix

  • It's the natural fix: BW_NOT is the only bitwise operator not inlined by the JIT (BW_OR/AND/XOR already are). Inlining it removes the helper call entirely - so there is no result slot to alias a spilled CV - and it's a small performance win, not just a correctness patch.
  • Minimal and low-risk: It reuses the existing, well-tested inlined-XOR path (ir_XOR_L), touches only the JIT, and gates on a known-LONG operand; anything else is unchanged.
  • Alternative considered: fixing the trace register allocator so a helper-call result can never alias a live spilled CV (zend_jit_trace_allocate_registers). That is more general but deeper and riskier; inlining BW_NOT fixes the observed
    bug directly and is the more upstream-friendly change. (Maintainers may of course prefer to also harden the allocator.)

Testing

  • New regression test ext/opcache/tests/jit/gh22558.phpt (overrides opcache.jit_hot_side_exit back to 8 so the bug reproduces under run-tests).
  • Repro now prints ok; the ~ result is byte-identical between the tracing JIT and the interpreter across -300..300 and edge values (PHP_INT_MIN/MAX).
  • Full ext/opcache/tests suite on 8.4: 883 tests, 0 failed (853 passed, 29 skipped).

Under the tracing JIT, ~ (ZEND_BW_NOT) was not inlined and fell back to the ZEND_BW_NOT_SPEC helper, whose result could be allocated to a stack slot aliasing a spilled loop-carried CV, clobbering it and producing a wrong
result. Inline ZEND_BW_NOT for the LONG case (x ^ -1) like the other bitwise ops, removing the helper call and its temporary slot.
@iliaal

iliaal commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Confirmed the miscompile on PHP-8.4 with JIT on (MISCOMPILED at i=638: $f=-105). The inline is correct and mirrors the BW_OR/AND/XOR path.

Test is misnumbered though: the file and --TEST-- say GH-22558, but this fixes #22559 (#22558 is the unrelated Symfony FPM segfault). Rename to gh22559.phpt.

The root cause is more general than BW_NOT, worth flagging before settling on per-opcode inlining. The ~ helper goes through zend_jit_trace_handler as a bare ir_CALL(handler, FP), and its write to the result EX_VAR isn't modeled in IR, only register clobbers are. Bound spill slots get shared across SSA versions of the same temp with no interference check, so the RA spills a still-live value (#5.T5, kept alive across the ~ by OR-folding) to slot 0xa0 and the helper overwrites it. Inlining makes the result a real IR def and fixes this case, but any un-inlined opcode reusing a live temp slot can hit the same aliasing; BW_NOT's just first because it's the only un-inlined bitwise op.

The inline's worth taking on its own. The open question is whether to model the helper's EX_VAR write at the source and close the whole class; the anti-dependency case is already stubbed in zend_jit_spilling_may_cause_conflict ("should be anti-dependent with the following stores ???").

@millerphp

Copy link
Copy Markdown
Author

Confirmed the miscompile on PHP-8.4 with JIT on (MISCOMPILED at i=638: $f=-105). The inline is correct and mirrors the BW_OR/AND/XOR path.

Awesome - glad we can both see the problem.

Test is misnumbered though: the file and --TEST-- say GH-22558, but this fixes #22559 (#22558 is the unrelated Symfony FPM segfault). Rename to gh22559.phpt.

Agreed - my bad! i blame tired fingers! correction in the latest commit.

The root cause is more general than BW_NOT, worth flagging before settling on per-opcode inlining. The ~ helper goes through zend_jit_trace_handler as a bare ir_CALL(handler, FP), and its write to the result EX_VAR isn't modeled in IR, only register clobbers are. Bound spill slots get shared across SSA versions of the same temp with no interference check, so the RA spills a still-live value (#5.T5, kept alive across the ~ by OR-folding) to slot 0xa0 and the helper overwrites it. Inlining makes the result a real IR def and fixes this case, but any un-inlined opcode reusing a live temp slot can hit the same aliasing; BW_NOT's just first because it's the only un-inlined bitwise op.

Interesting! I hadn't spotted that the EX_VAR write itself is unmodeled; I'd been looking at it purely as a spill-slot collision and stopped at "inlining gives the result a real def." Your framing is clearly the correct one: the bare ir_CALL(handler, FP) only exposing register clobbers, so the RA shares a slot across SSA versions of a live temp with no interference check, is the actual root cause, and you're right that BW_NOT is just the first to hit it because it's the only un-inlined bitwise op. Any other un-inlined opcode reusing a live temp slot is exposed to the same aliasing.

So I'm in full agreement that there's a wider fix here, and that the real close is modeling the helper's EX_VAR write at the source (or resolving the anti-dependency that's already stubbed in zend_jit_spilling_may_cause_conflict - the "should be anti-dependent with the following stores ???" note is pointing right at it).

My intent with this PR is deliberately to keep it minimal: a targeted, low-risk fix that closes the reported miscompile and mirrors the existing BW_OR/AND/XOR inlining, rather than reaching into the RA's aliasing model in the same change. I'd suggest taking this as an interim bugfix while the broader issue is addressed separately.

I'm very happy to dig into the wider fix if that'd be useful - take a proper look at whether modeling the EX_VAR write / the stubbed anti-dependency is the cleaner path, and scope what it'd take. Glad to open a separate issue to track it so it doesn't get lost behind this one. Just let me know which direction you'd prefer.

The inline's worth taking on its own. The open question is whether to model the helper's EX_VAR write at the source and close the whole class; the anti-dependency case is already stubbed in zend_jit_spilling_may_cause_conflict ("should be anti-dependent with the following stores ???").

Chased this down through zend_jit_def_reg and I think I see exactly what you mean. The binding to the CV's home slot happens via ir_bind(ctx, -EX_NUM_TO_VAR(var), val), gated on zend_jit_spilling_may_cause_conflict() returning false — and the commented-out pinned-LOAD branch (1314–1323) with the should be anti-dependent with the following stores ??? TODO is sitting right on the hazard. Since the helper's write to the result EX_VAR is never emitted into IR, there's no store node for the binding logic to be anti-dependent against, so from IR's point of view the slot looks free to reuse for a value that's still live across the call. The register clobbers are modeled, the memory write isn't, and that asymmetry is the whole bug - inlining just happens to convert that phantom write into a real IR def that binding and RA can both see.

As far as I can tell there are two ways to actually close the class:

  1. Model the write at the source. Emit an IR store (or an explicit "call writes this address" memory effect) for the helper's result EX_VAR alongside the ir_CALL. That makes IR's model honest; the helper does write that slot and then the existing dependency/aliasing machinery handles it generally, with no new special case. The design question is really what the right primitive is: a plain store of an opaque def, or teaching the call node that it writes that memory location. That feels like a maintainer call on the IR side.

  2. Extend the conflict predicate. Un-stub the anti-dependency in zend_jit_spilling_may_cause_conflict so a slot binding is treated as conflicting with a following implicit store. That's narrower, but it's inferring the hazard that (1) would make explicit, and given how dense that predicate already is I'd worry about landing either too conservative (pessimistic binding on hot paths → more spills) or too narrow (leaving other un-inlined helpers exposed).

My instinct is 1 is the principled fix - restore what the machine actually does to the IR and stop special-casing the symptom - but it's exactly the kind of change I'd want maintainer's steer on before running at it, since it touches the IR memory model rather than just opcode codegen.

For this PR I'd like to keep the scope to the inline: it's minimal, low-risk, mirrors BW_OR/AND/XOR, and closes the reported miscompile. I'm very happy to take a proper run at the wider fix. I'd lean toward prototyping (1) as a separate issue/PR, with this as the interim bugfix in the meantime. Glad to open that follow-up and start digging if it'd be useful; just let me know whether you'd rather see the modeled-write approach or the predicate approach explored first (or both - i'm happy to do both and let you choose, or any other suggestions)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants