cranelift(x64): lower bare ctz/clz boolean tests via test+CC #13334
ggreif wants to merge 1 commit
Follow-up to bytecodealliance#13332. That PR added egraph rules collapsing `(eq (ctz X) 0)` / `(ne (ctz X) 0)` / clz analogues to direct LSB / sign-bit tests, but only when the comparison is mediated by an explicit `icmp`. The wasm front-end translates wasm `if (ctz X)` to `brif (ireduce.i32 (ctz.i64 X))` directly (no `icmp`), so the egraph rules don't fire on the wasm-natural shape.

This commit closes the gap by specialising `is_nonzero` in the x64 backend, the helper that all `brif`/`select`/`trapif` lowerings funnel through.

Four rules: `ctz`/`clz` × bare/`ireduce`-wrapped. The `ireduce` variant catches the wasm front-end's `i32.wrap_i64` over a 64-bit `ctz`/`clz`, which is a no-op on values in [0, bitwidth].

Test deltas (tests/disas/ctz-clz-bool-condition.wat):

  if_ctz_bare_i32: 5 insns -> 2 (testl $1, %edx; je)
  if_ctz_bare_i64: 5 insns -> 2 (testq $1, %rdx; je)
  if_clz_bare_i32: 7 insns -> 2 (testl %edx, %edx; jns)

The icmp-mediated cases (collapsed by bytecodealliance#13332's egraph rules) are unchanged. The numeric-comparison negative test stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
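The claim that the `ireduce` wrapper is a no-op for this purpose can be sanity-checked outside the compiler. The following is a minimal plain-Rust model, not Cranelift code: the function names are invented for illustration, and it assumes that `ctz.i64` / `clz.i64` return 64 for a zero input (matching Rust's `trailing_zeros` / `leading_zeros`) and that `ireduce.i32` simply keeps the low 32 bits.

```rust
/// Hypothetical model of `ctz.i64`: trailing-zero count, 64 for a zero input.
fn ctz64(x: u64) -> u64 {
    u64::from(x.trailing_zeros())
}

/// Hypothetical model of `clz.i64`: leading-zero count, 64 for a zero input.
fn clz64(x: u64) -> u64 {
    u64::from(x.leading_zeros())
}

/// Hypothetical model of `ireduce.i32`: keep only the low 32 bits.
fn ireduce32(v: u64) -> u32 {
    v as u32
}

/// Branch condition as the wasm front-end emits it: `brif (ireduce.i32 (ctz.i64 X))`.
fn cond_ctz_via_ireduce(x: u64) -> bool {
    ireduce32(ctz64(x)) != 0
}

/// Same shape for `clz`: `brif (ireduce.i32 (clz.i64 X))`.
fn cond_clz_via_ireduce(x: u64) -> bool {
    ireduce32(clz64(x)) != 0
}

/// What `testq $1, reg; je` decides: the least-significant bit is clear.
fn lsb_is_clear(x: u64) -> bool {
    x & 1 == 0
}

/// What `testq reg, reg; jns` decides: the sign bit is clear.
fn sign_is_clear(x: u64) -> bool {
    (x as i64) >= 0
}

/// True when the wrapped count tests and the direct bit tests agree on all samples.
fn agree_on(samples: &[u64]) -> bool {
    samples.iter().all(|&x| {
        cond_ctz_via_ireduce(x) == lsb_is_clear(x)
            && cond_clz_via_ireduce(x) == sign_is_clear(x)
    })
}
```

Because the count never exceeds 64, truncating it to 32 bits cannot turn a non-zero count into zero, which is why the wrapped and bare forms can share the same bit test.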
@cfallin or @fitzgen do y'all have any ideas about how to sort of deduplicate this with the optimization rules landed in #13332? It feels a bit unfortunate that we need basically the same rules twice, once for general expressions and once because …

Left a comment over here; putting these rules in the backend is definitely the wrong place IMHO, as aside from the software-engineering aspects (repetition) we want these simplifications to compose with other mid-end opts when possible. I agree it'd be great to find a way to factor out simplifications for all "non-zero test" cases in the mid-end. I suppose we could define a helper …

Following on from @cfallin's …
For the LSB case I'll try the …

Update: the … Approach (~50 lines total):
Results on the 2-op …
All 70 cranelift egraph filetests pass. The … I'll push the mid-end branch as a new PR, with revert commits for this PR and #13336. Closing once that lands cleanly.

Closing in favor of #13343: the mid-end …
…keleton` (bytecodealliance#13343)

* cranelift: fold `ctz`/`clz` directly into `brif` cond via `simplify_skeleton`

  The mid-end rules added in bytecodealliance#13332 hinge on an `icmp eq/ne (ctz/clz X) 0` shape, i.e. the wasm 3-op pattern `i32.ctz; i32.eqz; br_if`. Frontends that emit the 2-op form `i32.ctz; br_if` (e.g. Motoko's `moc` after its `and 1; eqz; br_if` → `ctz; br_if` byte-size peephole) feed `(brif (ctz X))` into cranelift with no `icmp` for the existing rules to match.

  This commit extends `simplify_skeleton` to rewrite the *condition operand* of an existing `brif` in place, without touching its opcode or successor blocks (CFG-preserving by construction). A new `SkeletonInstSimplification` variant `ReplaceBranchCond(Value)` carries the new condition; the egraph driver applies it by writing through `inst_args_mut`.

  Two ISLE rules in `opts/icmp.isle` rewrite `(brif (ctz X) bt be)` and `(brif (clz X) bt be)` to brifs over the equivalent bit-extract form:

    brif (ctz X) bt be → brif (eq (band X 1) 0) bt be
    brif (clz X) bt be → brif (sge X 0) bt be

  End-to-end lowering on the resulting brif then composes with existing backend `icmp+brif` fusion to produce:

    x86_64 brif (ctz X): `testl $1, %edi; je`
    x86_64 brif (clz X): `testl %edi, %edi; jge`
    aarch64 brif (ctz X): `tbz w0, #0` (single-instruction test-and-branch)

  This subsumes the backend-side x64 rules added in bytecodealliance#13334 and the aarch64 rules in bytecodealliance#13336 (and yields tighter aarch64 code than bytecodealliance#13336 did).

  The driver still rejects non-`brif` branches and rejects non-`ReplaceBranchCond` simplification variants on `brif` (a `Replace inst` of a brif would risk changing successor block IDs and is left to a future, broader extension).

  Filetest `egraph/brif-cnt-cond.clif` covers ctz/clz over i32/i64 in the 2-op form.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* rustfmt: collapse `is_branch` && opcode-guard onto one line

* tests/disas: re-bless ctz/clz-bool-condition for new mid-end fold

  The new `simplify_skeleton`-on-`brif` rule rewrites the 2-op `if (ctz/clz x)` cases that bytecodealliance#13332's commentary noted were the non-icmp-mediated holdouts. Bare-form lowering shrinks from ~9 instructions (bsf/bsr + cmov + test + jne + …) to `testl $1, %edx; je` (ctz) and `testl %edx, %edx; jge` (clz). Offsets on the subsequent non-bare functions shift down to match.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
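The two condition rewrites in that commit message rest on simple bit-level identities, and a quick way to convince oneself of them is a sweep in ordinary Rust. This is only an illustrative sketch: the function names are made up, nothing here touches Cranelift's API, and it assumes `ctz`/`clz` of zero yield the bit width, as Rust's `trailing_zeros`/`leading_zeros` do.

```rust
/// `brif (ctz X)`: taken when the trailing-zero count is non-zero.
fn brif_ctz(x: u32) -> bool {
    x.trailing_zeros() != 0
}

/// Rewritten condition `eq (band X 1) 0`: the least-significant bit is clear.
fn brif_lsb_clear(x: u32) -> bool {
    (x & 1) == 0
}

/// `brif (clz X)`: taken when the leading-zero count is non-zero.
fn brif_clz(x: u32) -> bool {
    x.leading_zeros() != 0
}

/// Rewritten condition `sge X 0`: signed comparison against zero.
fn brif_sge_zero(x: u32) -> bool {
    (x as i32) >= 0
}

/// Sweeps every 16-bit pattern through the low half, the high half, and both,
/// so that the LSB and the sign bit are each exercised in both states.
fn rewrites_agree() -> bool {
    (0..=u16::MAX).all(|bits| {
        let low = u32::from(bits);
        let high = low << 16;
        [low, high, low | high].iter().all(|&x| {
            brif_ctz(x) == brif_lsb_clear(x) && brif_clz(x) == brif_sge_zero(x)
        })
    })
}
```

The zero input is the case worth checking by hand: both counts equal the bit width there, so both original conditions are true, and so are `(0 & 1) == 0` and `0 >= 0`.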
Summary

Follow-up to #13332. That PR added egraph rules collapsing `(eq (ctz X) 0)` / `(ne (ctz X) 0)` / `(eq (clz X) 0)` / `(ne (clz X) 0)` to direct LSB / sign-bit tests, but only when the comparison is mediated by an explicit `icmp`. The wasm front-end translates wasm `if (ctz X)` to `brif (ireduce.i32 (ctz.i64 X))` directly (no `icmp`), so the egraph rules don't fire on the wasm-natural shape.

This PR closes the gap by specialising `is_nonzero` in the x64 backend, the helper that all `brif`/`select`/`trapif` lowerings funnel through.

Rules

In `cranelift/codegen/src/isa/x64/inst.isle`:

The `ireduce` variant catches the wasm front-end's `i32.wrap_i64` over a 64-bit `ctz`/`clz`, which is a no-op on values in [0, bitwidth].

Test deltas (`tests/disas/ctz-clz-bool-condition.wat`)

| Function | Before | After |
| --- | --- | --- |
| `if_ctz_bare_i32` | 5 (`bsfl + cmovel + test + jne`) | 2 (`testl $1, %edx; je`) |
| `if_ctz_bare_i64` | 5 (`bsfq + cmovq + test + jne`) | 2 (`testq $1, %rdx; je`) |
| `if_clz_bare_i32` | 7 (`bsr + cmov + sub + test + jne`) | 2 (`testl + jns`) |

The icmp-mediated cases (collapsed by #13332's egraph rules) are unchanged. The numeric-comparison negative test (`(ctz X) == 4`) stays untouched.

Motivation

Motoko's `moc` codegen emits `i64.ctz X; i32.wrap_i64; if` for compactness/sign tests in the EOP backend (see caffeinelabs/motoko#6103). Before this PR, that lowers to 5 native instructions per dispatch; after, 2.

A concrete idiomatic example: in Motoko, the `let-else` pattern over `Result` desugars to a 2-arm refutable variant match (`#ok` vs `#err`). The variant-tag hashes are `hash("ok") = 0x611C` (LSB 0) and `hash("err") = 0x4D0765` (LSB 1), so the LSB alone is enough to tell the two arms apart. The planned variant-switch `BitTest` dispatch (caffeinelabs/motoko's `gabor/variant-switch`) recognizes this and emits a single LSB-test for the dispatch; combined with this PR, the entire let-else lowers to `load hash; testq $1, ...; jcc` on x64: three instructions for a pattern match. Every `Result`-returning API and every `let-else`-style early return collapses to this shape.

Aggregated across hot paths (variant-switch dispatch, GC compact/heap discriminator, sign tests, …) this is meaningful.
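To make the `#ok` / `#err` case concrete, here is a small Rust sketch of a two-arm dispatch that looks only at the tag's least-significant bit. It is an illustration rather than Motoko's or Cranelift's actual code: the tag constants are the hash values quoted above, while the `Arm` type, the function names, and the choice of `u32` for the tag are assumptions made for the example.

```rust
/// Variant-tag hashes as quoted in the PR description.
const TAG_OK: u32 = 0x611C; // LSB is 0
const TAG_ERR: u32 = 0x4D0765; // LSB is 1

/// Hypothetical stand-in for the two arms of the `let-else` match.
#[derive(Debug, PartialEq)]
enum Arm {
    Ok,
    Err,
}

/// Dispatch written the way the 2-op wasm form expresses it: `ctz; br_if`.
/// Only meaningful when `tag` is known to be one of the two tags above.
fn dispatch_via_ctz(tag: u32) -> Arm {
    if tag.trailing_zeros() != 0 {
        Arm::Ok
    } else {
        Arm::Err
    }
}

/// The same decision as a single bit test, which is what `testq $1, ...; jcc` performs.
fn dispatch_via_lsb(tag: u32) -> Arm {
    if tag & 1 == 0 {
        Arm::Ok
    } else {
        Arm::Err
    }
}
```

Both functions return `Arm::Ok` for `TAG_OK` and `Arm::Err` for `TAG_ERR`; the second form is the one that maps directly onto a test-and-branch pair.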
Follow-ups (not in this PR)
`select`-consumer variant: `select` already routes through `is_nonzero_cmp` → `is_nonzero`, so this PR's rules cover it too without extra work.