cranelift: fold ctz/clz directly into brif cond via simplify_skeleton#13343
Conversation
|
Cross-backend confirmation that this mid-end change is a strict win on every target wasmtime supports — no per-backend rules needed:
Notes:
So the cost-vs-benefit picture for landing this PR vs the two backend PRs (#13334 / #13336): one ~50-line mid-end change covers 4 ISAs simultaneously, with the aarch64 case getting strictly better code than dedicated backend rules would produce. |
|
@ggreif it looks like you'll need to re-bless a test; and also run |
f9256de to
8341555
Compare
|
(I saw your push to fix the formatting; the test-blessing failure is here and should be fixable with |
|
Ah, and now there's a merge conflict -- sorry for the merging troubles, @ggreif! If you could fix that, I'll be happy to merge. |
…keleton` The mid-end rules added in bytecodealliance#13332 hinge on an `icmp eq/ne (ctz/clz X) 0` shape — i.e. the wasm 3-op pattern `i32.ctz; i32.eqz; br_if`. Frontends that emit the 2-op form `i32.ctz; br_if` (e.g. Motoko's `moc` after its `and 1; eqz; br_if` → `ctz; br_if` byte-size peephole) feed `(brif (ctz X))` into cranelift with no `icmp` for the existing rules to match. This commit extends `simplify_skeleton` to rewrite the *condition operand* of an existing `brif` in place, without touching its opcode or successor blocks (CFG-preserving by construction). A new `SkeletonInstSimplification` variant `ReplaceBranchCond(Value)` carries the new condition; the egraph driver applies it by writing through `inst_args_mut`. Two ISLE rules in `opts/icmp.isle` rewrite `(brif (ctz X) bt be)` and `(brif (clz X) bt be)` to brifs over the equivalent bit-extract form: brif (ctz X) bt be → brif (eq (band X 1) 0) bt be brif (clz X) bt be → brif (sge X 0) bt be End-to-end lowering on the resulting brif then composes with existing backend `icmp+brif` fusion to produce: x86_64 brif (ctz X): `testl $1, %edi; je` x86_64 brif (clz X): `testl %edi, %edi; jge` aarch64 brif (ctz X): `tbz w0, #0` — single-instruction test-and-branch This subsumes the backend-side x64 rules added in bytecodealliance#13334 and the aarch64 rules in bytecodealliance#13336 (and yields tighter aarch64 code than bytecodealliance#13336 did). The driver still rejects non-`brif` branches and rejects non-`ReplaceBranchCond` simplification variants on `brif` (a `Replace inst` of a brif would risk changing successor block IDs and is left to a future, broader extension). Filetest `egraph/brif-cnt-cond.clif` covers ctz/clz over i32/i64 in the 2-op form. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new `simplify_skeleton`-on-`brif` rule rewrites the 2-op `if (ctz/clz x)` cases that bytecodealliance#13332's commentary noted were the non-icmp-mediated holdouts. Bare-form lowering shrinks from ~9 instructions (bsf/bsr + cmov + test + jne + …) to `testl $1, %edx; je` (ctz) and `testl %edx, %edx; jge` (clz). Offsets on the subsequent non-bare functions shift down to match.
cd73adf to
d5ce29b
Compare
|
Sidebar / future work for riscv64: per the cross-backend table above, the mid-end rewrite leaves a For the ctz form the For the clz form @alexcrichton — flagging since you're touching riscv64 currently; the patterns above are stable shapes the mid-end now emits unconditionally for any |
Motivation
PR #13332 landed mid-end rules that fold
(eq/ne (ctz/clz X) 0)icmp shapes into direct bit tests onX. Those rules hinge on anicmpinterposed between the bit-counter and its consumer — i.e. the wasm 3-op patterni32.ctz; i32.eqz; br_if.Frontends that emit the 2-op form
i32.ctz; br_if(with noi32.eqzbetween them — e.g. Motoko'smoc, after itsand 1; eqz; br_if→ctz; br_ifbyte-size peephole) feed(brif (ctz X))into cranelift, with no icmp for the existing rules to match. #13334 (x64) and #13336 (aarch64) added backend lowering rules to cover that gap. As @cfallin pointed out in #13336, the backend is the wrong place — both for SWE reasons (rule duplication per ISA) and because we want these simplifications to compose with other mid-end opts.Approach
This PR extends
simplify_skeletonto rewrite the condition operand of an existingbrifin place. The CFG is preserved by construction: the opcode and successor blocks stay; only argument 0 changes.Concretely:
New
SkeletonInstSimplification::ReplaceBranchCond(Value)variant inprelude_opt.isle— a narrow rewrite that carries just the new cond value.Driver patch in
cranelift/codegen/src/egraph/mod.rs: handle the new variant — in the cost-loop, accept it eagerly (no cost ranking against opcode-preserving rewrites); in the apply site, swap argument 0 in place viainst_args_mut. Composes with Cranelift: Rewrite conditional branches with constant conditions into unconditional jumps #13267's existing branch-simplification machinery; no guard relaxation needed.replace_branch_condconstructor inprelude_opt.isle.Two ISLE rules in
opts/icmp.isle:Effect
On the 2-op
brif (ctz X)/brif (clz X)patterns:brif (ctz X)testl $1, %edi; je✓ (matches #13334's x64 backend rules)brif (clz X)testl %edi, %edi; jge✓ (matches #13336's intent)brif (ctz X)tbz w0, #0— single-instruction test-and-branch, tighter than #13336'stst+cmp+b.ccTest
New filetest
cranelift/filetests/filetests/egraph/brif-cnt-cond.clifcovers ctz/clz over i32/i64 in the 2-opbrif-direct form. All cranelift egraph filetests pass;tests/disas/ctz-clz-bool-condition.watre-blessed (the bare-form cases now collapse to the optimal 2-instruction shape).Supersedes
ctz/clzboolean tests viatst/cmp+Cond #13336 — this PR yields strictly better aarch64 code (tbzvstst+cmp+b.cc).ctz/clzboolean tests viatest+CC #13334 — closed once review discussion migrated here.Future work
The
ReplaceBranchCondvariant covers the in-place cond swap; #13267 covers fullbrif-to-jumprewrites for constant conditions. A natural follow-up is extending the same cond-only rewrite shape totrapnz/trapzso e.g.(trapnz (ctz x) code)collapses to the bit-test form.