```
$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.08s
     Running `target/debug/repro`
thread 'main' (1071502) panicked at /home/alex/code/wasmtime2/crates/wasmtime/src/runtime/vm/traphandlers.rs:493:13:
assertion failed: self.unwind.replace(None).is_none()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
# `EntryStoreContext::enter_wasm` saves the *new* `stack_chain`, breaking nested embedder→wasm calls during stack switching
Scope:

- `crates/wasmtime/src/runtime/func.rs:1587-1606` (`EntryStoreContext::enter_wasm`)
- `crates/wasmtime/src/runtime/func.rs:1614-1629` (`EntryStoreContext::exit_wasm`)
- `crates/wasmtime/src/runtime/vm/traphandlers.rs:667-696` (`CallThreadState::swap`)
Severity: Stack switching is currently 🚧 (work-in-progress) on
x86_64 Cranelift, so this is not a security issue per Wasmtime's tier
policy today. It is, however, a soundness bug that must be fixed
before stack switching can graduate to a stable feature: a guest that
does nothing more than call a normal host import — where the host
happens to call back into the same `Store` via the embedder API —
silently invalidates the running continuation's `cx.stack_chain`. The
same bug also affects any future stack-switching-aware fiber-suspension
path through `CallThreadState::swap`.
Reproducible: Yes. Test program in this folder
(`enter-wasm-stack-chain-repro`) exercises only public Wasmtime API
(`Func::wrap`, `Caller::get_export`, `Func::typed`, `Func::call`) plus
the gated `wasm_stack_switching(true)` config bit. Running with
`RECURSIVE=1` produces `Ok(0)` instead of the expected `Ok(1)`.
## Summary
`EntryStoreContext::enter_wasm` is responsible for saving the
pre-entry values of various `VMStoreContext` fields so that
`exit_wasm` (run on `Drop`) and `CallThreadState::swap` (run during
async fiber suspension) can put the store back the way it was before
this wasm activation started. Five of the six fields it saves are
read without being written, so the saved value is the pre-entry
value as intended:

```rust
last_wasm_exit_pc: *(*vm_store_context).last_wasm_exit_pc.get(),
last_wasm_exit_trampoline_fp: *(*vm_store_context).last_wasm_exit_trampoline_fp.get(),
last_wasm_entry_fp: *(*vm_store_context).last_wasm_entry_fp.get(),
last_wasm_entry_sp: *(*vm_store_context).last_wasm_entry_sp.get(),
last_wasm_entry_trap_handler: *(*vm_store_context).last_wasm_entry_trap_handler.get(),
```
`stack_chain`, however, is written before being read:

```rust
unsafe {
    let vm_store_context = store.0.vm_store_context();
    let new_stack_chain = VMStackChain::InitialStack(initial_stack_information);
    *vm_store_context.stack_chain.get() = new_stack_chain; // (1) write
    Self {
        …
        stack_chain: (*(*vm_store_context).stack_chain.get()).clone(), // (2) read AFTER (1)
        …
    }
}
```
So `Self.stack_chain` ends up holding the new
`InitialStack(initial_stack_information)` rather than the value
`stack_chain` had on entry to `enter_wasm`. Both consumers of
`Self.stack_chain` (drop-time `exit_wasm`, async-fiber `swap`) write
that value back into the live `vm_store_context`, so any pre-entry
value (a `Continuation(...)` from being inside a `resume` body, or
the outer activation's `InitialStack(...)` for a nested host→wasm
call) is unrecoverable.
This is a regression from commit
`192f2fcdadfec9d0cf6b58548a85a7307450cbf5` ("Replace setjmp/longjmp
usage in Wasmtime"). Before that commit `enter_wasm` did the read
first, then the write:

```rust
let stack_chain = (*store.0.vm_store_context().stack_chain.get()).clone(); // read FIRST
let new_stack_chain = VMStackChain::InitialStack(initial_stack_information);
*store.0.vm_store_context().stack_chain.get() = new_stack_chain; // write AFTER
…
Self { …, stack_chain, … }
```
The setjmp/longjmp removal commit collapsed the load+store into a
single expression but moved the store before the struct literal,
silently flipping the saved value from "previous" to "new".
## What the contract is supposed to be
`CallThreadState::swap` (`vm/traphandlers.rs:656-696`) is the most
explicit description:

```rust
/// Swaps the state in this `CallThreadState`'s `VMStoreContext` with
/// the state in `EntryStoreContext` that was saved when this
/// activation was created.
///
/// This method is used during suspension of a fiber to restore the
/// store back to what it originally was and prepare it to be resumed
/// later on. … That restores a store to just before this activation
/// was called but saves off the fields of this activation to get
/// restored/resumed at a later time.
```

"Restores a store to just before this activation was called" can only
hold if `Self.stack_chain` is the value `vm_store_context.stack_chain`
had before `enter_wasm` ran. The same is true of `exit_wasm`,
which mirrors `swap` for the synchronous path.
## The reproducer
`src/main.rs` here uses only public Wasmtime API. The wasm module
contains:

- a tag `$t` and a continuation type `$ct`,
- an export `inner` (called by the host via the embedder API),
- a continuation body `$callee` which calls a host import `$h_import`,
  then executes `(suspend $t)`,
- an export `go` which runs
  `(resume $ct (on $t $h) (cont.new $ct (ref.func $callee)))` and
  returns `1` if the suspend reaches the handler, `0` if the resume
  completes "normally".
The host import `$h_import` is `Func::wrap`-defined. When
`RECURSIVE=1`, it looks up the `inner` export through the
`Caller<'_, T>` and calls it via the embedder API. That recursive
embedder→wasm call is what triggers the inner
`EntryStoreContext::enter_wasm`/`exit_wasm` pair that overwrites the
running continuation's `cx.stack_chain`.
Compile with:
Without recursion (`h` does nothing):

```
$ ./target/release/repro
[wasm trace] 10
[wasm trace] 1
[host] h called; recursive=false
[wasm trace] 2
[wasm trace] 12   ← suspend reached the handler
go returned: Ok(1)
OK: suspend correctly delivered to outer handler.
```
With recursion (`h` calls `inner` via the embedder API):

```
$ RECURSIVE=1 ./target/release/repro
[wasm trace] 10
[wasm trace] 1
[host] h called; recursive=true
[wasm trace] 100  ← inner runs
[host] inner returned to host
[wasm trace] 2
[wasm trace] 11   ← resume took the "normal completion" path
go returned: Ok(0)
BUG OBSERVED: go returned Ok(0); …
```
Trace `12` is unreachable except via the suspend handler; trace `11`
is unreachable except via "the resume completed without ever firing
`$t`". Adding nothing more than a recursive embedder call between the
two flips the outcome from one to the other — exactly the symptom
that follows from `cx.stack_chain` being clobbered to `InitialStack`
during `inner`'s `enter_wasm`/`exit_wasm` pair, after which
`search_handler`'s `is_initial_stack` short-circuit returns "no
match" and the `(suspend $t)` cannot find its outer handler.

(I verified in a temporarily-instrumented build that the inner
`enter_wasm` reads `prev_chain = Continuation(...)` and the
matching inner `exit_wasm` then writes
`InitialStack(<inner csi pointer>)` back into the live
`VMStoreContext`. That instrumentation is not part of the test
program shipped here.)
## Suggested fix
Restore the pre-regression order and save the previous value of
`stack_chain` before writing the new one. The minimal patch:

```rust
unsafe {
    let vm_store_context = store.0.vm_store_context();
    let prev_stack_chain = mem::replace(
        &mut *vm_store_context.stack_chain.get(),
        VMStackChain::InitialStack(initial_stack_information),
    );
    Self {
        stack_limit,
        last_wasm_exit_pc: *(*vm_store_context).last_wasm_exit_pc.get(),
        last_wasm_exit_trampoline_fp: *(*vm_store_context).last_wasm_exit_trampoline_fp.get(),
        last_wasm_entry_fp: *(*vm_store_context).last_wasm_entry_fp.get(),
        last_wasm_entry_sp: *(*vm_store_context).last_wasm_entry_sp.get(),
        last_wasm_entry_trap_handler: *(*vm_store_context).last_wasm_entry_trap_handler.get(),
        stack_chain: prev_stack_chain,
        vm_store_context,
    }
}
```
This mirrors the way `stack_limit` is already saved earlier in the
same function via `mem::replace` and matches the pre-regression
semantics.
## Why the symptom is "wrong return value" rather than a SIGSEGV here
The dangling-pointer payload of the corrupted `InitialStack(...)` is
what's theoretically dangerous (Cranelift `resume`/`switch` lower
to `get_common_stack_information(...).load_state(...)`, which would
load from the freed stack region). In this test
`search_handler`'s discriminant check sees `InitialStack` first and
short-circuits to the "tag not handled" path before any
dereference, so we observe the silent control-flow bug rather than a
crash. A reproducer that performs `(switch $ct $t)` instead of
`(suspend $t)` after the recursive call would dereference the
dangling pointer (`unchecked_get_continuation` interprets the
payload as a `*mut VMContRef`), but at that point we are firmly in
"a guest can crash the host" territory, which is not yet a security
issue because stack switching is gated.
## Other manifestations
Even with stack switching disabled, the same `enter_wasm` writes
`InitialStack(<this activation's csi>)` to `stack_chain` and saves
that same value as the "previous" one. After a non-stack-switching
nested host→wasm call (outer wasm → host import → inner wasm via the
embedder API), `cx.stack_chain` is left pointing into the freed
inner `invoke_wasm_and_catch_traps` stack frame, and the outer
wasm's `EntryStoreContext` has lost the original pre-entry value.
That dangling pointer is currently only consumed by the backtrace
machinery (`vm/traphandlers/backtrace.rs:236`), and the iterator
short-circuit for `InitialStack` happens to avoid loading from it,
so the misbehaviour is silent today — but the saved value is wrong
relative to the documented contract regardless.
For the async path, `CallThreadState::swap` puts the saved (= new)
value back into `cx.stack_chain` on every fiber suspension. If a
host then drops the suspended fiber without resuming it, the fiber
stack is freed and `cx.stack_chain` is left pointing into the freed
fiber stack until the next `enter_wasm` overwrites it. Same
description as above: silent today, broken contract, and it will
start crashing as soon as anything between fiber-drop and the next
`enter_wasm` reads `cx.stack_chain` non-trivially.