Skip to content

Comments

Implementing the exception handling proposal in Wasmtime#36

Merged
fitzgen merged 7 commits intobytecodealliance:mainfrom
dhil:exn-rfc
May 14, 2025
Merged

Implementing the exception handling proposal in Wasmtime#36
fitzgen merged 7 commits intobytecodealliance:mainfrom
dhil:exn-rfc

Conversation

@dhil
Copy link
Contributor

@dhil dhil commented Aug 30, 2024

This RFC proposes to implement the exception handling proposal in Wasmtime. At the time of writing, exception handling is a phase 4 proposal.

I think there are lots of details worth discussing about possible designs and strategies on how to realise them. I am hoping that this document can be used to seed those discussions here.

I would like to give credit to @fitzgen for guidance on how to put this RFC together as well as helping with developing the ideas, thanks!

Rendered.

@fitzgen
Copy link
Member

fitzgen commented Aug 31, 2024

FYI, there are some discussions around how to support exception-throwing calls in the register allocator over in bytecodealliance/regalloc2#186 and some of that seems relevant for anyone interested in this RFC.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing all this up! I'll cc @bjorn3 as well here since they've done work in this area with rustc_codegen_cranelift and likely have thoughts as well.

One thing which might also be worth noting in this RFC is that we probably can't do away with the longjmp/setjmp that Wasmtime uses today to implement traps. Notably that enables recovery from a signal handler and additionally enables O(1) recovery in "deep" situations like stack overflow. I was hoping we could use exceptions to implement that as well but I'm less sure of that now.

@cfallin
Copy link
Member

cfallin commented Sep 4, 2024

Thanks for writing this up! I have a few thoughts mostly on the design of the CLIF functionality to encode exceptional control flow; I see a few others have commented on the points about special (catch) blocks, and there are some interesting design questions around how try_call applies block-param arguments as well that intersect with this.

To set a baseline first, I'll suggest the basic principle of: IR design should be as orthogonal as possible, i.e., features compose and special-cases or unsupported corners that require special handling are minimized.

As a corollary of that, if existing analysis and transform passes can work without having to be modified to be aware of exceptions, all the better. (This is the end-game of "put exceptional edges into the ordinary CFG", IMHO, with try_call as a "normal-ish" branch instruction.)

CLIF Design Questions

So I see at least three design questions here:

  • Are catch blocks their own kind of block/entity, or are they ordinary CFG blocks?

  • Can catch blocks (and the non-throwing "success" edge from try_call as well) take ordinary block parameters, or are they restricted not to do so?

  • How are the normal return value(s), and exceptional value(s), that result from try_call defined?

Q1: Distinct Kind of Catch-Blocks

I'd like to propose that the first question be resolved early to: catch-targets are ordinary CFG blocks (as also noted elsewhere in this PR). My reasoning is straightforward: a new kind of block, with its own restrictions, adds cognitive overhead and correctness questions to every analysis and pass in the compiler, increasing likelihood of bugs. Furthermore, it's not clear that there are reasons that require catch-targets to be distinct. We will want to codegen them as we do other basic blocks; an unwind that moves control to the handler address is just like an ordinary jump from the predecessor block. (If I'm missing some reason why they msut be distinct, please let me know!)

Q2: Blockparams

The next question is whether we allow blockparams on blocks that are targets of try_call (normal or exceptional edge(s)). If catch-blocks were distinct, it might be tempting to limit them in this way. (Perhaps this was part of the implicit thinking in your proposal?) However, having a kind of block entity in the IR that cannot take blockparams is a severe "missing corner" in the backend, and imposes restrictions all the way up the compiler pipeline. No blockparam should be necessary on a block with a single successor, so if we have distinct catch-blocks, it is technically possible; however it means that we have to have "perfect blockparam discipline" throughout the compiler passes, never adding unnecessary blockparams. We have historically had performance issues with unnecessary blockparams where algorithms are too approximate, and we have a constant-phis pass as a result. Note that while this is annoying as a performance issue, it is critical that it is allowed. Doing so is good compiler design, IMHO: it allows factoring of concerns, where we can have simpler transforms ("always add a blockparam for X", possibly a placeholder, etc) and then normalizations/canonicalizations later.

A few concrete examples of passes that work by adding blockparams, and would be difficult to write if we had an "exception catch blocks are a special case" rule:

  • "maximal SSA" transforms and SSA-cuts for stitching together subgraphs of the CFG (e.g., in weval)
  • CFG reducibility transforms, if we ever add a Wasm backend or need this for region-based analysis
  • tail-merging/deduplication, when it finds multiple copies of the same code for different exception handlers and merges them (the blockparams are even "real" here in the sense that they merge different values)

One objection might be that exceptional control flow could interact poorly with edge-moves, i.e., the moves that the register allocator has to insert to actually put blockparams in place, since we don't codegen the branch, it just "happens" via the external unwinder. However a catch-block reached from a try_call is reached via a multi-successor branch; as a multi-successor branch, try_call forces edge-moves into the heads of its successor blocks, so there are no moves that are "skipped" when an exception is caught.

Q3: try_call Result Values

try_call defines two sets of values, one or the other alternately (never both): the "normal" return value(s) of the called function, or exceptional state in terms of one or more value(s).

The former, normal results, are available only in the "fallthrough" block and blocks it dominates; exceptional state is available only in the "catch" block and the blocks it dominates. Conceptually, one can think of the values as being defined on the edges, or maybe implicitly in the headers of the appropriate successor blocks.

This is most like the semantics of existing blockparams, so blockparams are the natural way to write these definitions, as this RFC does.

There are two "bits of weirdness" that are worth addressing, however:

  • Though the values are at the IR level defined in successor blocks, they are at the instruction level defined starting at the call instruction, and this matters very much for the register allocator. We have some discussion of this in Support branch instructions that define their blockparams regalloc2#186. It's mostly a minor implementation detail from an IR design perspective, but it's worth noting that we need to present the defs to the regalloc at the try_call itself. This way the interference with, and participation in, edge-moves is computed appropriately.

  • The blockparams are in addition to the regular blockparams, noted above. (Perhaps this was also an implicit reason for your proposal avoiding blockparams on catch-blocks?) There are a few ways to syntactically separate the regular blockparams from try_call-produced ones, if we really want, but I'm not sure if I like them -- e.g. should we write

      fn0 = (i32, i32, i32) -> i64, i64
    block1(v0: i32, v1: i32, v2: i32):
      try_call fn0(v0, v1, v2), block2(v0, v0, results), block3(v1, exceptions)
    
    ;; v3, v4 are normal blockparams; v5, v6 are returns from fn0
    block2(v3: i32, v4: i32, v5: i64, v6: i64):
      ...
    
    ;; v7 is a normal blockparam; v8 is exception state
    block3(v7: i32, v8: i32):
    

    or omit the results / exceptions keywords and implicitly append?

    (I'll note by comparison that LLVM -- which uses phis rather than blockparams -- defines the exceptional state in the catch block with a landingpad instruction, separate from a phi instruction, so it makes a distinction in kind between the two. I'm not sure this is worth emulating though.)


Sorry for the lengthy reply here -- I suppose I consider the IR design aspect one of the most important parts here, as it has long-term implications on compiler complexity and ease of implementation, and the RFC presents one design point without going into too much depth on the why, so -- here are some starting points :-)

@fitzgen
Copy link
Member

fitzgen commented Sep 9, 2024

To set a baseline first, I'll suggest the basic principle of: IR design should be as orthogonal as possible, i.e., features compose and special-cases or unsupported corners that require special handling are minimized.

As a corollary of that, if existing analysis and transform passes can work without having to be modified to be aware of exceptions, all the better. (This is the end-game of "put exceptional edges into the ordinary CFG", IMHO, with try_call as a "normal-ish" branch instruction.)

💯

I'd like to propose that the first question be resolved early to: catch-targets are ordinary CFG blocks (as also noted elsewhere in this PR). My reasoning is straightforward: a new kind of block, with its own restrictions, adds cognitive overhead and correctness questions to every analysis and pass in the compiler, increasing likelihood of bugs. Furthermore, it's not clear that there are reasons that require catch-targets to be distinct. We will want to codegen them as we do other basic blocks; an unwind that moves control to the handler address is just like an ordinary jump from the predecessor block. (If I'm missing some reason why they msut be distinct, please let me know!)

I had been assuming that we couldn't treat landing pads as normal blocks, and instead like alternative function entry points. But I think that was a mistaken assumption that I never questioned. I might have been assuming that we wouldn't allow the reuse of any values in the landing pad, as a simplification? But that seems overly extreme now. Anyways, if a block is used as both a landing pad and a regular control-flow successor, then the regalloc constraints of the try_call edge will need to force all live values onto the stack and then reload them in the landing pad (or an auto-inserted critical edge block just before the "landing pad"?) and we don't actually have to do anything differently from normal blocks? And the unwinder would need to be responsible for restoring callee-save registers as well.

Anyways, if the catch blocks are indeed an artificial constraint, then definitely we should not artificially distinguish them from regular blocks in the IR.

The blockparams are in addition to the regular blockparams, noted above. (Perhaps this was also an implicit reason for your proposal avoiding blockparams on catch-blocks?) There are a few ways to syntactically separate the regular blockparams from try_call-produced ones, if we really want, but I'm not sure if I like them -- e.g. should we write

  fn0 = (i32, i32, i32) -> i64, i64
block1(v0: i32, v1: i32, v2: i32):
  try_call fn0(v0, v1, v2), block2(v0, v0, results), block3(v1, exceptions)
      
;; v3, v4 are normal blockparams; v5, v6 are returns from fn0
block2(v3: i32, v4: i32, v5: i64, v6: i64):
  ...
      
;; v7 is a normal blockparam; v8 is exception state
block3(v7: i32, v8: i32):

or omit the results / exceptions keywords and implicitly append?

I would say that if we allow splicing the payloads into the block parameters at arbitrary positions, eg

try_call fn0(v0, v1, v2), block2(v0, results, v1), block3(exceptions, v2)

then we should/must keep the keywords. If we always append or prepend, then implicit seems fine by me.

The generality of unconstrained splicing seems nice from a user perspective, but also maybe like something we won't actually need. I think if we can't think of a we-will-definitely-need-it-for-X use case, we should just implicitly append or prepend.

I consider the IR design aspect one of the most important parts here, as it has long-term implications on compiler complexity and ease of implementation

💯

@cfallin
Copy link
Member

cfallin commented Sep 9, 2024

I had been assuming that we couldn't treat landing pads as normal blocks, and instead like alternative function entry points. But I think that was a mistaken assumption that I never questioned. I might have been assuming that we wouldn't allow the reuse of any values in the landing pad, as a simplification? But that seems overly extreme now. Anyways, if a block is used as both a landing pad and a regular control-flow successor, then the regalloc constraints of the try_call edge will need to force all live values onto the stack and then reload them in the landing pad (or an auto-inserted critical edge block just before the "landing pad"?) and we don't actually have to do anything differently from normal blocks? And the unwinder would need to be responsible for restoring callee-save registers as well.

Anyways, if the catch blocks are indeed an artificial constraint, then definitely we should not artificially distinguish them from regular blocks in the IR.

I'm relatively convinced at least right now that "normal jump" is the best way to think of (and design to ensure) the unwinder -- we'll indeed want to restore callee-saves as we unwind more nested frames for this to be the case. In other words, my mental model for an exceptional return is "just like a normal return except PC/RIP is over here", plus register(s) set with exceptional state, just as register(s) are set with return values on normal returns. (Someone please correct me if this is wrong!) An alternative world where catch blocks are separate function entries seems much more "wild" to me in the sense that it breaks assumptions in lowering and regalloc.

One slightly in-the-weeds but relevant detail here is that as we compute the lowering block order from CLIF, and generate VCode blocks, if a catch-block is also a target of a normal branch, we'll end up splitting the critical edge: the exceptional edge comes from a block with multiple successors, and goes to a block with multiple predecessors. This will require a little bit of care to get right when we generate the unwind tables (usually the main block is associated with the CLIF-level label but here we want the block with edge-moves).

(EDIT: note this is also the case if we keep the catch-blocks as a separate kind of block, unreachable from normal branches, because two or more try_calls could share the same catch-blocks; so, orthogonal to this question per-se, rather I'm braindumping a thing I think we'll want to be careful about)

I would say that if we allow splicing the payloads into the block parameters at arbitrary positions, eg

try_call fn0(v0, v1, v2), block2(v0, results, v1), block3(exceptions, v2)

then we should/must keep the keywords. If we always append or prepend, then implicit seems fine by me.

The generality of unconstrained splicing seems nice from a user perspective, but also maybe like something we won't actually need. I think if we can't think of a we-will-definitely-need-it-for-X use case, we should just implicitly append or prepend.

Yeah, I'd tend to agree with the YAGNI principle here -- and between append and prepend, if no other reasons to lean either way, I might suggest that we append, because then we preserve the property that the block-call args on the targets line up with blockparams, and the new thing (try_call results) can take the extra logic to add the length of normal block args (or subtract result count from total blockparam count, whichever).

@bjorn3
Copy link
Contributor

bjorn3 commented Sep 10, 2024

In other words, my mental model for an exceptional return is "just like a normal return except PC/RIP is over here", plus register(s) set with exceptional state, just as register(s) are set with return values on normal returns.

Exactly!

Yeah, I'd tend to agree with the YAGNI principle here -- and between append and prepend, if no other reasons to lean either way, I might suggest that we append, because then we preserve the property that the block-call args on the targets line up with blockparams, and the new thing (try_call results) can take the extra logic to add the length of normal block args (or subtract result count from total blockparam count, whichever).

I believe I used prepend in my current implementation as cranelift-frontend appends the extra blockparams necessary for handling variables in SSA form to the user defined blockparams, so prepending the args for invoke/try_call makes cranelift-frontend handle it correctly without needing any changes.

@alexcrichton
Copy link
Member

Chris, Nick, and I talked about this in-person at the CG meeting over dinner yesterday so I wanted to drop some notes here from what we talked about so we don't forget:

  • We probably don't actually want to do the "set flags on exception" ABI. That means it's (a) a new ABI to maintain and (b) we can't ever turn it on by default. In some sense "wasted work" but also it's a lot more to maintain over time.
  • To assist with unwinding we say that an "invoke" will always spill all registers (everything is caller-saved) initially. That way we don't have to recover registers during unwinding.
  • We won't remove setjmp/longjmp yet, so that'll stay for traps (although it would be good to do it with instructions in Cranelift instead one day)
  • As a first stepping stone it might be interesting to add a mode where we just trap on throws. Non-exceptional programs could then be run/benchmarked which might help some early users dogfood.

@fitzgen
Copy link
Member

fitzgen commented Feb 26, 2025

@dhil is it okay if I take over this RFC to get it across the finish line?

@dhil
Copy link
Contributor Author

dhil commented Feb 26, 2025

@dhil is it okay if I take over this RFC to get it across the finish line?

Yes absolutely!

@fitzgen
Copy link
Member

fitzgen commented Mar 4, 2025

Just pushed a pretty major overhaul to this RFC, incorporating changes from this discussion and others:

  • try_call now have multiple exception paths, like br_table
  • no more calling convention stuff, just side tables
  • and more

Please take another look!

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on @fitzgen, it's looking very good to me 👍

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together; the details all basically look good to me! Just one uncertainty/question below.

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more thoughts on another read-through. Two things to clarify / add more detail on; no issues with the approach at all.

fitzgen added 2 commits March 6, 2025 14:35
* No two-phase exceptions
* Unwinding regular `call`s
* Why we do not have `try_call_indirect`
* Why no `throw` or `try` instruction
@fitzgen
Copy link
Member

fitzgen commented Mar 11, 2025

As we seem to have reached pretty broad consensus, and feedback has moved more into the realm of typo fixes and minor clarifications, I'd like to propose that we merge this RFC!

Motion to Finalize

Disposition: Merge


As always, details on the RFC process can be found here: https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close

@alexcrichton
Copy link
Member

I'd second the motion to merge 👍

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

At the risk of going to pun-jail, I have to note that I find this RFC... exceptional. Seconding!

@fitzgen
Copy link
Member

fitzgen commented Mar 11, 2025

As there has been signoff from representatives of two different BA stakeholder organizations, this RFC is now entering its 10-day

Final Comment Period

and the last day to raise concerns before this RFC merges is 2025-03-21.

Thanks everyone!

@cfallin
Copy link
Member

cfallin commented Mar 19, 2025

At the risk of slightly re-opening Pandora's box, I have a... final comment... during the Final Comment Period.

I'm ~1.5 days into implementing the Cranelift side of exception handling right now, and I have been realizing in a fairly deep way how the block-parameter strategy we have in this RFC may be suboptimal. In brief, the problem is that breaking the invariant that the block-call arguments in the target match up with the parameters on that block is leading to a lot of refactoring and gross hacks throughout the compiler. I don't doubt that I could continue to bulldoze through, but I'm worried that this is a strong signal about design coherence that will lead to future pain and bugs as we work with the IR, so I took a step back to think a bit (and talk with @fitzgen offline earlier today).

We originally decided above that we would write try_call fn0(args...), block1(args1), [ tag1: block2(args2), tag2: block3(args3), ... ] to mean that after return, control reaches one of block1 (no exception thrown), block2 or block3 (or unwinds further up the stack) with the following block parameters: (i) returns of fn0 concatenated with args1 for block1; (ii) a single pointer-width exception payload argument and args2 for block2, and (iii) likewise for block3. In other words, the BlockCalls have N, M and P args, but the blocks themselves have N + len(fn0.rets), M + 1, and P + 1 params. This doesn't happen anywhere else in the IR, and leads to a lot of awkwardness in the parser, the verifier, the phi-removal pass, lowering, and regalloc's view of blockparams (so far). We anticipated that this aspect might be a little weird but it seemed tenable at a high level.

We took this approach because we wanted to achieve a few goals: successor blocks should not be a special category or specially restricted; the IR should encode the invariant that function returns are only valid on normal return, and an exception payload is only valid on catching an exception; and we shouldn't do anything weird with SSA, like define return values and immediately use them in the try_call.

I believe I have a better solution now though that still preserves these properties, and avoids the higher-than-anticipated impedance mismatches. Namely:

  • We don't actually need an exception payload parameter in the IR, we think. In Wasmtime, we always have vmctx, and we can convey exceptional state through a field there (or perhaps in the Store since unwinding can cross instances).
  • Return values can be defined by the try_call itself. They must already be defined at this point from a regalloc point of view, as a sort of "early def", because the register state will be filled in before control returns to the caller and transfers across the edge to the normal-return target.

The latter point means that the normal-return values are defined even on exceptional returns. This may seem like a semantic lie. I have what I think is a reasonable answer that technically avoids adding UB to the IR: we could provide in the unwind metadata information about where return values go (registers/stack locations), and specify that it is up to the embedder to either fill those values in, or guarantee that it does not use the values on exceptional paths. (In effect, the values are always "returned", it's just that the unwinder decides the return values on unwinding; and Wasmtime is free to decide that the current value in rax at time of throw is the return value, because it generated IR that doesn't care, or it could zero them as a defense-in-depth thing.)

Anyway, I'd be curious what others think, and I'm happy to fill in more about the pain-points discovered.

(Slightly tangential but worth saying, IMHO: I personally think we should always at least begin implementation or prototype before fully merging an RFC, and this is why -- there are some things we just can't know fully until we work through all the details!)

@bjorn3
Copy link
Contributor

bjorn3 commented Mar 19, 2025

We don't actually need an exception payload parameter in the IR, we think

cg_clif absolutely needs it.

@cfallin
Copy link
Member

cfallin commented Mar 19, 2025

Hmm, that is good to know. And there isn't a way to tunnel the state through, e.g., a TLS slot? I suppose the issue here is that you don't control the stdlib so we basically need the same capability as LLVM for the native-code case...


I think one way to express what I'm finding in implementation is that there has to be something weird and different about exceptional edges, because they do introduce this payload. That payload has to come out of thin air somehow. The ways we have are:

  • No it doesn't -- no payload here! (the new suggestion)
  • Extra magical block args that are not in the block call, making the block calls special and different and requiring contortions when viewing those edges
  • Extra magical operators at the start of the target blocks (i.e., landingpad instructions), making the blocks special and different and requiring contortions when editing them (and making sharing them harder)

I suppose the last option -- one I don't like, but just to name it -- is to say that both the normal returns and exceptional returns arise as normal defs, so the instruction results for a try_call are the concatenation of calls and exception payload. (If you squint, this is Go's result, err := f(...)...) But this means that we have extra magical logic when squaring function signatures with instruction results (which will come up in, e.g., ABI code).

So we either punt on the problem, or we have awkward logic on edges, or awkward logic on blocks, or awkward logic on calls. The native-code case apparently means we can't punt on the problem.

I'm not sure what to make of this -- we can discuss more in the Cranelift meeting tomorrow I suppose.

@bjorn3
Copy link
Contributor

bjorn3 commented Mar 19, 2025

And there isn't a way to tunnel the state through, e.g., a TLS slot? I suppose the issue here is that you don't control the stdlib so we basically need the same capability as LLVM for the native-code case...

Yeah, rust_eh_personality only passes it as landingpad argument.

Did you see how I did the landingpad arguments in my branch by the way. While it adds some complexity, it didn't seem like it added all that much complexity.

@cfallin
Copy link
Member

cfallin commented Mar 19, 2025

We discussed this in the Cranelift meeting this morning and came to a relatively reasonable design point, which I'm actually fairly happy about: we decided to reify the "arising in the middle of an edge" values, the return values and exception payloads, as a new kind of "placeholder argument" in the block calls. There are some sketches of this above but basically it looks like: try_call fn0(...), block1(ret0, ret1, v1, v2), [ tag1: block2(exn0, v3), ... ]. In other words, we allow retN and exnN values in block calls on a try_call[_indirect] instruction.

Importantly, the args themselves are a sum-type of "normal value" or "placeholder" (try-call return, try-call exception payload, other in the future?). We don't define a new kind of value; these args are not values.

This is nice for a few reasons:

  • It avoids the problem of mismatching block-call argument list and block parameter list, which started this whole discussion and would be a recurring footgun/wart in the design otherwise.
  • It requires only local validation; we can know if the block calls are correct (referring to valid return values and exception payloads) by looking only at the function signature of the called function.
  • It feels closest to the reality of where the values are defined: not by the try_call for all dominated instructions in both paths; not in successors (with landingpads or equivalent); but actually on the edge. Code that processes block-call arguments will need to handle this case, but it is front-and-center (and exposable as a new type) rather than an implicit thing.
  • It uses syntactic restrictions rather than cross-instruction or cross-block invariants to ensure well-formedness of the construct.

I'll continue building this and will say more if there are other unanticipated issues, but I don't expect there to be at the moment!

Rather than concatenating or appending them onto the block calls. Chris's
prototyping revealed that this is a better approach for our existing code base.
@fitzgen
Copy link
Member

fitzgen commented Apr 24, 2025

I've updated the RFC based on the final comments and the CLIF design tweaks Chris made during his prototyping.

Given that the prototyping has merged into Cranelift main and that cg-clif is passing all of Rust's unwinding tests in a branch based on that prototyping, it seems like we are in a great place here and I'm going to go ahead and start a new final comment period on this RFC.


Final Comment Period

Disposition: merge.

The last day to raise concerns before this RFC merges is 2025-05-03.

Thanks again, everyone!

@fitzgen
Copy link
Member

fitzgen commented May 14, 2025

Whoops, forgot to merge this on the third, so it got some extra FCP time.

Thanks @dhil for writing the first draft of this RFC and to everyone who provided feedback, brainstormed, and helped improve it!

@fitzgen fitzgen merged commit 3ad1a42 into bytecodealliance:main May 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants