Separate AstNode variants for expressions, statements, types #54

ysthakur · 2025-03-09T19:13:52Z

This PR adds three variants, AstNode::Expr, AstNode::Stmt, and AstNode::Type, and adds corresponding types for these variants to wrap around. I'm not sure if this would make the size of AstNode slightly bigger, but I think being able to separate these nodes into their own types is helpful. I also separated Variable into VarRef and VarDecl, since the meaning of each is different.

One advantage of this is that, if a method needs to check that the NodeId it received points to an expression, it can simply check if the node is an AstNode::Expr. We can also pass around Exprs themselves now.

I'm not sure if this is what you meant by using generics, @kubouch, but in the future, we may be able to make NodeId typed, i.e., create a NodeId<Expr> when an Expr node is added to the compiler, or a NodeId<AstNode> when something like AstNode::Name is added. Then we could have an accompanying trait looking like this:

trait ExtractNode { fn extract(node: AstNode) -> Option<Self>; }
impl ExtractNode for AstNode { fn extract(node: AstNode) -> Option<Self> { Some(node) } }
impl ExtractNode for Expr { // Similar impl for Stmt and Type too
  fn extract(node: AstNode) -> Option<Self> {
    match node { AstNode::Expr(e) => Some(e), _ => None }
  }
}

And then you could have a method in Compiler like

fn get_node_typed<T: ExtractNode>(&self, node_id: NodeId<T>) -> T {
  ExtractNode::<T>::extract(self.get_node(node_id)).expect("idk");
}

Apologies in advance for the massive diff. The changes to the snapshots are mostly just wrapping things in Stmt() or Expr()

kubouch · 2025-03-12T18:30:02Z

I'm not sure how much value we get by splitting expr, statement and type nodes into separate types. The AstNode enum is shorter but now, you need to type AstNode::Expr(<foo>) instead of just <foo> everywhere. Not sure how valuable the passing of expressions is because we generally pass around the NodeIds.

I've been thinking we could instead follow a more compact memory layout similar to https://alic.dev/blog/dense-enums. Currently, many AstNodes contain just one or few bytes of information (like operators) but the enum itself takes 48 bytes, so there is both a lot of wasted space and a potential for optimization. This one I'd do once we have the parser more complete.

VarRef and VarDecl separation looks good to me.

By generics, I meant making each AstNode variant a separate type, then instead of having methods like .lsquare() etc. copy-pasted all over the parser, we'd have just one .node<T>(), for example .node<LSquare>(). Just to shorten the code.

ysthakur · 2025-03-12T23:45:20Z

My main motivation is a function I'm making to do typechecking/type inference on expressions. It looks like this:

fn typecheck_expr(&mut self, expr: Expr, expected: TypeId) -> TypeId {
  // match on the different Expr variants
}

I could totally make it accept a NodeId instead and keep the current design of AstNode:

fn typecheck_expr(&mut self, expr: NodeId, expected: TypeId) -> TypeId {
  match self.compiler.get_node(expr) {
    AstNode::Int => ...
    ...more valid expression variants
    _ => panic!("expected expression")
  }
}

But it makes calling this function from typecheck_node more annoying. I don't want to have to do something like this:

fn typecheck_node(&mut self, node_id: NodeId) {
  match self.compiler.ast_nodes[node_id.0] {
    AstNode::Int
    | AstNode::List(_)
    | AstNode::Closure { ... }
    | ...every single expression variant => self.typecheck_expr(node_id, UNKNOWN_TYPE),
    // other cases
  }
}

An alternative would be to first match on every single variant that isn't an expression, then leave the default case for expressions. But that makes me uncomfortable, because we might easily miss a variant:

fn typecheck_node(&mut self, node_id: NodeId) {
  match self.compiler.ast_nodes[node_id.0] {
    AstNode::Let { .. } => ...
    AstNode::Name => ...
    AstNode::Def { ... } => ...
    ...everything but the expression variants
    _ => self.typecheck_expr(node_id, UNKNOWN_TYPE),
  }
}

So I pulled Expr out to make my life easier. It's not as big a problem with Stmt and Type, but I pulled them out while I was at it in case we run into the same problem with them in the future too.

You're right about the parser wasting memory for smaller variants. Are you thinking about just pulling out operators into a separate vector, or also grouping other variants of similar sizes together?

Side note: Bumpalo would've been really nice for avoiding the padding without us having to think too hard about variant sizes and stuff, but it requires giving up the ability to iterate through a list of nodes, not rolling back so we can parse closures, and using Boxes rather than NodeIds.

By generics, I meant making each AstNode variant a separate type, then instead of having methods like .lsquare() etc. copy-pasted all over the parser, we'd have just one .node(), for example .node(). Just to shorten the code.

Ah, I think I understand now.

kubouch · 2025-03-13T21:02:36Z

Could you do this using a helper? For example:

fn typecheck_node(&mut self, node_id: NodeId) {
  if self.compiler.ast_nodes[node_id.0].is_expr()
    self.typecheck_expr(node_id, UNKNOWN_TYPE),
  } else {
    // other cases
  }
}

But if you think it's more helpful to have them separate, we can do that. I checked and the AstNode size remains 48 bytes so there shouldn't be a performance difference.

Are you thinking about just pulling out operators into a separate vector, or also grouping other variants of similar sizes together?

I'm not sure yet, but for example an X-bit tag and Y-bit index could be packed into one Z-bit integer for each node. The tag identifies the node enum variant, index would point at another vector with the node data (if relevant, operators wouldn't point anywhere). X depends on the number of variants, Z is either 32 or 64, which then gives Y.

I'm not sure how bumpalo fits here. We don't really do many allocations, we have just a Vec of nodes (with some additional vecs of node data, like blocks) that can be pre-allocated in advance to some sensible size. Otherwise, we shouldn't really be allocating anything at runtime, unless we overflow some of the vector capacity. But maybe I'm missing something, I haven't checked the crate closely.

Ah, I think I understand now.

The is_xxx() methods could also be modified to is<AstNode::Xxx>() the same way. Some nodes have custom checking, but most of the simple ones should be possible to do like that.

ysthakur · 2025-03-20T01:06:28Z

Sorry, forgot to reply to this.

Could you do this using a helper? For example:

Yeah, that does work, thanks. I can use that for now, and if it ends up not working out, we can always come back to this.

I'm not sure how bumpalo fits here. We don't really do many allocations, we have just a Vec of nodes (with some additional vecs of node data, like blocks) that can be pre-allocated in advance to some sensible size. Otherwise, we shouldn't really be allocating anything at runtime, unless we overflow some of the vector capacity. But maybe I'm missing something, I haven't checked the crate closely.

There's a with_capacity function you can use to create an arena of a specified size to avoid allocating at the start, not sure if that's what you meant

ysthakur added 10 commits February 22, 2025 23:22

return_ty -> in_out_types

84d3c82

Use term type args rather than params

596795f

Format

0014d90

Merge branch 'return_ty-to-in_out_types' into infer

31652fe

Separate Expr and Def types

bd5372a

Separate Stmt type

712d383

Merge branch 'main' into astnode-variants

f60883c

Don't panic on unsupported expressions

108d45f

Don't make separate Def struct

7108425

Remove AstNode::Type from typecheck_node

3fccf20

ysthakur requested a review from kubouch March 9, 2025 19:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Separate AstNode variants for expressions, statements, types #54

Separate AstNode variants for expressions, statements, types #54

Uh oh!

ysthakur commented Mar 9, 2025 •

edited

Loading

Uh oh!

kubouch commented Mar 12, 2025 •

edited

Loading

Uh oh!

ysthakur commented Mar 12, 2025

Uh oh!

kubouch commented Mar 13, 2025

Uh oh!

ysthakur commented Mar 20, 2025 •

edited

Loading

Uh oh!

Uh oh!

Separate AstNode variants for expressions, statements, types #54

Are you sure you want to change the base?

Separate AstNode variants for expressions, statements, types #54

Uh oh!

Conversation

ysthakur commented Mar 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kubouch commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ysthakur commented Mar 12, 2025

Uh oh!

kubouch commented Mar 13, 2025

Uh oh!

ysthakur commented Mar 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ysthakur commented Mar 9, 2025 •

edited

Loading

kubouch commented Mar 12, 2025 •

edited

Loading

ysthakur commented Mar 20, 2025 •

edited

Loading