Separate AstNode variants for expressions, statements, types #54
I'm not sure how much value we get by splitting expr, statement and type nodes into separate types. The AstNode enum is shorter but now, you need to type
I've been thinking we could instead follow a more compact memory layout similar to https://alic.dev/blog/dense-enums. Currently, many AstNodes contain just one or few bytes of information (like operators) but the enum itself takes 48 bytes, so there is both a lot of wasted space and a potential for optimization. This one I'd do once we have the parser more complete.
VarRef and VarDecl separation looks good to me.
By generics, I meant making each AstNode variant a separate type, then instead of having methods like |
My main motivation is a function I'm making to do typechecking/type inference on expressions. It looks like this:
fn typecheck_expr(&mut self, expr: Expr, expected: TypeId) -> TypeId {
    // match on the different Expr variants
}
I could totally make it accept a
fn typecheck_expr(&mut self, expr: NodeId, expected: TypeId) -> TypeId {
    match self.compiler.get_node(expr) {
        AstNode::Int => ...
        ...more valid expression variants
        _ => panic!("expected expression")
    }
}
But it makes calling this function from
fn typecheck_node(&mut self, node_id: NodeId) {
    match self.compiler.ast_nodes[node_id.0] {
        AstNode::Int
        | AstNode::List(_)
        | AstNode::Closure { ... }
        | ...every single expression variant => self.typecheck_expr(node_id, UNKNOWN_TYPE),
        // other cases
    }
}
An alternative would be to first match on every single variant that isn't an expression, then leave the default case for expressions. But that makes me uncomfortable, because we might easily miss a variant:
fn typecheck_node(&mut self, node_id: NodeId) {
    match self.compiler.ast_nodes[node_id.0] {
        AstNode::Let { .. } => ...
        AstNode::Name => ...
        AstNode::Def { ... } => ...
        ...everything but the expression variants
        _ => self.typecheck_expr(node_id, UNKNOWN_TYPE),
    }
}
So I pulled
You're right about the parser wasting memory for smaller variants. Are you thinking about just pulling out operators into a separate vector, or also grouping other variants of similar sizes together?
Side note: Bumpalo would've been really nice for avoiding the padding without us having to think too hard about variant sizes and stuff, but it requires giving up the ability to iterate through a list of nodes, not rolling back so we can parse closures, and using
Ah, I think I understand now. |
Could you do this using a helper? For example:
fn typecheck_node(&mut self, node_id: NodeId) {
    if self.compiler.ast_nodes[node_id.0].is_expr() {
        self.typecheck_expr(node_id, UNKNOWN_TYPE);
    } else {
        // other cases
    }
}
But if you think it's more helpful to have them separate, we can do that. I checked and the AstNode size remains 48 bytes so there shouldn't be a performance difference.
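One way the is_expr() helper suggested above could be written is sketched below. The AstNode here is a trimmed-down, hypothetical stand-in (the real enum has many more variants, and the field names are invented). The helper lists both groups explicitly instead of using a `_` arm, which should address the worry about silently missing a variant: adding a new variant without classifying it fails to compile.

```rust
// Hypothetical, trimmed-down types; not the project's real definitions.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeId(usize);

#[allow(dead_code)]
#[derive(Debug)]
enum AstNode {
    // Expression-like variants
    Int,
    List(Vec<NodeId>),
    Closure { params: NodeId, block: NodeId },
    // Non-expression variants
    Let { name: NodeId, init: NodeId },
    Name,
}

impl AstNode {
    /// Returns true for variants that an expression typechecker should handle.
    /// There is deliberately no `_` arm, so the match stays exhaustive and a
    /// newly added variant must be classified before the code compiles.
    fn is_expr(&self) -> bool {
        match self {
            AstNode::Int | AstNode::List(_) | AstNode::Closure { .. } => true,
            AstNode::Let { .. } | AstNode::Name => false,
        }
    }
}

fn main() {
    let nodes = vec![
        AstNode::Int,
        AstNode::Let { name: NodeId(2), init: NodeId(0) },
        AstNode::Name,
    ];
    for node in &nodes {
        println!("{:?}: is_expr = {}", node, node.is_expr());
    }
}
```

The cost is that the variant list lives in one more place, but it is a single place rather than every match that needs to tell expressions apart.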
I'm not sure yet, but for example an X-bit tag and Y-bit index could be packed into one Z-bit integer for each node. The tag identifies the node enum variant, index would point at another vector with the node data (if relevant, operators wouldn't point anywhere). X depends on the number of variants, Z is either 32 or 64, which then gives Y. I'm not sure how bumpalo fits here. We don't really do many allocations, we have just a Vec of nodes (with some additional vecs of node data, like blocks) that can be pre-allocated in advance to some sensible size. Otherwise, we shouldn't really be allocating anything at runtime, unless we overflow some of the vector capacity. But maybe I'm missing something, I haven't checked the crate closely.
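The tag-plus-index packing described above can be sketched roughly as follows, assuming Z = 32 and X = 8 (so Y = 24). The Tag variants, the PackedNode name, and the side table of blocks are all hypothetical and only illustrate the layout; nothing here is taken from the actual parser.

```rust
/// Hypothetical tags; the real variant set would be much larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Tag {
    Plus = 0,
    Minus = 1,
    Int = 2,
    Block = 3,
}

const TAG_BITS: u32 = 8; // X: room for up to 256 variants
const INDEX_BITS: u32 = 32 - TAG_BITS; // Y = Z - X, with Z = 32
const INDEX_MASK: u32 = (1 << INDEX_BITS) - 1;

/// One node: tag in the high X bits, index in the low Y bits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedNode(u32);

impl PackedNode {
    /// Returns None if the index does not fit in Y bits.
    fn new(tag: Tag, index: u32) -> Option<Self> {
        if index > INDEX_MASK {
            return None;
        }
        Some(PackedNode(((tag as u32) << INDEX_BITS) | index))
    }

    fn tag(self) -> Tag {
        match self.0 >> INDEX_BITS {
            0 => Tag::Plus,
            1 => Tag::Minus,
            2 => Tag::Int,
            3 => Tag::Block,
            other => unreachable!("invalid tag {}", other),
        }
    }

    /// Index into a per-variant side table; unused for operators.
    fn index(self) -> u32 {
        self.0 & INDEX_MASK
    }
}

fn main() {
    // Side table holding data for Block nodes only (hypothetical layout).
    let blocks: Vec<Vec<u32>> = vec![vec![0, 1]];
    let nodes = [
        PackedNode::new(Tag::Int, 0).unwrap(),
        PackedNode::new(Tag::Plus, 0).unwrap(), // operator: index ignored
        PackedNode::new(Tag::Block, 0).unwrap(),
    ];
    for node in nodes {
        match node.tag() {
            Tag::Block => println!("block with nodes {:?}", blocks[node.index() as usize]),
            tag => println!("{:?}", tag),
        }
    }
    println!("bytes per node: {}", std::mem::size_of::<PackedNode>());
}
```

With this layout every node costs 4 bytes regardless of variant, and only variants that carry data pay for an entry in a side vector.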
The |
Sorry, forgot to reply to this.
Yeah, that does work, thanks. I can use that for now, and if it ends up not working out, we can always come back to this.
There's a |
This PR adds three variants, AstNode::Expr, AstNode::Stmt, and AstNode::Type, and adds corresponding types for these variants to wrap around. I'm not sure if this would make the size of AstNode slightly bigger, but I think being able to separate these nodes into their own types is helpful. I also separated Variable into VarRef and VarDecl, since the meaning of each is different.
One advantage of this is that, if a method needs to check that the NodeId it received points to an expression, it can simply check if the node is an AstNode::Expr. We can also pass around Exprs themselves now.
I'm not sure if this is what you meant by using generics, @kubouch, but in the future, we may be able to make NodeId typed, i.e., create a NodeId<Expr> when an Expr node is added to the compiler, or a NodeId<AstNode> when something like AstNode::Name is added. Then we could have an accompanying trait looking like this:
And then you could have a method in Compiler like
Apologies in advance for the massive diff. The changes to the snapshots are mostly just wrapping things in Stmt() or Expr()
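For illustration, the typed NodeId idea from the description could look roughly like the sketch below. This is not the trait or Compiler method from the PR: the names FromNode, push_expr, and push_node are invented here, and the enums are cut down to two variants each. It only shows how a PhantomData marker lets the type of the id decide what a lookup returns.

```rust
use std::marker::PhantomData;

/// An index into the node list, tagged with the type it is expected to hold.
struct NodeId<T> {
    index: usize,
    _kind: PhantomData<T>,
}

// Manual impls so that NodeId<T> is Copy even when T is not.
impl<T> Clone for NodeId<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T> Copy for NodeId<T> {}

#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Expr {
    Int,
    VarRef,
}

#[derive(Debug, PartialEq)]
enum AstNode {
    Expr(Expr),
    Name,
}

/// Hypothetical trait: how to view a stored AstNode as a `Self`.
trait FromNode: Sized {
    fn from_node(node: &AstNode) -> Option<&Self>;
}

impl FromNode for AstNode {
    fn from_node(node: &AstNode) -> Option<&Self> {
        Some(node)
    }
}

impl FromNode for Expr {
    fn from_node(node: &AstNode) -> Option<&Self> {
        match node {
            AstNode::Expr(expr) => Some(expr),
            _ => None,
        }
    }
}

#[derive(Default)]
struct Compiler {
    ast_nodes: Vec<AstNode>,
}

impl Compiler {
    /// Adding an Expr hands back an id that remembers it points at an Expr.
    fn push_expr(&mut self, expr: Expr) -> NodeId<Expr> {
        self.ast_nodes.push(AstNode::Expr(expr));
        NodeId { index: self.ast_nodes.len() - 1, _kind: PhantomData }
    }

    fn push_node(&mut self, node: AstNode) -> NodeId<AstNode> {
        self.ast_nodes.push(node);
        NodeId { index: self.ast_nodes.len() - 1, _kind: PhantomData }
    }

    /// The return type follows from the id's type parameter.
    fn get_node<T: FromNode>(&self, id: NodeId<T>) -> &T {
        T::from_node(&self.ast_nodes[id.index])
            .expect("NodeId type does not match the stored node")
    }
}

fn main() {
    let mut compiler = Compiler::default();
    let expr_id = compiler.push_expr(Expr::Int);
    let name_id = compiler.push_node(AstNode::Name);
    let expr: &Expr = compiler.get_node(expr_id);
    let node: &AstNode = compiler.get_node(name_id);
    println!("{:?} {:?}", expr, node);
}
```

In this sketch a function like typecheck_expr could take a NodeId<Expr>, so passing a statement id would be rejected at compile time rather than hitting a panic at runtime.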