Question about unused instance variable in CausalAttention class #637
Closed
cihankarabulut started this conversation in General
Replies: 1 comment 1 reply
That's funny, because I remember seeing this one when reading the book but never opened a PR for consistency with the notebook. I can't speak for Sebastian, but I'm pretty sure it's your points 2 and 3. You'll notice that for MHA, …
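The reply is cut off, but the point about MHA can be illustrated: in a multi-head variant, storing d_out is no longer redundant, because forward() needs it to merge the per-head outputs back into one dimension. The sketch below is illustrative only (causal masking and dropout omitted for brevity), not the book's exact code:

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Illustrative sketch: here self.d_out IS read again in forward()."""

    def __init__(self, d_in, d_out, num_heads, qkv_bias=False):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.d_out = d_out
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Split projections into heads: (b, num_heads, num_tokens, head_dim)
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(2, 3) / self.head_dim ** 0.5, dim=-1)
        ctx = (attn @ v).transpose(1, 2)
        # self.d_out is needed here to flatten the heads back together
        return ctx.contiguous().view(b, num_tokens, self.d_out)
```

So one plausible reading is that the assignment in CausalAttention exists to keep the constructor signature and body parallel with the multi-head class introduced later.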
In the CausalAttention class implementation on p. 81, I noticed that self.d_out = d_out is assigned in the __init__ method, but this instance variable doesn't seem to be used anywhere else in the class. The output dimension is already passed directly to the linear layers (nn.Linear(d_in, d_out, bias=qkv_bias)), so I'm curious:
1. Is there a specific reason for storing d_out as an instance variable?
2. Is this intended for future extensions of the class?
3. Or is this simply a remnant from a previous version that could be safely removed?
I'm trying to understand best practices for transformer implementations and when it makes sense to store parameters as instance variables versus just using them directly.
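For context, here is a minimal sketch of the kind of class being discussed (an assumption, not the book's verbatim code). Note that self.d_out is assigned in __init__ but never read again; forward() gets its output dimension implicitly from W_value:

```python
import torch
import torch.nn as nn

class CausalAttention(nn.Module):
    """Minimal sketch of the class under discussion (not the book's exact code)."""

    def __init__(self, d_in, d_out, context_length, dropout, qkv_bias=False):
        super().__init__()
        self.d_out = d_out  # stored, but never read anywhere below
        self.W_query = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_key = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.W_value = nn.Linear(d_in, d_out, bias=qkv_bias)
        self.dropout = nn.Dropout(dropout)
        self.register_buffer(
            "mask", torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, d_in = x.shape
        queries = self.W_query(x)
        keys = self.W_key(x)
        values = self.W_value(x)
        attn_scores = queries @ keys.transpose(1, 2)
        # Mask out future positions so each token attends only to the past
        attn_scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        attn_weights = self.dropout(attn_weights)
        # Output dimension comes from W_value's d_out, not from self.d_out
        return attn_weights @ values
```

As a general guideline, storing a parameter as an instance variable pays off when a method other than __init__ needs it later (for example, a reshape in forward()); if every consumer is a layer constructed in __init__, passing the value directly is enough.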