Block interface: parameter and linear config, separate SSM config. #358
+730 −475
✨ Description
Not really complete by itself, but extracted as a separate PR to limit PR scope.
New(ish) concepts:

- Parameters are created through `ParameterConfig.get_parameter`, and most layers (more to come) are created through `[LayerConfig].get_layer`. This ensures correct, standardized creation and leaves more room for new additions (ex. dynamic types).
- New config options are left unset by default (`None` or with a special `default` marker), and `get_parameter`/`get_layer` take default values as arguments. This way we keep the existing behaviour as the default and make the new options truly optional and opt-in. (Otherwise, things like disabling biases or setting the initialization scale or lr scale would have needed manual setting of every single parameter.)

Main changes:
- `ParameterConfig` as the new standard way to configure and instantiate (`get_parameter`) every parameter. Currently a placeholder config, but standard parameter options (lr scale, initialization, maybe more) will be added in the next PRs.
- `OptionalParameterConfig` for weights that may be enabled or disabled (ex. biases). It comes with an `enabled` option, with its default provided by the parent layer.
- `get_layer`, which takes non-config arguments as well as defaults for `bias.enabled` (`default_add_bias`) and initialization (customizable initialization will come later).
- `CausalConv1d` layer (based on the Mamba 2 and Discrete Mamba 2 implementations) and config. The config is similar to `AffineLinearConfig`, but also supports a custom activation, with its default set by the parent layer.
- `MambaConfig`, `Mamba2Config`, `DiscreteMamba2Config`. Things are a bit awkward for now because of the double configuration (`hybrid_block_layout`, `ssm.type`), but this will be addressed in an upcoming PR.
- Removed the `auto_grad_accumulation` arguments, as things work without them, and removing them allows mixing auto and non-auto accumulation (dt bias).

Config/breaking changes:
- `d_xb` in a Mamba 1 layer will cause a crash.
- `type`.
- `ssm.add_bias_linear`, `AddLinearBiasChoices`. `add_linear_biases: bool` is kept as the only global option for biases, at least for now. Other options may be achieved through individual layer configs.
- `ssm.expansion_factor` removed (redundant).
- `ssm.conv_kernel_dimension` -> `ssm.convolution_layer.kernel_size`
- `ssm.activation_type` -> `ssm.convolution_layer.activation`
- `conv1d_weight` -> `convolution.weight`
- `conv1d_bias` -> `convolution.bias`
- `dt_proj_weight` -> `dt_proj.weight`
- `dt_proj_bias` -> `dt_proj.bias`
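As a rough illustration of the pattern described above, here is a minimal sketch of how `get_parameter`/`get_layer` can keep existing behaviour by default while making overrides opt-in. The names `ParameterConfig`, `OptionalParameterConfig`, `get_parameter`, `get_layer`, `enabled`, `bias`, and `default_add_bias` come from this description; everything else (the `LinearConfig` name, the list-of-floats stand-in for tensors, and all signatures) is assumed for illustration and does not reflect the actual Fast-LLM implementation.

```python
import dataclasses
import typing


@dataclasses.dataclass
class ParameterConfig:
    # Placeholder config: per the description, standard options
    # (lr scale, initialization, ...) will be added in later PRs.
    def get_parameter(self, shape: tuple[int, ...]) -> list[float]:
        # Stand-in for real parameter creation (a tensor in practice).
        size = 1
        for dim in shape:
            size *= dim
        return [0.0] * size


@dataclasses.dataclass
class OptionalParameterConfig(ParameterConfig):
    # Unset (`None`) plays the role of the "special `default` marker":
    # it means "use the default provided by the parent layer".
    enabled: typing.Optional[bool] = None

    def get_parameter(
        self, shape: tuple[int, ...], default_enabled: bool = False
    ) -> typing.Optional[list[float]]:
        enabled = self.enabled if self.enabled is not None else default_enabled
        return super().get_parameter(shape) if enabled else None


@dataclasses.dataclass
class LinearConfig:  # hypothetical layer config name
    weight: ParameterConfig = dataclasses.field(default_factory=ParameterConfig)
    bias: OptionalParameterConfig = dataclasses.field(
        default_factory=OptionalParameterConfig
    )

    def get_layer(self, in_dim: int, out_dim: int, default_add_bias: bool = True):
        # `get_layer` takes defaults as arguments, so unset config options
        # keep the existing behaviour unless explicitly overridden.
        weight = self.weight.get_parameter((out_dim, in_dim))
        bias = self.bias.get_parameter((out_dim,), default_enabled=default_add_bias)
        return weight, bias


# Existing behaviour by default: the bias follows `default_add_bias`.
w, b = LinearConfig().get_layer(4, 8, default_add_bias=True)
assert b is not None
# Opt-in override: explicitly disable the bias for this one layer.
w, b = LinearConfig(bias=OptionalParameterConfig(enabled=False)).get_layer(
    4, 8, default_add_bias=True
)
assert b is None
```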
TODO: