@jlamypoirier jlamypoirier commented Aug 27, 2025

✨ Description

Not really complete by itself, but extracted as a separate PR to limit PR scope.

New(ish) concepts:

  • Parameters are created through ParameterConfig.get_parameter, and most layers (more to come) are created through [LayerConfig].get_layer. This ensures correct, standardized creation and leaves more room for new additions (ex. dynamic types).
  • Parameter and linear configs will support fine-grained customization, but their options are typically set by the parent layer. Because of this, several options are left "unset" by default (None or a special default marker), and get_parameter/get_layer take default values as arguments. This keeps existing behaviour as the default and makes the new options truly optional and opt-in. (Otherwise, things like disabling biases or setting an initialization or lr scale would have required manually setting every single parameter.)
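The "unset by default, parent supplies the default" scheme can be sketched roughly as follows. This is a simplified illustration only: the field name lr_scale, the tuple shape, and the dict return value are placeholders standing in for the real Fast-LLM ParameterConfig/get_parameter machinery.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ParameterConfig:
    # None means "unset": the parent layer supplies the value via get_parameter.
    lr_scale: Optional[float] = None

    def get_parameter(self, shape: tuple, *, default_lr_scale: float = 1.0) -> dict:
        # An explicitly configured value wins; otherwise fall back to the
        # parent layer's default, so existing behaviour is preserved unless
        # the user opts in to a custom value.
        lr_scale = self.lr_scale if self.lr_scale is not None else default_lr_scale
        return {"shape": shape, "lr_scale": lr_scale}  # stand-in for a real parameter
```

The point of routing defaults through the call site is that a parent layer can set a sensible default for all of its parameters in one place, while each individual parameter config can still override it.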

Main changes:

  • Add ParameterConfig as the new standard way to configure and instantiate (get_parameter) every parameter. Currently a placeholder config; standard options (lr scale, initialization, maybe more) will be added in upcoming PRs.
  • Add OptionalParameterConfig for weights that may be enabled or disabled (ex. biases). It comes with an enabled option, with default provided by the parent layer.
  • Add (mostly empty) configuration for linear layers. Distinguish pure linear (no bias) from affine linear (optional bias). Linear layers are created through get_layer, which takes non-config arguments as well as defaults for bias.enabled (default_add_bias) and initialization (customizable initialization will come later).
  • Add CausalConv1d layer (based on Mamba 2 and Discrete Mamba 2 implementations) and config. Config is similar to AffineLinearConfig, but also supports custom activation, with default set by the parent layer.
  • Rework all layers and their configs to use the new configs.
  • Make the SSM config dynamic, separating it into MambaConfig, Mamba2Config and DiscreteMamba2Config. Things are a bit awkward for now because of the double configuration (hybrid_block_layout, ssm.type), but this will be addressed in an upcoming PR.
  • Remove auto_grad_accumulation arguments, as things work without it, and removing it allows mixing auto and non-auto accumulation (dt bias).
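A dynamic, type-dispatched config of the kind described above can be sketched like this. The registry, decorator, and field names below are hypothetical illustrations of the pattern, not the actual Fast-LLM classes; only the class names MambaConfig/Mamba2Config and the d_xb option come from the PR.

```python
import dataclasses

# Hypothetical registry mapping an explicit "type" string to a config class.
_SSM_CONFIG_CLASSES = {}


def register_ssm_config(type_name):
    def wrapper(cls):
        _SSM_CONFIG_CLASSES[type_name] = cls
        return cls
    return wrapper


@register_ssm_config("mamba")
@dataclasses.dataclass
class MambaConfig:
    state_size: int = 16


@register_ssm_config("mamba_2")
@dataclasses.dataclass
class Mamba2Config:
    state_size: int = 16
    d_xb: int = 64  # a Mamba-2-only option


def ssm_config_from_dict(data: dict):
    data = dict(data)
    cls = _SSM_CONFIG_CLASSES[data.pop("type")]  # type must be set explicitly
    # Options that don't belong to the selected type (ex. d_xb on "mamba")
    # raise a TypeError instead of being silently accepted.
    return cls(**data)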

Config/breaking changes:

  • SSM arguments removed for SSM types that don't use them. (Ex. setting Mamba 2 option d_xb in a Mamba 1 layer will cause a crash.)
  • [Temporary hack] SSM configs need an explicitly set type.
  • Remove ssm.add_bias_linear, AddLinearBiasChoices. add_linear_biases: bool is kept as the only global option for biases, at least for now. Other options may be achieved through individual layer configs.
  • ssm.expansion_factor removed (redundant)
  • ssm.conv_kernel_dimension -> ssm.convolution_layer.kernel_size
  • ssm.activation_type -> ssm.convolution_layer.activation
  • Renamed conv1d_weight -> convolution.weight
  • Renamed conv1d_bias -> convolution.bias
  • Option to enable or disable all supported linear and convolution biases (unchanged defaults)
  • Mamba:
    • Added support for convolution bias
    • Renamed dt_proj_weight -> dt_proj.weight
    • Renamed dt_proj_bias -> dt_proj.bias
  • Mamba 2:
    • dt_proj_bias -> dt_proj.bias
    • Add support for custom convolution activation, with a fallback torch implementation
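For reference, the causality property that CausalConv1d provides (the real layer is based on the Mamba 2 / Discrete Mamba 2 convolution implementations and runs on tensors) can be demonstrated with a minimal pure-Python sketch; the function name and list-based signature here are illustrative only.

```python
def causal_conv1d(x, weight, bias=0.0):
    """Causal 1D convolution: output[t] depends only on x[0..t], never on the future.

    Achieved by left-padding the sequence with kernel_size - 1 zeros, the same
    trick used by causal convolutions in sequence models.
    """
    kernel_size = len(weight)
    padded = [0.0] * (kernel_size - 1) + list(x)
    return [
        sum(weight[j] * padded[t + j] for j in range(kernel_size)) + bias
        for t in range(len(x))
    ]
```

With weight = [0.0, 1.0] the convolution is the identity, and with weight = [1.0, 0.0] it shifts the sequence right by one step, which makes the "no lookahead" behaviour easy to check by hand.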

TODO:

  • Allow separate configuration for concatenated layers (ex. key_value, ssm in_proj)
  • Deal with conversion.
  • Review global bias options.

@jlamypoirier jlamypoirier changed the title Block interface: parameter and linear config Block interface: parameter and linear config, separate SSM config. Aug 27, 2025