OSS LLM Tools for Conversion, Evaluation, Numerical Debugging, and Benchmarking
Find the first bad commit that dropped the accuracy of a model.
cd <target repo>
python ../oss-llm-tools/bisect_accuracy.py --good <Good Commit> --bad <Bad Commit> --model google/gemma-3-12b-it --task gsm8k --target 0.4 --limit 100 --bisect_log_file --model_args '{"tensor_parallel_size": 4}' --eval_args '{"num_fewshot":5}' --stop_with_exception --bisect_log /tmp/bisect.log
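Conceptually, this kind of accuracy bisection follows the standard `git bisect run` pattern: check out a candidate commit, run the eval task, compare the score against `--target`, and report good or bad via the exit code. A minimal sketch of that driver logic (illustrative only; `is_good`, `check`, and the stubbed measurement are assumptions, not the tool's actual source):

```python
# Sketch of the per-commit check a `git bisect run` driver relies on:
# exit code 0 marks the commit good, codes 1-124 mark it bad.

def is_good(accuracy: float, target: float) -> bool:
    # A commit is "good" when the measured accuracy still meets the target.
    return accuracy >= target

def check(measure, target: float = 0.4) -> int:
    accuracy = measure()  # run the eval task (e.g. gsm8k) on this checkout
    print(f"accuracy={accuracy:.3f} target={target}")
    return 0 if is_good(accuracy, target) else 1

# Stubbed measurement standing in for a real eval run:
exit_code = check(lambda: 0.35)  # below the 0.4 target, so "bad"
```

In a real bisection, the measurement step would rebuild the checkout and run the configured eval task instead of returning a constant.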
Create a dummy transformer model with optional custom weights. This script combines model initialization from a config file or Hugging Face model with the ability to add or update specific tensors. This is useful for:
- Testing code that requires a model with a specific architecture without needing the actual trained weights
- Creating smaller test models by overriding parameters like number of layers
- Generating placeholder models with specific tensor shapes for development and testing
- Creating sharded safetensors files for large model testing
# Basic usage with local model directory path
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output
# Using a Hugging Face model ID directly (downloads necessary files automatically)
python create_dummy_model.py --model_path meta-llama/Llama-3-8B --output_dir ./llama3_dummy
# Create a smaller model with only 3 hidden layers
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
--config_override '{"num_hidden_layers": 3}'
# Create a model with custom weights from a JSON file
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
--weights_json example_weights.json
# Create a sharded model with custom weights
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
--weights_json example_weights.json --max_shard_size "500MB"
- Python 3.6+
- PyTorch
- Hugging Face Transformers
- huggingface_hub
- safetensors
- --model_path: Path to a model directory or Hugging Face model ID (e.g., 'meta-llama/Llama-3-8B')
- --output_dir: Directory to save the model
- --config_override: (Optional) JSON string with config parameters to override
- --weights_json: (Optional) JSON file containing weights info (name, shape, dtype)
- --max_shard_size: (Optional) Maximum size of each shard (e.g., '2GB', '500MB')
When using --weights_json, the weights can be specified in several formats:
- Full format with shape and dtype specified:
"model.embed_tokens.weight": {
"shape": [151552, 5120],
"dtype": "float16"
}
- Simple format with just the shape as a list:
"model.layers.0.input_layernorm.weight": [5120]
- String format for shapes using 'x' as separator:
"model.layers.0.self_attn.q_proj.weight": "12288x5120"
Create safetensors files with specified tensor names and shapes. This is useful for:
- Creating dummy models with specific tensor shapes and dtypes
- Testing model loading and processing code without real weights
- Generating sharded model files for large model testing
- Creating placeholder weights for development and testing
# Basic usage with weights specified as a JSON string
python create_safetensors.py --weights_dict '{"model.layers.0.self_attn.q_proj.weight": [1024, 1024], "model.layers.0.self_attn.k_proj.weight": [1024, 1024]}'
# Using a JSON file containing weights information
python create_safetensors.py --weights_json example_weights.json --output_dir ./dummy_weights
# Creating sharded safetensors files for large models
python create_safetensors.py --weights_json example_weights.json --output_dir ./sharded_model --max_shard_size "2GB"
- Python 3.6+
- PyTorch
- safetensors
- --output_dir: Directory to save the safetensors file(s)
- --weights_json: JSON file containing weights info (name, shape, dtype)
- --weights_dict: JSON string with weights dictionary (name: shape)
- --max_shard_size: Maximum size of each shard (e.g., '2GB', '500MB')
The weights can be specified in several formats as shown in the example_weights.json file:
- Full format with shape and dtype specified:
"model.embed_tokens.weight": {
"shape": [151552, 5120],
"dtype": "float16"
}
- Simple format with just the shape as a list:
"model.layers.0.input_layernorm.weight": [5120]
- String format for shapes using 'x' as separator:
"model.layers.0.self_attn.q_proj.weight": "12288x5120"
The repository includes an example weights JSON file (example_weights.json) that demonstrates all supported formats for specifying tensor shapes and dtypes.