
oss-llm-tools

OSS LLM Tools for Conversion, Evaluation, Numerical Debugging and Benchmarking

lm-eval bisect tools

Find the first bad commit that dropped a model's accuracy.

cd <target repo>
python ../oss-llm-tools/bisect_accuracy.py --good <Good Commit> --bad <Bad Commit> \
  --model google/gemma-3-12b-it --task gsm8k --target 0.4 --limit 100 \
  --model_args '{"tensor_parallel_size": 4}' --eval_args '{"num_fewshot":5}' \
  --stop_with_exception --bisect_log /tmp/bisect.log
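The core idea is a standard binary search over the commit range: the good commit meets the accuracy target, the bad one does not, and each probe evaluates the midpoint. A minimal sketch of that logic (not the actual bisect_accuracy.py implementation — `eval_accuracy` is a hypothetical callback standing in for checking out a commit and running lm-eval):

```python
def find_first_bad(commits, eval_accuracy, target):
    """Return the first commit whose accuracy drops below `target`.

    Assumes commits[0] is good, commits[-1] is bad, and the regression
    is monotonic (once accuracy drops below target, it stays dropped).
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if eval_accuracy(commits[mid]) >= target:
            lo = mid  # midpoint still good: regression is later
        else:
            hi = mid  # midpoint already bad: regression is here or earlier
    return commits[hi]
```

This takes O(log n) evaluations, which matters when each probe is a full lm-eval run.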

create_dummy_model

Create a dummy transformer model with optional custom weights. This script combines model initialization from a config file or Hugging Face model with the ability to add or update specific tensors. This is useful for:

  • Testing code that requires a model with a specific architecture without needing the actual trained weights
  • Creating smaller test models by overriding parameters like number of layers
  • Generating placeholder models with specific tensor shapes for development and testing
  • Creating sharded safetensors files for large model testing

Usage

# Basic usage with local model directory path
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output

# Using a Hugging Face model ID directly (downloads necessary files automatically)
python create_dummy_model.py --model_path meta-llama/Llama-3-8B --output_dir ./llama3_dummy

# Create a smaller model with only 3 hidden layers
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
  --config_override '{"num_hidden_layers": 3}'

# Create a model with custom weights from a JSON file
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
  --weights_json example_weights.json

# Create a sharded model with custom weights
python create_dummy_model.py --model_path /path/to/model/dir --output_dir /path/to/output \
  --weights_json example_weights.json --max_shard_size "500MB"

Requirements

  • Python 3.6+
  • PyTorch
  • Hugging Face Transformers
  • huggingface_hub
  • safetensors

Parameters

  • --model_path: Path to a model directory or Hugging Face model ID (e.g., 'meta-llama/Llama-3-8B')
  • --output_dir: Directory to save the model
  • --config_override: (Optional) JSON string with config parameters to override
  • --weights_json: (Optional) JSON file containing weights info (name, shape, dtype)
  • --max_shard_size: (Optional) Maximum size of each shard (e.g., '2GB', '500MB')

Weights Format

When using --weights_json, the weights can be specified in several formats:

  1. Full format with shape and dtype specified:

     "model.embed_tokens.weight": {
       "shape": [151552, 5120],
       "dtype": "float16"
     }

  2. Simple format with just the shape as a list:

     "model.layers.0.input_layernorm.weight": [5120]

  3. String format for shapes using 'x' as a separator:

     "model.layers.0.self_attn.q_proj.weight": "12288x5120"

create_safetensors

Create safetensors files with specified tensor names and shapes. This is useful for:

  • Creating dummy models with specific tensor shapes and dtypes
  • Testing model loading and processing code without real weights
  • Generating sharded model files for large model testing
  • Creating placeholder weights for development and testing

Usage

# Basic usage with weights specified as a JSON string
python create_safetensors.py --weights_dict '{"model.layers.0.self_attn.q_proj.weight": [1024, 1024], "model.layers.0.self_attn.k_proj.weight": [1024, 1024]}'

# Using a JSON file containing weights information
python create_safetensors.py --weights_json example_weights.json --output_dir ./dummy_weights

# Creating sharded safetensors files for large models
python create_safetensors.py --weights_json example_weights.json --output_dir ./sharded_model --max_shard_size "2GB"

Requirements

  • Python 3.6+
  • PyTorch
  • safetensors

Parameters

  • --output_dir: Directory to save the safetensors file(s)
  • --weights_json: JSON file containing weights info (name, shape, dtype)
  • --weights_dict: JSON string with weights dictionary (name: shape)
  • --max_shard_size: Maximum size of each shard (e.g., '2GB', '500MB')

Weights Format

The weights can be specified in several formats as shown in the example_weights.json file:

  1. Full format with shape and dtype specified:

     "model.embed_tokens.weight": {
       "shape": [151552, 5120],
       "dtype": "float16"
     }

  2. Simple format with just the shape as a list:

     "model.layers.0.input_layernorm.weight": [5120]

  3. String format for shapes using 'x' as a separator:

     "model.layers.0.self_attn.q_proj.weight": "12288x5120"

The repository includes an example weights JSON file (example_weights.json) that demonstrates all supported formats for specifying tensor shapes and dtypes.
