A comprehensive machine learning toolkit for converting Label Studio annotations, training object detection models, and optimizing for deployment.
- Label Studio to YOLO Conversion: Convert Label Studio JSON exports to YOLO format
- Image Downloading: Download images from S3/HTTP sources with progress tracking
- YOLO Model Training: Train YOLOv11 models with automatic device detection
- ONNX Export & Optimization: Export and optimize models for mobile deployment
- Cross-Platform GPU Support: MPS (macOS), CUDA (NVIDIA), ROCm (AMD)
- Centralized Configuration: YAML-based configuration with environment variable support
- Automatic .env Loading: Seamless integration with .env files for sensitive credentials
- Environment Variable Substitution: Support for `${VAR_NAME}` and `${VAR_NAME:-default}` syntax in YAML
- Flexible Import System: Works both as a Python module and as standalone scripts
- Secure Configuration: Sensitive data in .env, regular settings in YAML
- Modern CLI Interface: Beautiful terminal output with progress indicators and status displays
- Smart NMS Configuration: Optimized Non-Maximum Suppression settings to reduce warnings
- Automatic Training Directory Detection: Finds the latest YOLO training output automatically
```bash
# Install package (includes GPU support for all platforms)
pip install ls-ml-toolkit

# PyTorch automatically detects and uses:
# - macOS: Metal Performance Shaders (MPS)
# - Linux: CUDA/ROCm (if available)
# - Windows: CUDA (if available)
```

```bash
# 1. Create .env file with your S3 credentials
cp env.example .env
# Edit .env with your AWS credentials

# 2. Train a model from Label Studio dataset
lsml-train dataset/v0.json --epochs 50 --batch 8 --device auto

# 3. Optimize an ONNX model
lsml-optimize model.onnx

# PyTorch automatically detects your platform and GPU
# All configuration is loaded automatically from .env and ls-ml-toolkit.yaml
```
```python
from ls_ml_toolkit import LabelStudioToYOLOConverter, YOLOTrainer

# Convert dataset
converter = LabelStudioToYOLOConverter('dataset_name', 'path/to/labelstudio.json')
converter.process_dataset()

# Train model
trainer = YOLOTrainer('path/to/dataset')
trainer.train_model(epochs=50, device='auto')
```
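Under the hood, the conversion maps Label Studio's percent-based bounding boxes (top-left origin, 0-100 range) to YOLO's normalized center format. A minimal sketch of that math — illustrative only, not the toolkit's actual implementation, and the function name is made up:

```python
def ls_bbox_to_yolo(x, y, width, height):
    """Convert a Label Studio percent-based box (top-left x, y plus
    width/height, all 0-100) to YOLO's normalized
    (x_center, y_center, width, height) in the 0-1 range."""
    x_center = (x + width / 2) / 100.0
    y_center = (y + height / 2) / 100.0
    return x_center, y_center, width / 100.0, height / 100.0
```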
Create a `.env` file with your sensitive credentials only:

```bash
# S3 Credentials (Sensitive Data)
LS_ML_S3_ACCESS_KEY_ID=your_access_key
LS_ML_S3_SECRET_ACCESS_KEY=your_secret_key

# Optional: Environment-specific settings
LS_ML_S3_DEFAULT_REGION=us-east-1
LS_ML_S3_ENDPOINT=https://custom-s3.example.com
```
Important:
- Only use `.env` for sensitive data (API keys, passwords, tokens)
- All other configuration should be in `ls-ml-toolkit.yaml`
- Copy `env.example` to `.env` and configure your credentials
- The toolkit automatically loads these variables and makes them available throughout the application
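For reference, the automatic loading can be approximated with a few lines of standard-library Python. This is a simplified sketch of the idea, not the toolkit's actual `env_loader` module:

```python
import os
from pathlib import Path

def load_dotenv(path=".env"):
    """Minimal .env loader: reads KEY=value lines, skips blank lines and
    '#' comments, and never overwrites variables already set in the
    environment (so real env vars take precedence over the file)."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```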
All regular settings are configured in `ls-ml-toolkit.yaml`. Environment variables are used only for sensitive data:
```yaml
# Dataset Configuration
dataset:
  base_dir: "dataset"
  train_split: 0.8
  val_split: 0.2

# Training Configuration
training:
  epochs: 50
  batch_size: 8
  image_size: 640
  device: "auto"

  # NMS (Non-Maximum Suppression) settings
  nms:
    iou_threshold: 0.7    # IoU threshold for NMS (0.0-1.0); higher = fewer detections
    conf_threshold: 0.25  # Confidence threshold for predictions (0.0-1.0); higher = fewer detections
    max_det: 300          # Maximum number of detections per image; lower = faster processing

# Model Export Configuration
export:
  model_path: "shared/models/layout_yolo_universal.onnx"
  optimized_model_path: "shared/models/layout_yolo_universal_optimized.onnx"  # Optional
  optimize: true
  optimization_level: "all"

# S3 Configuration (uses .env for sensitive data)
s3:
  access_key_id: "${LS_ML_S3_ACCESS_KEY_ID}"          # From .env file
  secret_access_key: "${LS_ML_S3_SECRET_ACCESS_KEY}"  # From .env file
  region: "${LS_ML_S3_DEFAULT_REGION:-us-east-1}"     # From .env file with default
  endpoint: "${LS_ML_S3_ENDPOINT:-}"                  # From .env file (optional)

# Platform-specific settings
platform:
  auto_detect_gpu: true
  force_device: null
  macos:
    device: "mps"
    batch_size: 16
  linux:
    device: "auto"  # PyTorch will auto-detect GPU
    batch_size: 16
```
macOS:
- MPS Support: Automatic Metal Performance Shaders detection
- Installation: `pip install ls-ml-toolkit`

Linux:
- CUDA Support: Automatic NVIDIA GPU detection and configuration
- ROCm Support: Automatic AMD GPU detection
- Installation: `pip install ls-ml-toolkit`
- Requirements: NVIDIA drivers + CUDA toolkit OR ROCm drivers

Windows:
- CUDA Support: Automatic NVIDIA GPU detection
- Installation: `pip install ls-ml-toolkit`
- Requirements: NVIDIA drivers + CUDA toolkit
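The `auto` device selection described above can be sketched with PyTorch's own capability checks. This is a hedged approximation of the detection logic, not the toolkit's exact code, and the function name is illustrative:

```python
def detect_device():
    """Pick the best available accelerator: MPS on Apple Silicon,
    CUDA on NVIDIA (the same API also covers ROCm builds of PyTorch),
    otherwise fall back to the CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"
```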
```bash
git clone https://github.com/bavix/ls-ml-toolkit.git
cd ls-ml-toolkit
pip install -e .
pip install -r requirements-dev.txt
pytest tests/
```

```bash
# Build package
python -m build

# Install in development mode
pip install -e .
```
- `lsml-train`: Train YOLO models from Label Studio datasets
- `lsml-optimize`: Optimize ONNX models for deployment
- Beautiful Interface: Modern terminal UI with colors, icons, and progress indicators
- Status Tracking: Real-time progress updates during training and optimization
- Configuration Display: Shows current settings in a formatted table
- File Tree Display: Visual representation of training results and file structure
- Error Handling: Clear error messages and troubleshooting guidance
```bash
# Method 1: Use .env file (recommended for secrets)
echo "LS_ML_S3_ACCESS_KEY_ID=your_key" >> .env
echo "LS_ML_S3_SECRET_ACCESS_KEY=your_secret" >> .env

# Method 2: Use environment variables
export LS_ML_S3_ACCESS_KEY_ID="your_key"
export LS_ML_S3_SECRET_ACCESS_KEY="your_secret"
```
```bash
# Train with custom settings
lsml-train dataset/v0.json \
  --epochs 100 \
  --batch 16 \
  --device mps \
  --imgsz 640 \
  --optimize \
  --force-download

# Use custom YAML configuration
lsml-train dataset/v0.json --config custom-config.yaml

# Override specific settings via command line
lsml-train dataset/v0.json --epochs 100 --batch 16 --device mps

# Force re-download of existing images
lsml-train dataset/v0.json --force-download

# Train with custom NMS settings (via YAML config)
# Edit ls-ml-toolkit.yaml:
#   training:
#     nms:
#       iou_threshold: 0.8
#       conf_threshold: 0.3
#       max_det: 200

# Optimize existing ONNX model
lsml-optimize model.onnx --level extended

# Use custom output path for optimization
lsml-optimize model.onnx --output optimized_model.onnx
```
```bash
# 1. Clone and install
git clone https://github.com/bavix/ls-ml-toolkit.git
cd ls-ml-toolkit
pip install -e .

# 2. Setup credentials
cp env.example .env
# Edit .env with your AWS credentials

# 3. Train your model
lsml-train your_dataset.json --epochs 50 --batch 8
```
The YAML configuration supports environment variable substitution only for sensitive data:
```yaml
# S3 Configuration (uses .env variables)
s3:
  access_key_id: "${LS_ML_S3_ACCESS_KEY_ID}"          # From .env file
  secret_access_key: "${LS_ML_S3_SECRET_ACCESS_KEY}"  # From .env file
  region: "${LS_ML_S3_DEFAULT_REGION:-us-east-1}"     # From .env with default
  endpoint: "${LS_ML_S3_ENDPOINT:-}"                  # From .env (optional)

# Regular configuration (no env vars needed)
training:
  epochs: 50
  batch_size: 8
  image_size: 640
```
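The `${VAR_NAME}` / `${VAR_NAME:-default}` expansion can be implemented with a small regular expression. This is a simplified sketch of the mechanism, not the toolkit's exact `config_loader` code:

```python
import os
import re

# Matches ${VAR_NAME} and ${VAR_NAME:-default}; group 2 is the
# optional default (None when no ":-default" part is present).
_ENV_VAR = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")

def substitute_env(value):
    """Expand environment-variable references inside a YAML string value.
    Unset variables expand to their default, or to '' if none is given."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        return os.environ.get(name, default if default is not None else "")
    return _ENV_VAR.sub(repl, value)
```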
Naming Convention: `LS_ML_<CATEGORY>_<SETTING>`

- `LS_ML_S3_ACCESS_KEY_ID` - S3 credentials
- `LS_ML_S3_SECRET_ACCESS_KEY` - S3 credentials
- `LS_ML_S3_DEFAULT_REGION` - S3 configuration
- `LS_ML_S3_ENDPOINT` - S3 endpoint
- API Keys & Secrets: `LS_ML_S3_ACCESS_KEY_ID`, `LS_ML_S3_SECRET_ACCESS_KEY`
- Environment-specific settings: `LS_ML_S3_DEFAULT_REGION`, `LS_ML_S3_ENDPOINT`
- Values that change between deployments
- Regular configuration: epochs, batch_size, image_size
- Default values: model paths, directory structures
- Platform settings: device detection, optimization levels
- `model_path`: Path for the regular ONNX export (required)
- `optimized_model_path`: Path for the optimized ONNX model (optional)

If `optimized_model_path` is not specified in the configuration:
- Training script: Uses `{model_path}_optimized.onnx` as a fallback
- Optimization script: Uses `{input_model}_optimized.onnx` as a fallback
```yaml
export:
  model_path: "models/my_model.onnx"
  optimized_model_path: "models/my_model_optimized.onnx"  # Optional
  optimize: true
  optimization_level: "all"
```
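The fallback rule might look like this in code. An illustrative sketch only — the exact suffix handling inside the toolkit may differ:

```python
from pathlib import Path

def resolve_optimized_path(model_path, optimized_model_path=None):
    """Return the configured optimized-model path if set; otherwise
    derive '<stem>_optimized.onnx' next to the input model."""
    if optimized_model_path:
        return optimized_model_path
    p = Path(model_path)
    return str(p.with_name(p.stem + "_optimized.onnx"))
```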
- All non-sensitive settings

- Never commit `.env` files to version control
- Use `env.example` as a template
- Keep sensitive data separate from code
```
ls-ml-toolkit/
├── src/
│   └── ls_ml_toolkit/        # Main package source
│       ├── __init__.py
│       ├── train.py          # Main training script
│       ├── config_loader.py  # Configuration management with .env support
│       ├── env_loader.py     # Environment variable loader
│       ├── optimize_onnx.py  # ONNX optimization
│       └── ui.py             # CLI UI components
├── tests/                    # Test files
├── requirements.txt          # Dependencies
├── pyproject.toml            # Package configuration
├── setup.py                  # Setup script
├── ls-ml-toolkit.yaml        # Main configuration with env var substitution
├── env.example               # Environment template
├── .env                      # Your environment variables (create from env.example)
└── README.md                 # This file
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
If you see `WARNING ⚠️ NMS time limit 2.800s exceeded`:

What it means:
- The NMS (Non-Maximum Suppression) operation is taking too long
- This can slow down validation and inference
- It usually happens with many objects or suboptimal settings

How to fix:
1. Optimize NMS settings in `ls-ml-toolkit.yaml`:

   ```yaml
   training:
     nms:
       iou_threshold: 0.8   # Higher = fewer detections (0.7-0.9)
       conf_threshold: 0.3  # Higher = fewer detections (0.25-0.5)
       max_det: 200         # Lower = fewer detections (100-300)
   ```

2. Reduce batch size if memory allows:

   ```yaml
   training:
     batch_size: 4  # Reduce from 8 to 4
   ```

3. Optimize other parameters: Focus on `iou_threshold`, `conf_threshold`, and `max_det` for better performance
If your `.env` file is not being loaded:
- Check file location: Ensure `.env` is in the project root directory
- Verify file format: Use `KEY=value` format (no spaces around `=`)
- Check permissions: Ensure the file is readable
- Copy from template: Use `cp env.example .env` as a starting point
- Check naming: Use exact variable names like `LS_ML_S3_ACCESS_KEY_ID`
If environment variables are not substituted in YAML:
- Check variable names: Use exact names like `LS_ML_S3_ACCESS_KEY_ID`
- Verify syntax: Use `${VAR_NAME}` or `${VAR_NAME:-default}` format
- Test loading: Run `python -c "from ls_ml_toolkit.config_loader import ConfigLoader; print(ConfigLoader().get_s3_config())"`
- Remember: Only use env vars for sensitive data, not regular config
If you get import errors when running scripts:
- Install in development mode: `pip install -e .`
- Check Python path: Ensure the package is in your Python path
- Use absolute imports: The toolkit supports both relative and absolute imports
If the script can't find the latest training directory:
- Check YOLO output: Ensure the `runs/detect/` directory exists
- Verify permissions: Make sure the script can read the directory
- Note: The script automatically picks the most recent `train*` directory
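Finding the latest `train*` directory amounts to sorting candidates by modification time. A minimal sketch of the idea (function name illustrative, not the toolkit's API):

```python
from pathlib import Path

def latest_training_dir(runs_dir="runs/detect"):
    """Return the most recently modified 'train*' directory under the
    YOLO output folder, or None when no such directory exists."""
    candidates = [d for d in Path(runs_dir).glob("train*") if d.is_dir()]
    return max(candidates, key=lambda d: d.stat().st_mtime, default=None)
```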
If ONNX optimization fails:
- Install dependencies: `pip install onnx onnxruntime`
- Check model format: Ensure the input is a valid ONNX model
- Use fallback: The script will use default naming if the config path is missing
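If you need to reproduce the optimization step manually, onnxruntime can write an offline-optimized graph as a side effect of session creation. A sketch under the assumption that `onnxruntime` is installed — the `optimize_onnx` helper here is illustrative, not the toolkit's CLI:

```python
def optimize_onnx(input_path, output_path, level="all"):
    """Optimize an ONNX model offline with onnxruntime: setting
    optimized_model_filepath on SessionOptions makes session creation
    serialize the optimized graph to disk."""
    import onnxruntime as ort  # assumed installed; see pip command above
    levels = {
        "basic": ort.GraphOptimizationLevel.ORT_ENABLE_BASIC,
        "extended": ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
        "all": ort.GraphOptimizationLevel.ORT_ENABLE_ALL,
    }
    opts = ort.SessionOptions()
    opts.graph_optimization_level = levels[level]
    opts.optimized_model_filepath = output_path
    ort.InferenceSession(input_path, opts)  # building the session writes the file
    return output_path
```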
This project is licensed under the MIT License - see the LICENSE file for details.