This repository provides a script that converts Alamut .mut XML files or .db SQLite files to VCF, TSV, or Scout-compatible managed_variants format, consolidating all mutations into a single output file. Coordinates are converted to GRCh38, and options control comments, the 'chr' prefix, and format-specific fields.
- Converts Alamut .mut XML files or .db SQLite files to VCF, TSV, or managed_variants (Scout-compatible) formats
- Supports coordinate systems: GRCh36, GRCh37 (hg19), GRCh38, and LRG
- Converts all coordinates to GRCh38 using chain files and LRG BED files
- Includes rich INFO/column fields in output (pathogenicity, classification, comments, etc.)
- Optionally includes or omits the 'chr' prefix in chromosome names
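The coordinate conversion listed above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `lift_position` is a hypothetical helper, and it only assumes an object with pyliftover's `convert_coordinate(chromosome, position)` method, which works on 0-based positions and returns a list of `(chromosome, position, strand, score)` tuples, an empty list, or `None`.

```python
from typing import Optional, Tuple


def lift_position(lifter, chrom: str, pos_1based: int) -> Optional[Tuple[str, int]]:
    """Lift a 1-based position using a pyliftover-style object.

    `lifter` is anything exposing convert_coordinate(chrom, pos0),
    for example a pyliftover.LiftOver instance.
    Returns (chromosome, 1-based position), or None if the position
    cannot be mapped.
    """
    # pyliftover uses 0-based coordinates; VCF positions are 1-based.
    hits = lifter.convert_coordinate(chrom, pos_1based - 1)
    if not hits:  # None (unknown chromosome) or [] (no mapping found)
        return None
    new_chrom, new_pos0 = hits[0][0], hits[0][1]
    return new_chrom, new_pos0 + 1


# Hypothetical usage (the chain file name is an assumption):
# from pyliftover import LiftOver
# lifter = LiftOver("chainfiles/hg19ToHg38.over.chain.gz")
# lift_position(lifter, "chr1", 1_000_000)
```

Taking the lifter as an argument keeps the helper testable without a chain file on disk. Note that UCSC chain files use 'chr'-prefixed chromosome names, so the input name has to match.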
- Python 3.7+
- pyliftover
- pyfaidx
- beautifulsoup4
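To confirm that the requirements above are met before running the script, a small check along these lines may help. It is only a sketch using the standard library; `missing_packages` and `python_ok` are hypothetical helper names and not part of the script. Note that the beautifulsoup4 package is imported as `bs4`.

```python
import importlib.util
import sys

# PyPI package name -> importable module name.
REQUIRED = {
    "pyliftover": "pyliftover",
    "pyfaidx": "pyfaidx",
    "beautifulsoup4": "bs4",  # installed as beautifulsoup4, imported as bs4
}


def missing_packages(required=REQUIRED):
    """Return the package names whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def python_ok(minimum=(3, 7)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum
```

Calling `missing_packages()` returns an empty list when everything is installed, and the names to install otherwise.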
Install dependencies with pip, for example: pip install pyliftover pyfaidx beautifulsoup4
python alamut-parser.py \
-i <input_dir_or_db> \
-o <output_file> \
-d <chain_files_dir> \
-f <format> [options]
- -i, --input: Input directory with .mut files or path to .db SQLite file
- -o, --outfile: Output file path
- -d, --chain-files-dir: Directory with chain files and LRG mapping files
- -f, --format: Output format: vcf, tsv, or managed_variants
- --include-comments: Include comments in output
- --fasta: Path to target genome FASTA file (for sequence context)
- --target-genome: Target genome assembly (default: GRCh38)
- --institute: Institute name (for managed_variants)
- --cancer: Specify if cancer variants (for managed_variants)
- --chr-prefix: Include 'chr' prefix in chromosome names
- -v, --verbose: Enable verbose logging
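For readers who want to see how these options fit together, the following `argparse` sketch mirrors the documented flags. It is an illustration, not the parser in `alamut-parser.py`: which options are required, and treating `--include-comments`, `--cancer`, `--chr-prefix`, and `--verbose` as boolean switches, are assumptions drawn from the usage line above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented command-line options."""
    parser = argparse.ArgumentParser(prog="alamut-parser.py")
    # Assumed required, since the usage line shows them without brackets.
    parser.add_argument("-i", "--input", required=True,
                        help="Input directory with .mut files or path to .db file")
    parser.add_argument("-o", "--outfile", required=True, help="Output file path")
    parser.add_argument("-d", "--chain-files-dir", required=True,
                        help="Directory with chain files and LRG mapping files")
    parser.add_argument("-f", "--format", required=True,
                        choices=["vcf", "tsv", "managed_variants"],
                        help="Output format")
    # Optional settings.
    parser.add_argument("--include-comments", action="store_true")
    parser.add_argument("--fasta", help="Path to target genome FASTA file")
    parser.add_argument("--target-genome", default="GRCh38")
    parser.add_argument("--institute", help="Institute name (managed_variants)")
    parser.add_argument("--cancer", action="store_true")
    parser.add_argument("--chr-prefix", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser
```

With this sketch, `build_parser().parse_args(["-i", "in", "-o", "out.tsv", "-d", "chains", "-f", "tsv"])` yields a namespace whose `target_genome` is `"GRCh38"` and whose boolean switches are all `False`.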
python alamut-parser.py \
-i ./mut_files/ \
-o output.vcf \
-d ./chainfiles/ \
-f vcf --include-comments --chr-prefix
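The example command requests VCF output with comments and the 'chr' prefix. As a rough idea of what producing one such record could involve, here is a minimal sketch. The helper names and the INFO keys (`CLASSIFICATION`, `COMMENT`) are hypothetical, and the actual fields written by the script may differ. Free-text values are percent-encoded because VCF INFO values must not contain whitespace, semicolons, or equals signs.

```python
from urllib.parse import quote


def chrom_name(chrom: str, chr_prefix: bool) -> str:
    """Return the chromosome name with or without the 'chr' prefix."""
    bare = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return f"chr{bare}" if chr_prefix else bare


def vcf_line(chrom, pos, ref, alt, info, chr_prefix=False) -> str:
    """Format one tab-separated VCF data line (ID, QUAL, FILTER left as '.')."""
    fields = []
    for key, value in info.items():
        # Percent-encode anything outside letters, digits, and "_.-~".
        fields.append(f"{key}={quote(str(value), safe='')}")
    info_str = ";".join(fields) or "."
    columns = [chrom_name(chrom, chr_prefix), str(pos), ".", ref, alt, ".", ".", info_str]
    return "\t".join(columns)
```

For instance, `vcf_line("1", 12345, "A", "G", {"COMMENT": "seen twice"}, chr_prefix=True)` gives a line starting with `chr1` and ending with `COMMENT=seen%20twice`.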
- Chain files and LRG BED files must be present in the specified directory.
- For managed_variants output, the file is compatible with Scout.
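Since the script depends on chain files and LRG BED files being present, it can be useful to check the directory before a long run. The sketch below assumes the files are recognizable by the extensions `.chain`, `.chain.gz`, and `.bed`. The exact file names the script expects are not stated here, so treat the patterns and the `find_mapping_files` helper as hypothetical.

```python
from pathlib import Path


def find_mapping_files(directory):
    """List chain and BED files found directly inside `directory`.

    Returns a dict with sorted file names under the keys "chain" and "bed".
    Raises FileNotFoundError if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Not a directory: {root}")
    names = [p.name for p in root.iterdir() if p.is_file()]
    return {
        "chain": sorted(n for n in names if n.endswith((".chain", ".chain.gz"))),
        "bed": sorted(n for n in names if n.endswith(".bed")),
    }
```

An empty list under either key is a hint that the directory passed with -d is incomplete.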
MIT