Releases: samtools/bcftools
1.22
Download the source code here: bcftools-1.22.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
- Add support for matching lines by ID via the --pair-logic and --collapse options (#1739)
- The -i/-e filtering expressions
  - The expressions now properly match the regex negation of missing values, e.g. -i 'TAG!~"\."' (#2355)
  - Added support for Fisher's exact test
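For readers unfamiliar with the statistic, the following is a minimal Python sketch of a two-sided Fisher's exact test on a 2x2 table. It only illustrates the calculation. The function name fisher_exact_two_sided is hypothetical, and these notes do not say how bcftools exposes the test in filtering expressions (function name, arguments, sidedness), so the manual page remains the reference for that.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] (hypothetical helper)."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric weight of the table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum the weights of all tables that are at most as likely as the observed one
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(total, col1)
```

The weights are kept as integers until the final division, so the comparison against the observed table is exact.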
- Add the option -v, --verbosity INT to all bcftools commands and plugins. Verbosity values greater than 3 are passed to the underlying HTSlib library so that the user can investigate network issues and other problems occurring at the library level.
Changes affecting specific commands:
- bcftools annotate
  - Fix Number in the header definition of transferred FILTER and ID tags (#2335)
- bcftools call
  - The -s, --samples option was not working properly; it now also supports sample negation as advertised in the manual page, e.g. -s ^sample1,sample2 to include all samples but sample1 and sample2 (#2380)
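The negation syntax described above can be sketched in Python as follows. This is an illustration of the semantics only, not the bcftools implementation: the function name select_samples is hypothetical, and details such as output ordering or error handling for unknown sample names are assumptions.

```python
def select_samples(spec, header_samples):
    """Resolve a -s style list; a leading '^' negates the whole list (sketch)."""
    negate = spec.startswith("^")
    names = set((spec[1:] if negate else spec).split(","))
    # Keep header order; include listed samples, or everything else when negated
    return [s for s in header_samples if (s in names) != negate]
```

For instance, with header samples sample1, sample2 and sample3, the specification ^sample1,sample2 leaves only sample3.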
- bcftools consensus
- bcftools convert
  - The command convert --gvcf2vcf was not filling the REF allele when BCF was output (#243)
- bcftools csq
  - Check the input GFF for features outside transcript boundaries and extend the transcript to contain the feature fully (#2323)
  - Add experimental support for alternative genetic code tables, accessible via a new option -C, --genetic-code (#2368)
  - Change in the --unify-chr-names option: no automatic sequence name modification is attempted anymore, the prefixes to trim must be given explicitly. For example, if run with --unify-chr-names chr,Chromosome,- then the program will trim the "chr" prefix in the VCF, "Chromosome" in the GFF, leaving the fasta unchanged (#2378)
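The explicit prefix trimming in the example above can be sketched in Python. This is a minimal illustration under stated assumptions: the helper names trim_prefix and unify are hypothetical, the VCF, GFF, fasta order of the three fields is taken from the example in the note, and "-" is treated as "leave unchanged".

```python
def trim_prefix(name, prefix):
    """Trim prefix from a sequence name; '-' means leave the name unchanged."""
    if prefix != "-" and name.startswith(prefix):
        return name[len(prefix):]
    return name


def unify(option_value, vcf_name, gff_name, fasta_name):
    # Assumed field order, following the example in the notes: VCF, GFF, fasta
    vcf_pfx, gff_pfx, fasta_pfx = option_value.split(",")
    return (
        trim_prefix(vcf_name, vcf_pfx),
        trim_prefix(gff_name, gff_pfx),
        trim_prefix(fasta_name, fasta_pfx),
    )
```

With chr,Chromosome,- the names chr1 (VCF), Chromosome1 (GFF) and 1 (fasta) all end up as 1.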
- bcftools +fill-tags
  - Thanks to the extension of filtering expressions with Fisher's exact test, the plugin can now be used to add FT annotation (#1582)
- bcftools merge
  - Preserve phasing in half-missing genotypes (#2331)
  - The option --merge none is expected to create no new multiallelic sites, but it should still allow merging of, say, A>C with A>C,AT (#2333)
  - Make --merge both work with indel-only records; for example, the multiallelic site G>GT,T should be merged with G>GT (#2339)
  - Do not merge symbolic alleles unless they have not just the same type, e.g. <DEL>, but also the same length, i.e. the INFO/END coordinate (#2362)
  - Fix a bug where an incorrectly formatted gVCF file with overlapping blocks would trigger an infinite loop in the program (#2410)
- bcftools mpileup
  - The -r/-R options now merge overlapping regions, preventing the output of duplicate sites
- bcftools norm
- plot-vcfstats
  - Make the option -s, --sample-names functional again (#2353)
- bcftools +prune
  - New option to remove or annotate clusters of sites within a window
- bcftools query
  - The functions used in -i/-e filtering expressions (such as SUM, MEDIAN, etc.) can now be used in formatting expressions (#2271). If the VCF contains INFO/AD and FORMAT/AD, try:
      bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %sSUM(FMT/AD)]'
      bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %SUM(FMT/AD)]'
      bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(FMT/AD)'
      bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(INFO/AD)'
  - Make it possible to refer to the ID column from the FORMAT expression (#2337)
      bcftools query test.vcf -f 'ID=%ID ID=[ %/ID] vs FMT_ID=[ %ID]'
- bcftools roh
  - New visualization tool misc/roh-viz, see below
- bcftools +setGT
  - Support for setting missing genotypes with arbitrary ploidy via -n c:./. (#2303)
- bcftools +split-vep
  - The -s, --select option was extended to print only one consequence. Previously it was possible to select a single transcript (e.g., the one with the worst consequence), and it was possible to filter by consequence severity (e.g., missense or worse), but in some cases multiple consequences are reported within a single transcript (e.g., start_lost&splice_region). The extended option allows printing only the worst part, for example as --select primary:missense+:worst
- bcftools +trio-dnm2
  - Fix a problem with the --strictly-novel option which would neglect the presence of the apparent de novo allele in the father for male offspring
  - Fix a problem with uncalled mosaic chrX variants in males
- roh-viz
  - HTML/JavaScript visualization of bcftools/roh output and homozygosity rate.
- bcftools +vrfs
  - New experimental plugin for scoring variants and assessing site noisiness (variant read frequency profiles) from a large number of unaffected parental samples
1.21
Download the source code here: bcftools-1.21.tar.bz2.
Changes affecting the whole of bcftools, or multiple commands:
- Support multiple semicolon-separated strings when filtering by ID using -i/-e (#2190). For example, -i 'ID="rs123"' now correctly matches rs123;rs456
- The filtering expression ILEN can be positive (insertion), negative (deletion), zero (balanced substitutions), or set to missing value (symbolic alleles).
- bcftools query
- bcftools +split-vep
  - The column indices printed by default with -H (e.g., #[1]CHROM) can now be suppressed by giving the option twice, -HH (#2152)
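The semicolon-aware ID matching described in the first item above can be sketched in Python. This illustrates the intended semantics only; the function name id_matches is hypothetical and the real implementation lives in the bcftools filtering code.

```python
def id_matches(id_column, wanted):
    """True if any semicolon-separated ID in the column equals `wanted` (sketch)."""
    # Compare whole IDs, so "rs123" does not match "rs1234"
    return wanted in id_column.split(";")
```

Splitting before comparing is what distinguishes this from a substring search: rs123 matches the column rs123;rs456 but not rs1234.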
Changes affecting specific commands:
- bcftools annotate
  - Support dynamic variables read from a tab-delimited annotation file (#2151). For example, in the two cases below the field STR from the -a file is required to match the INFO/TAG in the VCF. In the first example the alleles REF,ALT must match, in the second example they are ignored. The option -k is required to also output records that were not annotated:
      bcftools annotate -a ann.tsv.gz -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf
      bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,-,SCORE,~STR -i'TAG={STR}' -k in.vcf
  - When adding Type=String annotations from a tab-delimited file, encode characters with special meaning using percent encoding (;, = in INFO and : in FORMAT) (#2202)
- bcftools consensus
  - Allow applying a reference allele which overlaps a previous deletion; there is no need to complain about overlapping alleles in such a case
  - Fix a bug which required -s - to be present even when there were no samples in the VCF (#2260)
- bcftools csq
  - Fix a rare bug where an indel combined with a substitution ending at an exon boundary is incorrectly predicted to have an 'inframe' rather than 'frameshift' consequence (#2212)
- bcftools gtcheck
  - Fix a segfault with --no-HWE-prob. The bug was introduced with the output format change in 1.19 which replaced the DC section with DCv2 (#2180)
  - The number of matching genotypes in the DCv2 output was not calculated correctly with non-zero -E, --error-probability. Consequently, the average HWE score was also incorrect. The main output, the discordance score, was not affected by the bug
- bcftools +mendelian2
  - Include the number of good cases where at least one of the trio genotypes has an alternate allele (#2204)
  - Fix the error message which would report the wrong sample when a non-existent sample is given. Note that the bug only affected the error message; the program otherwise assigns the family members correctly (#2242)
- bcftools merge
  - Fix a severe bug in merging of FORMAT fields with Number=R and Number=A values. For example, rows with high-coverage FORMAT/AD values (greater than or equal to 128) could have been assigned to incorrect samples. The bug was introduced in version 1.19. For details see #2244.
- bcftools mpileup
  - Return non-zero error code when the input BAM/CRAM file is truncated (#2177)
  - Add FORMAT/AD annotation by default, disable with -a -AD
- bcftools norm
  - Support realignment of symbolic <DUP.*> alleles, similarly to <DEL.*> added previously (#1919, #2145)
  - Fix in reporting reference allele genotypes with --multi-overlaps . (#2160)
  - Support duplicate removal of symbolic alleles of the same type but different SVLEN (#2182)
  - New -S, --sort switch to optionally sort output records by allele (#1484)
  - Add the -i/-e filtering options to select records for normalization. Note that duplicate removal ignores this option.
  - Fix a bug where --atomize would not fill GT alleles for atomized SNVs followed by an indel (#2239)
- bcftools +remove-overlaps
  - Revamp the program to allow greater flexibility, with the following new options:
-M, --mark-tag TAG Mark -m sites with INFO/TAG
-m, --mark EXPR Mark (if also -M is present) or remove sites [overlap]
dup .. all overlapping sites
overlap .. overlapping sites
min(QUAL) .. mark sites with lowest QUAL until overlaps are resolved
--missing EXPR Value to use for missing tags with -m 'min(QUAL)'
0 .. the default
DP .. heuristics, scale maximum QUAL value proportionally to INFO/DP
--reverse Apply the reverse logic, for example preserve duplicates instead of removing
-O, --output-type t t: plain list of sites (chr,pos), tz: compressed list
- bcftools +tag2tag
  - The conversions --LXX-to-XX, --XX-to-LXX were working but specific cases such as --LAD-to-AD were not.
  - Print a more informative error message when the source tag type violates the VCF specification
- bcftools +trio-dnm2
  - Better handling of the --strictly-novel functionality, especially with respect to chrX inheritance
1.20
Download the source code here: bcftools-1.20.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
- Add short option -W for --write-index. The option now accepts an optional parameter which allows choosing between the TBI and CSI index formats.
Changes affecting specific commands:
- bcftools consensus
  - Add new --regions-overlap option which allows taking into account overlapping deletions that start outside of the fasta file target region.
- bcftools isec
  - Add new option -l, --file-list to read the list of file names from a file
- bcftools merge
  - Add new option --force-single to support single-file edge case (#2100)
- bcftools mpileup
  - Add new option --indels-cns for an alternative indel calling model, which should increase the speed on long read data (thanks to using edlib) and the precision (thanks to a number of heuristics).
- bcftools norm
  - Change the order of atomization and multiallelic splitting (when both -a,-m are given) from "atomize first, then split" to "split first, then atomize". This usually results in a simpler VCF representation. The previous behaviour can be achieved by explicitly streaming the output of the --atomize command into the --multiallelics splitting command.
  - Fix Type=String multiallelic splitting for Number=A,R,G tags with incorrect number of values.
  - Merging into multiallelic sites with bcftools norm -m +indels did not work. This is now fixed and the merging is now more strict about variant types, for example complex events, such as AC>TGA, are not considered as indels anymore (#2084)
- bcftools reheader
  - Allow reading the input file from a stream with --fai (#2088)
- bcftools +setGT
  - Support for custom genotypes based on the allele with higher depth, such as --new-gt c:0/X custom genotypes (#2065)
- bcftools +split-vep
  - When only one of the tags is present, automatically choose INFO/BCSQ (the default tag name produced by bcftools csq) or INFO/CSQ (produced by VEP). When both tags are present, use the default INFO/CSQ.
  - Transcript selection by MANE, PICK, and user-defined transcripts, for example:
      --select CANONICAL=YES
      --select MANE_SELECT!=""
      --select PolyPhen~probably_damaging
  - Select all matching transcripts via --select, not just one
  - Change automatic type parsing of VEP fields cDNA_position, CDS_position, and Protein_position from Integer to String, as it can be of the form "8586-8599/9231". The type Integer can still be enforced with -c cDNA_position:int,CDS_position:int,Protein_position:int.
  - Recognize -c field:str, not just -c field:string, as advertised in the usage page
  - Fix a bug which made filtering expressions containing missing values crash (#2098)
- bcftools stats
  - When GT is missing but AD is present, the program determines the alternate allele from AD. However, if the AD tag has an incorrect number of values, the program would exit with an error printing "Requested allele outside valid range". This is now fixed by taking into account the actual number of ALT alleles.
- bcftools +tag2tag
  - Support for conversion from tags using localized alleles (e.g. LPL, LAD) to the family of standard tags (PL, AD)
- bcftools +trio-dnm2
  - Extend --strictly-novel to exclude cases where the non-Mendelian allele is the reference allele. The change is motivated by the observation that this class of variants is enriched for errors (especially for indels), and better corresponds with the option name.
1.19
Download the source code here: bcftools-1.19.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
- Filtering expressions can be given a file with a list of strings to match; this was previously possible only for the ID column. For example
    ID=@file .. selects lines with ID present in the file
    INFO/TAG=@file .. selects lines where TAG has a string value listed in the file
    INFO/TAG!=@file .. TAG must not have a string value listed in the file
- Allow querying the REF,ALT columns directly, for example -e 'REF="N"'
Changes affecting specific commands:
- bcftools annotate
  - Fix bcftools annotate --mark-sites, VCF sites overlapping regions in a BED file were not annotated (#1989)
  - Add flexibility to FILTER column transfers and allow transfers within the same file, across files, and in combination. For examples see https://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info
- bcftools call
  - Output MIN_DP rather than MinDP in gVCF mode
  - New -*, --keep-unseen-allele option to output the unobserved allele <*>, intended for gVCF.
- bcftools head
  - New -s, --samples option to include the #CHROM header line with samples.
- bcftools gtcheck
  - Add output options -o, --output and -O, --output-type
  - Add filtering options -i, --include and -e, --exclude
  - Rename the short option -e, --error-probability from lower case to upper case -E, --error-probability
  - Changes to the output format, replace the DC section with DCv2:
    - adds a new column for the number of matching genotypes
    - The --error-probability is newly interpreted as the probability of erroneous allele rather than genotype. In other words, the calculation of the discordance score now considers the probability of genotyping error to be different for HOM and HET genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0).
    - fixes in HWE score calculation plus output average HWE score rather than absolute HWE score
    - better description of fields
- bcftools merge
  - Add -m modifiers to suppress the output of the unseen allele <*> or <NON_REF> at variant sites (e.g. -m both,*) or all sites (e.g. -m both,**)
- bcftools mpileup
  - Output MIN_DP rather than MinDP in gVCF mode
- bcftools norm
  - Add the number of joined lines to the summary output, for example Lines total/split/joined/realigned/skipped: 6/0/3/0/0
  - Allow combining -m and -a with --old-rec-tag (#2020)
  - Symbolic <DEL> alleles caused norm to expand REF to the full length of the deletion. This was not intended and problematic for long deletions, the REF allele should list one base only (#2029)
- bcftools query
  - Add new -N, --disable-automatic-newline option for pre-1.18 query formatting behavior, where a missing newline would not be added
  - Make the automatic addition of the newline character more predictable and, when missing, always put it at the end of the expression. In version 1.18 it could be added at the end of the expression (for per-site expressions) or inside the square brackets (for per-sample expressions). The new behavior is:
    - if the formatting expression contains a newline character, do nothing
    - if there is no newline character and -N, --disable-automatic-newline is given, do nothing
    - if there is no newline character and -N is not given, insert newline at the end of the expression
    See #1969 for details
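The three newline rules listed above can be sketched in Python. This is an illustration only: the function name finalize_format is hypothetical, and the sketch works on the expression as typed on the command line, where a newline is written as the two characters backslash and n.

```python
NEWLINE = "\\n"  # newline as typed in a -f expression (backslash followed by n)


def finalize_format(fmt, disable_automatic_newline=False):
    """Apply the 1.19 rules for automatic newline insertion (sketch)."""
    if NEWLINE in fmt:
        return fmt  # rule 1: the expression already contains a newline
    if disable_automatic_newline:
        return fmt  # rule 2: -N given, leave the expression untouched
    return fmt + NEWLINE  # rule 3: append at the very end of the expression
```

Note that under rule 3 a per-sample expression such as [ %GT] gets the newline after the closing bracket, not inside it, which is the difference from the 1.18 behavior described above.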
  - Add new -F, --print-filtered option to output a default string for samples that would otherwise be filtered by -i/-e expressions.
  - Include sample name in the output header with -H whenever it makes sense (#1992)
- bcftools +split-vep
  - Fix on the fly filtering involving numeric subfields, e.g. -i 'MAX_AF<0.001' (#2039)
  - Interpret default column type names (--columns-types) as entire strings, rather than substrings to avoid unexpected spurious matches (i.e. internally add ^ and $ to all field names)
- bcftools +trio-dnm2
  - Do not flag paternal genotyping errors as de novo mutations. Specifically, when father's chrX genotype is 0/1 and mother's 0/0, 0/1 in the child will not be marked as DNM.
- bcftools view
  - Add new -A, --trim-unseen-allele option to remove the unseen allele <*> or <NON_REF> at variant sites (-A) or all sites (-AA)
bcftools release 1.18:
Download the source code here: bcftools-1.18.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
- Support auto indexing during writing BCF and VCF.gz via new --write-index option
Changes affecting specific commands:
- bcftools annotate
- bcftools concat
  - New option --drop-genotypes
- bcftools consensus
  - Support higher-ploidy genotypes with -H, --haplotype (#1892)
  - Allow --mark-ins and --mark-snv with a character, similarly to --mark-del
- bcftools convert
  - Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to sites-only VCFs
- bcftools csq
  - New --unify-chr-names option to automatically unify different chromosome naming conventions in the input GFF, fasta and VCF files (e.g. "chrX" vs "X")
  - More versatility in parsing various flavors of GFF
  - A new --dump-gff option to help with debugging and investigating the internals of hGFF parsing
  - When printing consequences in nonsense mediated decay transcripts, include 'NMD_transcript' in the consequence part of the annotation. This is to make filtering easier and analogous to VEP annotations. For example the consequence annotation 3_prime_utr|PCGF3|ENST00000430644|NMD is newly printed as 3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD
- bcftools gtcheck
  - Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL, etc modes. This information is important for interpretation of the discordance score, as only the GT-vs-GT matching can be interpreted as the number of mismatching genotypes.
- bcftools +mendelian2
  - Fix in command line argument parsing, the -p and -P options were not functioning (#1906)
- bcftools merge
  - New -M, --missing-rules option to control the behavior of merging of vector tags to prevent mixtures of known and missing values in tags when desired
  - Use values pertaining to the unknown allele (<*> or <NON_REF>) when available to prevent mixtures of known and missing values (#1888)
  - Revamped line matching code to fix problems in gVCF merging where split gVCF blocks would not update genotypes (#1891, #1164).
- bcftools mpileup
  - Fix a bug in --indels-v2.0 which caused an endless loop when CIGAR operator H or P was encountered
- bcftools norm
- bcftools query
  - Force newline character in formatting expression when not given explicitly
  - Fix -H header output in formatting expressions containing newlines
- bcftools reheader
  - Make -f, --fai aware of long contigs not representable by 32-bit integer (#1959)
- bcftools +split-vep
  - Prevent a segfault when -i/-e use a VEP subfield not included in -f or -c (#1877)
  - New -X, --keep-sites option complementing the existing -x, --drop-sites options
  - Force newline character in formatting expression when not given explicitly
  - Fix a subtle ambiguity: identical rows must be returned when -s is applied regardless of -f containing the -a VEP tag itself or not.
- bcftools stats
  - Collect new VAF (variant allele frequency) statistics from FORMAT/AD field
  - When counting transitions/transversions, consider also alternate het genotypes
- plot-vcfstats
  - Add three new VAF plots
bcftools release 1.17:
Download the source code here: bcftools-1.17.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands:
- The -i/-e filtering expressions
  - Error checks were added to prevent incorrect use of vector arithmetics. For example, when evaluating the sum of two vectors A and B, the resulting vector could contain nonsense values when the input vectors were not of the same length. The fix introduces the following logic:
    - evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A)
    - evaluate to C_i = A_i + B_0 when length(B)==1 and set length(C)=length(A)
    - evaluate to C_i = A_0 + B_i when length(A)==1 and set length(C)=length(B)
    - throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1
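The four rules above can be sketched in Python. This is a minimal illustration of the stated logic for addition only, not the bcftools source; the function name vector_add is hypothetical.

```python
def vector_add(a, b):
    """Element-wise sum following the length rules listed above (sketch)."""
    if len(a) == len(b):
        return [x + y for x, y in zip(a, b)]  # C_i = A_i + B_i
    if len(b) == 1:
        return [x + b[0] for x in a]          # C_i = A_i + B_0
    if len(a) == 1:
        return [a[0] + y for y in b]          # C_i = A_0 + B_i
    # Lengths differ and neither operand is a single value
    raise ValueError(f"incompatible vector lengths: {len(a)} vs {len(b)}")
```

The equal-length check comes first, so two single-value vectors are simply added element by element.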
  - Arrays in Number=R tags can now be subscripted by alleles found in FORMAT/GT. For example,
      FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele
      FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
      sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth bigger than 20
- The commands consensus -H and +split-vep -H
  - Drop unnecessary leading space in the first header column and newly print #[1]columnName instead of the previous # [1]columnName (#1856)
Changes affecting specific commands:
- bcftools +allele-length
  - Fix overflow for indels longer than 512bp and aggregate alleles equal or larger than that in the same bin (#1837)
- bcftools annotate
- bcftools call
- bcftools consensus
  - BREAKING CHANGE: the option -I, --iupac-codes newly outputs IUPAC codes based on FORMAT/GT of all samples. The -s, --samples and -S, --samples-file options can be used to subset samples. In order to ignore samples and consider only the REF and ALT columns (the original behavior prior to 1.17), run with -s - (#1828)
- bcftools convert
  - Make variantkey conversion work for sites without an ALT allele (#1806)
- bcftools csq
  - Fix a bug where an MNV with multiple consequences (e.g. missense + stop_gained) would report only the less severe one (#1810)
  - GFF file parsing was made slightly more flexible, newly ids can be just XXX rather than, for example, gene:XXX
  - New gff2gff perl script to fix GFF formatting differences
- bcftools +fill-tags
  - More of the available annotations are now added by the -t all option
- bcftools +fixref
  - New INFO/FIXREF annotation
  - New -m swap mode
- bcftools +mendelian
  - The +mendelian plugin has been deprecated and replaced with +mendelian2. The function of the plugin is the same but the command line options and the output format have changed, and for this reason it was introduced as a new plugin.
- bcftools mpileup
  - Most of the annotations generated by mpileup are now optional via the -a, --annotate option, and several new (mostly experimental) annotations were added.
  - New option --indels-2.0 for an EXPERIMENTAL indel calling model. This model aims to address some known deficiencies of the current indel calling algorithm, specifically, it uses diploid reference consensus sequence. Note that in the current version it has the potential to increase sensitivity but at the cost of decreased specificity.
  - Make the FS annotation (Fisher exact test strand bias) functional and remove it from the default annotations
- bcftools norm
  - New --multi-overlaps option allows setting overlapping alleles either to the ref allele (the current default) or to a missing allele (#1764 and #1802)
  - Fixed a bug in -m - which did not split missing FORMAT values correctly and could lead to empty FORMAT fields such as :: instead of the correct :.: (#1818)
  - The --atomize option previously would not split complex indels such as C>GGG. Newly these will be split into two records C>G and C>CGG (#1832)
- bcftools query
  - Fix a rare bug where the printing of SAMPLE field with query was incorrectly suppressed when the -e option contained a sample expression while the formatting query did not. See #1783 for details.
- bcftools +setGT
- bcftools +split-vep
  - New options -g, --gene-list and --gene-list-fields which allow prioritizing consequences from a list of genes, or restricting output to the listed genes
  - New -H, --print-header option to print the header with -f
  - Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. There the LoF_info subfield contains commas which, in general, makes it impossible to parse the VEP subfields. The +split-vep plugin can now work with such files, replacing the offending commas with slash (/) characters. See also Ensembl/ensembl-vep#1351
  - Newly the -c, --columns option can be omitted when a subfield is used in -i/-e filtering expression. Note that -c may still have to be given when it is not possible to infer the type of the subfield. Note that this is an experimental feature.
- bcftools stats
  - The per-sample stats (PSC) would not be computed when -i/-e filtering options and the -s - option were given but the expression did not include sample columns (#1835)
- bcftools +tag2tag
  - Revamp of the plugin to allow wider range of tag conversions, specifically all combinations from FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT
- bcftools +trio-dnm2
  - New -n, --strictly-novel option to downplay alleles which violate Mendelian inheritance but are not novel
  - Allow setting the --pn and --pns options separately for SNVs and indels and make the indel settings more strict by default
  - Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values
- bcftools +variant-distance
  - New option -d, --direction to choose the directionality: forward, reverse, nearest (the default) or both (#1829)
1.16
Download the source code here: bcftools-1.16.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
- New plugin bcftools +variant-distance to annotate records with distance to the nearest variant (#1690)
Changes affecting the whole of bcftools, or multiple commands:
- The -i/-e filtering expressions
  - Added support for querying of multiple filters, for example -i 'FILTER="A;B"' can be used to select sites with two filters "A" and "B" set. See the documentation for more examples.
  - Added modulo arithmetic operator
Changes affecting specific commands:
- bcftools annotate
  - A bug introduced in 1.14 caused records with INFO/END annotation to incorrectly trigger the -c ~INFO/END mode of comparison even when not explicitly requested, which would result in not transferring the annotation from a tab-delimited file (#1733)
- bcftools merge
  - New -m snp-ins-del switch to merge SNVs, insertions and deletions separately (#1704)
- bcftools mpileup
  - New NMBZ annotation for Mann-Whitney U-z test on number of mismatches within supporting reads
  - Suppress the output of MQSBZ and FS annotations in absence of alternate allele
- bcftools +scatter
  - Fix erroneous addition of duplicate PG lines
- bcftools +setGT
  - Custom genotypes (e.g. -n c:1/1) now correctly override ploidy
1.15.1
Download the source code here: bcftools-1.15.1.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
- bcftools annotate
  - New -H, --header-line convenience option to pass a header line on command line, this complements the existing -h, --header-lines option which requires a file with header lines
- bcftools csq
  - A list of consequence types supported by bcftools csq has been added to the manual page. (#1671)
- bcftools +fill-tags
  - Extend generalized functions so that FORMAT tags can be filled as well, for example:
      bcftools +fill-tags in.bcf -o out.bcf -- -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'
  - Allow multiple custom functions in a single run. Previously the program would silently go with the last one, assigning the same values to all (#1684)
- bcftools norm
  - Fix an assertion failure triggered when a faulty VCF file with a '-' character in the REF allele was used with bcftools norm --atomize. This option now checks that the REF allele only includes the allowed characters A, C, G, T and N. (#1668)
  - Fix the loss of phasing in half-missing genotypes in variant atomization (#1689)
- bcftools roh
  - Fix a bug that could result in an endless loop or incorrect AF estimate when missing genotypes are present and the --estimate-AF - option was used (#1687)
- bcftools +split-vep
  - VEP fields with characters disallowed in VCF tag names by the specification (such as - in M-CAP) couldn't be queried. This has been fixed, the program now sanitizes the field names, replacing invalid characters with underscore (#1686)
1.15
Download the source code here: bcftools-1.15.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
- New bcftools head subcommand for conveniently displaying the headers of a VCF or BCF file. Without any options, this is equivalent to bcftools view --header-only --no-version but more succinct and memorable.
- The -T, --targets-file option had the following bug originating in HTSlib code: when an uncompressed file with multiple columns CHR,POS,REF was provided, the REF would be interpreted as 0 gigabases (#1598)
Changes affecting specific commands:
-
bcftools annotate-
In addition to
--rename-annots, which requires a file with name mappings, it is now possible to do the same on the command line-c NEW_TAG:=OLD_TAG -
Add new option
--min-overlapwhich allows to specify the minimum required overlap of intersecting regions -
Allow to transfer
ALTfrom VCF with or without replacement using:
bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz
- bcftools convert
  - Revamp of the --gensample, --hapsample and --haplegendsample family of options, which includes the following changes:
    - New --3N6 option to output/input the new version of the .gen file format, see https://www.cog-genomics.org/plink/2.0/formats#gen
    - Deprecate the --chrom option in favor of --3N6. A simple cut command can be used to convert from the new 3*M+6 column format to the format printed with --chrom (cut -d' ' -f1,3-).
    - The CHROM:POS_REF_ALT IDs, which are used to detect strand swaps, are required and must appear either in the "SNP ID" column or the "rsID" column. The column is autodetected for --gensample2vcf, can be the first or the second for --hapsample2vcf (depending on whether the --vcf-ids option is given), and must be the first for --haplegendsample2vcf.
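The cut conversion mentioned above can be tried on a single line without running bcftools. This is a minimal sketch: the record below is a made-up placeholder with six leading columns and three genotype probabilities for one sample (M=1), and the column contents are illustrative assumptions, not real data.

```shell
# Hypothetical one-sample record in the 3*M+6 column layout (placeholder values).
line='1 1:100_A_G rs123 100 A G 1 0 0'

# Drop the second column, as suggested in the release note.
converted=$(printf '%s\n' "$line" | cut -d' ' -f1,3-)
printf '%s\n' "$converted"
```

The result keeps the first column and everything from the third column onward, which is what the `-f1,3-` field list selects.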
- bcftools csq
  - Allow GFF files with phase column unset
- bcftools filter
  - New --mask, --mask-file and --mask-overlap options to soft filter variants in regions (#1635)
- bcftools +fixref
  - The -m id option now also works for non-dbSNP ids, i.e. not just rsINT
  - New -m flip-all mode for flipping all sites, including ambiguous A/T and C/G sites
- bcftools isec
  - Prevent segfault on sites filtered with -i/-e in all files (#1632)
- bcftools mpileup
  - More flexible read filtering using the options:
    --ls, --skip-all-set .. skip reads with all of the FLAG bits set
    --ns, --skip-any-set .. skip reads with any of the FLAG bits set
    --lu, --skip-all-unset .. skip reads with all of the FLAG bits unset
    --nu, --skip-any-unset .. skip reads with any of the FLAG bits unset
    The existing synonymous options will continue to function but their use is discouraged:
    --rf, --incl-flags STR|INT  Required flags: skip reads with mask bits unset
    --ff, --excl-flags STR|INT  Filter flags: skip reads with mask bits set
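The four skip rules above differ only in how the read's FLAG is compared against the mask. The sketch below spells out one reading of that wording using shell arithmetic; the helper names are hypothetical and the semantics are inferred from the option descriptions, not taken from the bcftools source.

```shell
# Each helper succeeds (exit status 0) when a read with FLAG $1 would be
# skipped under mask $2. Semantics are an assumption based on the option text.
skip_all_set()   { [ $(( $1 & $2 )) -eq "$2" ]; }   # every mask bit is set
skip_any_set()   { [ $(( $1 & $2 )) -ne 0 ]; }      # at least one mask bit is set
skip_all_unset() { [ $(( $1 & $2 )) -eq 0 ]; }      # no mask bit is set
skip_any_unset() { [ $(( $1 & $2 )) -ne "$2" ]; }   # at least one mask bit is unset

flag=1040   # 0x410: reverse strand (0x10) + duplicate (0x400)
mask=1280   # 0x500: secondary (0x100) + duplicate (0x400)

for rule in skip_all_set skip_any_set skip_all_unset skip_any_unset; do
    if "$rule" "$flag" "$mask"; then
        echo "$rule: skip"
    else
        echo "$rule: keep"
    fi
done
```

With these example values only one of the two mask bits is present in the FLAG, so the "any set" and "any unset" rules would skip the read while the "all set" and "all unset" rules would keep it.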
- bcftools query
  - Make the --samples and --samples-file options work also in the --list-samples mode. Add a new --force-samples option which allows proceeding even when some of the requested samples are not present in the VCF (#1631)
- bcftools +setGT
  - Fix a bug in the -t q -e EXPR logic applied on FORMAT fields: sites with all samples failing the expression EXPR were incorrectly skipped. This problem affected only the use of -e logic, not the -i expressions (#1607)
- bcftools sort
  - make use of the TMPDIR environment variable when defined
- bcftools +trio-dnm2
  - The --use-NAIVE mode now also adds the de novo allele in FORMAT/VA
1.14
Download the source code here: bcftools-1.14.tar.bz2. (The "Source code" downloads are generated by GitHub and are incomplete as they don't bundle HTSlib and are missing some generated files.)
Changes affecting the whole of bcftools, or multiple commands
- New --regions-overlap and --targets-overlap options which address a long-standing design problem with subsetting VCF files by region. BCFtools recognizes two sets of options, one for streaming (-t/-T) and one for index-jumping (-r/-R). They behave differently: the first includes only records with POS coordinate within the regions, the other includes records overlapping the regions. The two new options allow modifying the default behaviour, see the man page for more details.
- The --output-type option can be used to override the default compression level
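The difference between the two default behaviours described above comes down to which interval test is applied. The sketch below illustrates this with two hypothetical helper functions and a made-up deletion record; it assumes 1-based inclusive coordinates and that a record spans POS through POS plus the REF length minus one.

```shell
# Arguments: POS, REF length, region start, region end (1-based, inclusive).
# Helper names and the example record are illustrative assumptions.
pos_within() { [ "$1" -ge "$3" ] && [ "$1" -le "$4" ]; }
record_overlaps() {
    rec_end=$(( $1 + $2 - 1 ))
    [ "$1" -le "$4" ] && [ "$rec_end" -ge "$3" ]
}

# Hypothetical deletion: POS=95, REF spans 10 bases (95-104); region is 100-200.
if pos_within 95 10 100 200; then echo "POS-based: included"; else echo "POS-based: excluded"; fi
if record_overlaps 95 10 100 200; then echo "overlap-based: included"; else echo "overlap-based: excluded"; fi
```

A record that starts before the region but extends into it is the case where the two tests disagree, which is the situation the new options let the user control.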
Changes affecting specific commands
- bcftools annotate
  - when --set-id and --remove are combined, --set-id cannot use tags deleted by --remove. This is now detected and the program exits with an informative error message instead of segfaulting (#1540)
  - while non-symbolic variants are uniquely identified by POS,REF,ALT, symbolic alleles starting at the same position were indistinguishable. This prevented correct matching of records with the same position and variant type but different length given by INFO/END (samtools/htslib@60977f2). When annotating from a VCF/BCF, the matching is done automatically. When annotating from a tab-delimited text file, this feature can be invoked by using -c INFO/END.
  - add a new . modifier to control whether missing values should be carried over from a tab-delimited file or not. For example:
    -c TAG .. adds TAG if the source value is not missing. If TAG exists in the target file, it will be overwritten.
    -c .TAG .. adds TAG even if the source value is missing. This can overwrite non-missing values with a missing value and can create empty VCF fields (TAG=.)
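The effect of the . modifier described above can be summarised as a small decision rule. The sketch below models it with a hypothetical transfer helper, where "." stands for a missing value; it is only an illustration of the documented behaviour, not how bcftools implements it.

```shell
# transfer MODE SOURCE TARGET prints the resulting TAG value.
# MODE is "plain" for -c TAG or "dot" for -c .TAG; "." denotes a missing value.
# The helper and its argument names are illustrative assumptions.
transfer() {
    if [ "$1" = plain ] && [ "$2" = "." ]; then
        printf '%s\n' "$3"      # missing source value: target is left untouched
    else
        printf '%s\n' "$2"      # source value (possibly missing) replaces target
    fi
}

transfer plain . 7   # target keeps its value
transfer dot . 7     # target is overwritten with a missing value
transfer plain 3 7   # target is overwritten with the source value
```

The only case where the two modes differ is a missing source value, which matches the note that -c .TAG can replace a non-missing value with a missing one.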
- bcftools +check-ploidy
  - by default missing genotypes are not used when determining ploidy. With the new option -m, --use-missing it is possible to use the information carried in the missing and half-missing genotypes (e.g. ., ./. or ./1)
- bcftools concat:
  - new --ligate-force and --ligate-warn options for finer control of -l, --ligate behavior in imperfect overlaps. The new default is to throw an error when sites present in one chunk but absent in the other are encountered. To drop such sites and proceed, use the new --ligate-warn option (previously this was the default). To keep such sites, use the new --ligate-force option (#1567).
- bcftools consensus:
  - Apply mask even when the VCF has no notion about the chromosome. It was possible to encounter this problem when contig lines were not present in the VCF header and no variants were called on that chromosome (#1592)
- bcftools +contrast:
  - support for chunking within a map/reduce framework, allowing NASSOC counts to be collected even for empty case/control sample sets (#1566)
- bcftools csq:
  - bug fix, compound indels were not recognised in some cases (#1536)
  - compound variants were incorrectly marked as 'inframe' even when stop codon would occur before the frame was restored (#1551)
  - bug fix, FORMAT/BCSQ bitmasks could have been assigned incorrectly to some samples at multiallelic sites, a superset of the correct consequences would have been set (#1539)
  - bug fix, the upstream stop could be falsely assigned to all samples in a multi-sample VCF even if the stop was relevant for a single sample only (#1578)
  - further improve the detection of mismatching chromosome naming (e.g. "chrX" vs "X") in the GFF, VCF and fasta files
- bcftools merge:
  - keep (sum) INFO/AN,AC values when merging VCFs with no samples (#1394)
- bcftools mpileup:
  - new --indel-size option which allows increasing the maximum indel size considered; large deletions in long read data are otherwise lost.
- bcftools norm:
  - atomization now supports Number=A,R string annotations (#1503)
  - assign as many alternate alleles as possible to genotypes at multiallelic sites in the -m + mode, disregarding the phase. Previously the program assumed it was being run as the inverse operation of -m -, but when that was not the case, reference alleles would have been filled instead of multiple alternate alleles (#1542)
- bcftools sort:
  - increase accuracy of the --max-mem option limit; previously the limit could be exceeded by more than 20% (#1576)
- bcftools +trio-dnm:
  - new --with-pAD option to allow processing of VCFs without FORMAT/QS. The existing --ppl option was changed to the analogous --with-pPL
- bcftools view:
  - the functionality of the option --compression-level lost in 1.12 has been restored