BSNP: A Bayesian SNP Caller



BSNP is a Bayesian SNP caller that takes a file of piled-up short reads and produces, for every covered position, a probability for each of the ten possible diploid genotypes (A/A, C/C, G/G, T/T, A/G, A/C, A/T, C/G, C/T, G/T). BSNP provides posterior genotype probabilities P(Genotype | Data) as well as joint probabilities with the data, P(Genotype, Data). BSNP uses a strategy similar to the one employed by SOAP (Li et al. Genome Res 2010), but it is open source, more flexible, and provides direct control of the prior probabilities.
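
The two kinds of output are related by Bayes' rule: P(Genotype | Data) = P(Genotype, Data) / P(Data), where P(Data) is the sum of the joint probabilities over all ten genotypes. The following is a minimal C++ sketch of that normalization, not BSNP's own code. The names kGenotypes, posteriorFromJoint and mostProbableGenotype are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// The ten unordered diploid genotypes, in the order listed above:
// four homozygous genotypes followed by six heterozygous ones.
const std::array<std::string, 10> kGenotypes = {
    "A/A", "C/C", "G/G", "T/T", "A/G", "A/C", "A/T", "C/G", "C/T", "G/T"};

// Convert joint probabilities P(Genotype, Data) into posteriors
// P(Genotype | Data) by dividing each one by their sum, P(Data).
std::array<double, 10> posteriorFromJoint(const std::array<double, 10>& joint) {
    double evidence = 0.0;  // P(Data) = sum over genotypes of P(Genotype, Data)
    for (double p : joint) evidence += p;
    assert(evidence > 0.0);  // at least one genotype must be possible

    std::array<double, 10> posterior{};
    for (std::size_t i = 0; i < joint.size(); ++i) {
        posterior[i] = joint[i] / evidence;
    }
    return posterior;
}

// Index (into kGenotypes) of the genotype with the highest posterior.
// Ties are resolved in favor of the earlier genotype.
std::size_t mostProbableGenotype(const std::array<double, 10>& posterior) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < posterior.size(); ++i) {
        if (posterior[i] > posterior[best]) best = i;
    }
    return best;
}
```

For example, joint probabilities of 0.03 for A/A and 0.01 for A/G (zero for the other eight genotypes) give P(Data) = 0.04 and posteriors of 0.75 and 0.25.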

BSNP is specifically designed to avoid any bias from the reference sequence, other than the inevitable bias caused by aligning reads to the reference. In other words, the prior assumed by BSNP when producing a posterior probability over genotypes at a given site does not consider the reference sequence at that site. In addition, BSNP accommodates the distinct error models of the technologies commonly used for next-generation sequencing, namely Illumina, 454, SOLiD, and Sanger sequencing (experimental). BSNP also uses a model for correlated errors across reads to mitigate the effect of errors introduced prior to amplification (a similar model is used by MAQ and SOAP).
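
To make the idea of a reference-independent prior concrete, here is a small C++ sketch of one such prior. It is an assumption for illustration only and is not BSNP's actual prior (see the user manual for that). The function name referenceFreePrior and its heterozygosity parameter are hypothetical. The point is that the prior depends only on whether a genotype is homozygous or heterozygous, never on which base the reference carries.

```cpp
#include <array>
#include <cassert>

// Hypothetical prior over the ten diploid genotypes that is symmetric in
// the four bases. Assumed genotype order: the four homozygous genotypes
// first (A/A, C/C, G/G, T/T), then the six heterozygous ones.
// `heterozygosity` is the total prior mass given to heterozygous genotypes.
std::array<double, 10> referenceFreePrior(double heterozygosity) {
    assert(heterozygosity >= 0.0 && heterozygosity <= 1.0);

    std::array<double, 10> prior{};
    // The homozygous genotypes share the remaining mass equally.
    for (int i = 0; i < 4; ++i) prior[i] = (1.0 - heterozygosity) / 4.0;
    // The heterozygous genotypes share the heterozygous mass equally.
    for (int i = 4; i < 10; ++i) prior[i] = heterozygosity / 6.0;
    return prior;
}
```

Because no genotype is singled out by the reference base, a site where the reads support a homozygous non-reference genotype starts from the same prior as a site that matches the reference.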


More information on BSNP can be found in Section 1 of the supplement to our paper, and in the BSNP user manual.

Downloads (Last updated on May 19, 2012)

Source code tar, binaries, docs & examples for Linux and Windows: BSNP_v2.17.02.tar.bz2
Direct link to documentation (pdf): BSNP_v2.17.02.pdf

Install

BSNP is implemented in C++ and was developed and tested on Linux and Windows (Windows 7 command line, though it should run on any version of Win32). For Linux users, we provide source code and simple build instructions; for Windows, we provide Visual Studio project files and Win32 binary executables. BSNP has not been tested on Macintosh, but compilation should be as simple as g++ *.cpp, since the portable source has no external dependencies beyond the standard C/C++ libraries.

Linux

  1. Download the BSNP source code

  2. Unzip the downloaded file
    bzcat BSNP_v.tar.bz2 | tar -xf -
  3. Compile BSNP
    make
  4. Done! For usage type:
    bin/BSNP

Windows

See the documentation for build instructions, or use the provided Win32 executables.


Useful Links

  G-PhoCS: Generalized Phylogenetic Coalescent Sampler for demographic analysis.
  Data: Sequence data from seven individual human genomes generated by BSNP and used in the demographic analysis of Gronau et al. Nat Genet 2011.

Cite

Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A. Bayesian inference of ancient human demography from individual genome sequences. Nature Genetics 43, 1031–1034 (2011).


Contact

Problems, questions, and feature requests should be directed to bgulko@cshl.edu