To run phastCons users must download the PHAST binaries or compile PHAST from source.
PHAST binaries can be downloaded by clicking the appropriate Windows, MacOSX or Linux icon on the PHAST website.
PHAST source can be downloaded by clicking the Source icon on the PHAST website or Phast Github.
For complete instructions on how to compile PHAST from source, please visit Quick Start - Installing PHAST.
A phylogenetic model in .mod format is required by phastCons. This can be generated using phyloFit from the PHAST package.
For example, phyloFit can fit a phylogenetic model (neutralmodel.mod) given an alignment file and a tree topology
phyloFit --tree "((galGal2,((rn3,mm5),fr1)),hg17)" --subst-mod REV --out-root modelfile alignment.maf
Detailed instructions for using phyloFit and information on various options can be found in the phyloFit Tutorial
Required input files
● An alignment file in one of the following formats - MAF, FASTA, PHYLIP, MPM, SS
● Phylogenetic models produced by phyloFit in .mod format.
The program can be run with a command of the form:
phastCons [OPTIONS] alignment m1.mod,m2.mod,... > scores.wig
Here we present commonly used options for running phyloFit.
Option | Description |
---|---|
--target-coverage | Constrain transition parameters such that the expected fraction of sites in conserved elements is <gamma> (0 < <gamma> < 1). This is a *prior* rather than *posterior* expectation and assumes stationarity of the state-transition process. Adding this constraint causes the ratio mu/nu to be fixed at (1-i<gamma>)/<gamma>. If used with --expected-length, the transition probabilities will be completely fixed; otherwise the expected-length parameter <omega> will be estimated by maximum likelihood. |
--expected-length |
Set transition probabilities such that the expected length of a conserved element is <omega>. Specifically, the parameter mu is set to 1/<omega>. If preceded by '~', <omega> will be estimated, but will be initialized to the specified value. For use with --target-coverage. |
--most-conserved |
Predict discrete elements using the Viterbi algorithm and write to specified file. Output is in BED format, unless output file name has suffix ".gff", in which case output is in GFF. |
--rho |
Set the *scale* (overall evolutionary rate) of the model for the conserved state to be <rho> times that of the model for the non-conserved state (0 < <rho> < 1; default 0.3).i If used with --estimate-trees or --estimate-rho, the specified value will be used for initialization only (rho will be estimated). This option is ignored if two tree models are given. |
--score |
Assign a log-odds score to each prediction. |
--estimate-trees |
Estimate free parameters of tree models and write new models to <fname_root>.cons.mod and <fname_root>.noncons.mod. |
--no-post-probs |
Suppress output of posterior probabilities. Useful if only discrete elements or likelihood is of interest. |
To estimate all the free parameters of the phylogenetic models by maximum likelihood, the --estimate-trees option can be used.
This will output the new models to mytrees.cons.mod and mytrees.noncons.mod.
phastCons --target-coverage 0.25 --expected-length 12 --estimate-trees mytrees primate-rodent.fa pri_rod.mod --no-post-probs
Input files:
Output files:
The transition probabilities for the HMM can either be specified at the command line or estimated from the data using an EM algorithm. Here we specify them at the command line, using the --target-coverage and --expected-length options.
phastCons --target-coverage 0.25 --expected-length 12 primate-rodent.fa mytrees.cons.mod,mytrees.noncons.mod > scores.wig
Input files:
Non-conserved model file (.mod)
Output files:
The output scores.wig is a file of conservation scores in the fixed-step WIG format.
Using the option --rho, we can define the conserved model as a scaled version of the nonconserved model. Here the scaling factor rho is 0.4.
phastCons --target-coverage 0.25 --expected-length 12 --rho 0.4 primate-rodent.fa mytrees.noncons.mod > scores.wig
Input files:
Non-conserved model file (.mod)
Output files:
Using the --most-conserved option, a set of discrete conserved elements can be predicted.
phastCons --target-coverage 0.25 --expected-length 12 --rho 0.4 --most-conserved most-cons.bed primate-rodent.fa mytrees.noncons.mod > scores.wig
Input files:
Non-conserved model file (.mod)
Output files:
Conserved elements file (.bed)
The output is a BED file with the coordinates of predicted elements.