phastCons Tutorial

A command-line program within PHAST for identifying evolutionarily conserved elements in a multiple sequence alignment based on a phylogenetic hidden Markov model (phylo-HMM) when given a phylogenetic tree.

phastCons Tutorial phastCons Tutorial
NOTE: This is a basic tutorial for getting started with phastCons. For a more specialized tutorial with extended usage and options please visit phastCons HOWTO.
Contents

Download and compile PHAST


To run phastCons users must download the PHAST binaries or compile PHAST from source.


        PHAST binaries can be downloaded by clicking the appropriate Windows, MacOSX or Linux icon on the PHAST website.

        PHAST source can be downloaded by clicking the Source icon on the PHAST website or Phast Github.

        For complete instructions on how to compile PHAST from source, please visit Quick Start - Installing PHAST.


Generate neutral model file (phyloFit)


A phylogenetic model in .mod format is required by phastCons. This can be generated using phyloFit from the PHAST package.

For example, phyloFit can fit a phylogenetic model (neutralmodel.mod) given an alignment file and a tree topology

     phyloFit --tree "((galGal2,((rn3,mm5),fr1)),hg17)" --subst-mod REV --out-root modelfile alignment.maf

Detailed instructions for using phyloFit and information on various options can be found in the phyloFit Tutorial


Usage and options for phastCons


Required input files

● An alignment file in one of the following formats - MAF, FASTA, PHYLIP, MPM, SS

● Phylogenetic models produced by phyloFit in .mod format.

The program can be run with a command of the form:

    phastCons [OPTIONS] alignment m1.mod,m2.mod,... > scores.wig


Here we present commonly used options for running phyloFit.

Option Description
--target-coverage

Constrain transition parameters such that the expected fraction of sites in conserved elements is <gamma> (0 < <gamma> < 1).

This is a *prior* rather than *posterior* expectation and assumes stationarity of the state-transition process. Adding this constraint causes the ratio mu/nu to be fixed at (1-i<gamma>)/<gamma>.

If used with --expected-length, the transition probabilities will be completely fixed; otherwise the expected-length parameter <omega> will be estimated by maximum likelihood.

--expected-length

Set transition probabilities such that the expected length of a conserved element is <omega>. Specifically, the parameter mu is set to 1/<omega>.

If preceded by '~', <omega> will be estimated, but will be initialized to the specified value.

For use with --target-coverage.

--most-conserved

Predict discrete elements using the Viterbi algorithm and write to specified file. Output is in BED format, unless output file name has suffix ".gff", in which case output is in GFF.

--rho

Set the *scale* (overall evolutionary rate) of the model for the conserved state to be <rho> times that of the model for the non-conserved state (0 < <rho> < 1; default 0.3).i

If used with --estimate-trees or --estimate-rho, the specified value will be used for initialization only (rho will be estimated).

This option is ignored if two tree models are given.

--score

Assign a log-odds score to each prediction.

--estimate-trees

Estimate free parameters of tree models and write new models to <fname_root>.cons.mod and <fname_root>.noncons.mod.

--no-post-probs

Suppress output of posterior probabilities. Useful if only discrete elements or likelihood is of interest.

Examples


Example 1 - Run phastCons with an alignment and mod file to estimate free parameters of the phylogenetic models.


To estimate all the free parameters of the phylogenetic models by maximum likelihood, the --estimate-trees option can be used.

This will output the new models to mytrees.cons.mod and mytrees.noncons.mod.

    phastCons --target-coverage 0.25 --expected-length 12 --estimate-trees mytrees primate-rodent.fa pri_rod.mod --no-post-probs


Input files:

    Alignment file (FASTA)

    Model file (.mod)

Output files:

    Conserved model file (.mod)

    Non-conserved model file (.mod)

Example 2 - Run phastCons with the target-coverage and expected length option to specify transition probabilities.


The transition probabilities for the HMM can either be specified at the command line or estimated from the data using an EM algorithm. Here we specify them at the command line, using the --target-coverage and --expected-length options.

    phastCons --target-coverage 0.25 --expected-length 12 primate-rodent.fa mytrees.cons.mod,mytrees.noncons.mod > scores.wig


Input files:

    Alignment file (FASTA)

    Conserved model file (.mod)

    Non-conserved model file (.mod)

Output files:

    Output wig file (.wig)

The output scores.wig is a file of conservation scores in the fixed-step WIG format.

Example 3 - Run phastCons with the rho option.


Using the option --rho, we can define the conserved model as a scaled version of the nonconserved model. Here the scaling factor rho is 0.4.

     phastCons --target-coverage 0.25 --expected-length 12 --rho 0.4 primate-rodent.fa mytrees.noncons.mod > scores.wig


Input files:

    Alignment file (FASTA)

    Non-conserved model file (.mod)

Output files:

    Output wig file (.wig)

Example 4 - Run phastCons to predict conserved elements.


Using the --most-conserved option, a set of discrete conserved elements can be predicted.

     phastCons --target-coverage 0.25 --expected-length 12 --rho 0.4 --most-conserved most-cons.bed primate-rodent.fa mytrees.noncons.mod > scores.wig


Input files:

    Alignment file (FASTA)

    Non-conserved model file (.mod)

Output files:

    Output wig file (.wig)

    Conserved elements file (.bed)

The output is a BED file with the coordinates of predicted elements.