Major Programs

phastCons: Conservation scoring and identification of conserved elements

phastOdds: Log-odds scoring for phylogenetic models or phylo-HMMs

phyloFit: Fitting of phylogenetic models to aligned DNA sequences

phyloP: Computation of p-values for conservation or acceleration, either lineage-specific or across all branches

exoniphy: Phylogenetic exon prediction

dless: Prediction of elements under lineage-specific selection

prequel: Probabilistic reconstruction of ancestral sequences

phastBias: Identification of GC-biased gene conversion using a phylo-HMM

Utilities

Alignments: maf_parse, msa_view, msa_split, msa_diff

Phylogenetic models: tree_doctor, all_dists, draw_tree, consEntropy, indelFit, indelHistory

Sampling/bootstrapping: base_evolve, phyloBoot

Annotations: refeature, clean_genes, eval_predictions

...and others

Contact

Send feedback to: phasthelp@cshl.edu

Frequently Asked Questions


Q. Where does the name "PHAST" come from?

A. The name "PHAST" arose because several programs in the package (including phastCons, exoniphy, and dless) make use of phylogenetic hidden Markov models (phylo-HMMs). Phylo-HMMs have been called "space/time models" because they describe DNA sequences by two Markov processes — one that operates in the dimension of time (along the branches of an evolutionary tree) and one that operates in space (along the sequences themselves) (see Yang, 1995).


Q. How do I do X with program Y?

A. Most of the programs in the package have fairly detailed help pages, with examples. The help pages are available from these web pages (follow links in left panel) or by running each program with the --help (-h) option. If they do not contain the information you need, please send email to the phast help mailing list.


Q. Is PHAST freely available? Am I allowed to reuse and/or redistribute the source code?

A. Yes. PHAST has always been freely available to academics, but it is now officially open source and available under the terms of a BSD-style license.


Q. What's the difference between PHAST and PAML / GERP / N-SCAN / etc.?

A. We have attempted to summarize similarities and differences of PHAST with respect to several related programs on the Comparison page.


Q. Which program in PHAST should I use for conservation scoring?

A. Both phastCons and phyloP (with --wig-scores or --base-by-base) can be used to produce conservation scores, and which one is best depends on the application. The most important difference between these two programs is that the scores produced by phyloP reflect individual alignment columns, and do not take into account conservation at neighboring sites. This is why the phyloP conservation plot in the UCSC Browser has a less smooth appearance, with more "texture" at individual bases, than the phastCons plot. This property also makes phyloP more appropriate than phastCons for evaluating signatures of selection at particular bases or classes of bases in the genome (e.g., all third codon positions). In addition, phyloP requires fewer assumptions than phastCons, by depending only on a model of neutral evolution, rather than on models of both neutral evolution and negative selection (conservation). On the other hand, because it directly models multibase elements, phastCons may be preferred as a conserved element detector. Its ability to pool information across sites can also be valuable in cases of few species or short branch lengths, where there may be insufficient data to detect selection separately at each site.
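
The effect of pooling information across sites can be illustrated with a toy example. The sketch below is not how phastCons works internally (phastCons uses a phylo-HMM); it swaps in a simple sliding-window sum as a stand-in for pooling. The per-site scores, the window size, the threshold, and the function name pooled_scores are all made up for illustration.

```python
def pooled_scores(per_site, window):
    """Sum per-site scores over a centered window (truncated at the ends).

    A sliding-window sum is only a stand-in for pooling across sites;
    it is NOT the phylo-HMM used by phastCons.
    """
    half = window // 2
    n = len(per_site)
    return [sum(per_site[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


# Hypothetical per-column evidence for conservation (made-up numbers).
# No single column is convincing on its own, but columns 2-6 all lean
# in the same direction, as a short conserved element would.
per_site = [0.1, -0.2, 0.4, 0.5, 0.3, 0.6, 0.4, -0.1, 0.0, -0.3]
threshold = 1.0  # arbitrary cutoff, chosen for illustration

# Site-by-site view: each column is judged independently.
site_hits = [i for i, s in enumerate(per_site) if s >= threshold]

# Pooled view: neighboring columns contribute to each position's score.
pooled = pooled_scores(per_site, window=3)
pooled_hits = [i for i, s in enumerate(pooled) if s >= threshold]

print("columns passing on their own:", site_hits)
print("columns passing after pooling:", pooled_hits)
```

With these invented numbers, no column reaches the cutoff alone, while a run of adjacent columns does once neighbors are pooled. The same pooling is what blurs base-level "texture", which is why a site-by-site score is the better fit when individual bases are of interest.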


Q. What exactly is {SS, *.cm, *.mod} format?

A. SS is a format used by PHAST to describe a multiple alignment in terms of its "sufficient statistics" for phylogenetic analysis, i.e., its distinct alignment columns and their counts, and optionally, the order in which the columns appear (which is typically needed for functional element identification but not for phylogenetic analysis). A *.cm file defines a "category map," i.e., a mapping from feature types (e.g., "exon", "ancestral repeat", "conserved element") to label numbers. Category maps turn out to be useful in several HMM-based "parsing" applications. A *.mod file defines the parameters of a probabilistic phylogenetic model, including a tree and branch lengths, a substitution rate matrix, and a background distribution over bases. Detailed specifications of these file formats will be made available as the PHAST documentation is improved. In the meantime, examples of SS and *.mod files can be obtained by converting or generating files with msa_view or phyloFit, respectively. Example *.cm files are included in the phast/data/exoniphy directory.
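
The idea behind the sufficient statistics can be sketched in a few lines of Python. This is only an illustration of the concept (distinct columns, their counts, and optionally their order); it does not read or write the actual SS file syntax, and the function name sufficient_statistics and the toy alignment are assumptions made for this example.

```python
from collections import Counter


def sufficient_statistics(rows, keep_order=False):
    """Summarize an alignment as distinct columns, counts, and optional order.

    rows is a list of equal-length aligned sequences, one per species.
    Returns (distinct_columns, counts, order), where order is a list of
    indices into distinct_columns, or None if keep_order is False.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    # Each column is the string of bases found at one alignment position.
    columns = ["".join(col) for col in zip(*rows)]

    counts = Counter(columns)
    distinct = list(counts)  # order of first appearance
    index = {col: i for i, col in enumerate(distinct)}

    # The column order is only needed when position matters
    # (e.g. element identification), not for fitting a model.
    order = [index[col] for col in columns] if keep_order else None
    return distinct, [counts[col] for col in distinct], order


# Toy alignment of three species over six positions (made-up data).
alignment = ["ACGTAC",
             "ACGTAC",
             "ATGTAT"]

distinct, counts, order = sufficient_statistics(alignment, keep_order=True)
for col, n in zip(distinct, counts):
    print(col, n)
print("order:", order)
```

Here six alignment columns reduce to four distinct columns with counts. On long alignments with many repeated columns the reduction is much larger, which is what makes this representation convenient for likelihood calculations that treat columns independently.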


Q. What is {genepred, bed, MAF} format?

A. These are file formats developed for the UCSC Genome Browser and associated applications. Specifications can be found in the data file format documentation on the UCSC Genome Browser website. The GFF and GTF formats (also recognized by PHAST) are described in the same documentation.