Major Programs

phastCons: Conservation scoring and identification of conserved elements

phastOdds: Log-odds scoring for phylogenetic models or phylo-HMMs

phyloFit: Fitting of phylogenetic models to aligned DNA sequences

phyloP: Computation of p-values for conservation or acceleration, either lineage-specific or across all branches

exoniphy: Phylogenetic exon prediction

dless: Prediction of elements under lineage-specific selection

prequel: Probabilistic reconstruction of ancestral sequences

phastBias: Identification of GC-biased gene conversion using a phylo-HMM

Utilities

Alignments: maf_parse, msa_view, msa_split, msa_diff

Phylogenetic models: tree_doctor, all_dists, draw_tree, consEntropy, indelFit, indelHistory

Sampling/bootstrapping: base_evolve, phyloBoot

Annotations: refeature, clean_genes, eval_predictions

...and others

Contact

Send feedback to: phasthelp@cshl.edu

Comparison with Other Packages

PHAST essentially straddles two types of applications: phylogenetic modeling and functional element identification. In the realm of phylogenetic modeling, it is most similar to PAML. Like PAML, PHAST supports several different nucleotide substitution models and is optimized for fitting models to (potentially large) data sets conditional on a given tree topology, rather than for topology estimation (as are MrBayes, PhyML, RAxML, PHYLIP, and many other packages). However, PAML has very broad functionality for phylogenetic modeling, and several of its models and applications are not supported in PHAST. For example, PHAST does not support Goldman-Yang codon models, Yang-Nielsen likelihood ratio tests for positive selection, or amino acid models such as JTT or WAG, nor does it support some of PAML's less commonly used nucleotide substitution models (e.g., F84, T92, and TN93). Nevertheless, PHAST has some features not found in PAML, such as support for general dinucleotide and trinucleotide substitution models (in phyloFit), parameter estimation by expectation maximization (also in phyloFit; this can improve performance with richly parameterized models), and phylogenetic p-value computation (in phyloP). In addition, PHAST is designed to "play well" with large-scale comparative genomic data sets (such as MAF files from UCSC). It also has a more standard command-line interface than PAML, which some users find convenient.
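As a rough illustration of the phyloFit features mentioned above, the following Python sketch assembles (but does not run) a phyloFit command line that requests a dinucleotide model and fitting by expectation maximization. The helper name `phylofit_command`, the file names, and the tree string are hypothetical. The flags (`--tree`, `--subst-mod`, `--EM`, `--msa-format`, `--out-root`) and the model name `R2` reflect our reading of phyloFit's help text, so please check `phyloFit --help` for the installed version before relying on them.

```python
import shlex


def phylofit_command(alignment, tree, subst_mod="REV", use_em=False,
                     msa_format="FASTA", out_root="phyloFit"):
    """Build an argument list for phyloFit without executing it.

    All argument values here are illustrative assumptions; only the
    flag names are meant to correspond to phyloFit options.
    """
    cmd = [
        "phyloFit",
        "--tree", tree,              # topology is given, not estimated
        "--subst-mod", subst_mod,    # e.g. REV, or R2 for a dinucleotide model
        "--msa-format", msa_format,
        "--out-root", out_root,
    ]
    if use_em:
        cmd.append("--EM")           # request fitting by expectation maximization
    cmd.append(alignment)            # the alignment file is the positional argument
    return cmd


# Hypothetical inputs: a FASTA alignment and a three-species tree.
cmd = phylofit_command("example.fa", "((human,chimp),mouse)",
                       subst_mod="R2", use_em=True, out_root="example_r2")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run` as-is once PHAST is installed; building the list separately makes it easy to inspect the command first.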

In the realm of functional element identification, PHAST does not have a direct counterpart, but it overlaps with several other packages. With regard to conservation scoring and the identification of conserved elements, it is probably closest to GERP. (Related programs include binCons and SCONE.) GERP is primarily a column-by-column conservation scoring method, and is therefore more similar to phyloP (with --wig-scores or --base-by-base) than to phastCons, which attempts to model multibase elements. In simulation experiments, phyloP and GERP seem to have very similar power to distinguish neutral from conserved bases, but phyloP has some additional features that may be of interest. For example, it supports the detection of lineage-specific conservation or acceleration, it can use several different substitution models, and it can be used to score features longer than one base. By contrast, phastCons and DLESS take an alternative approach to the detection of (potentially lineage-specific) conservation: instead of scoring individual bases, they pool information across sites and attempt to identify conserved elements using hidden Markov models. These methods may be of interest when the focus is on elements rather than individual bases, or when the species set is small or branch lengths are short, so that there is relatively little information at individual sites.
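For readers who want to try the column-by-column mode described above, here is a minimal Python sketch that builds (but does not run) a phyloP command line with `--wig-scores`. The helper name `phylop_command` and the file names are hypothetical, and the choice of `--method LRT` with `--mode CONACC` is only one plausible configuration. The flag names follow our reading of phyloP's help text and should be confirmed with `phyloP --help`.

```python
import shlex


def phylop_command(model, alignment, per_base=True, method="LRT",
                   mode="CONACC", msa_format="MAF"):
    """Build an argument list for phyloP without executing it.

    `model` is assumed to be a neutral model file (for example one
    written by phyloFit); `alignment` is the multiple alignment.
    """
    cmd = [
        "phyloP",
        "--method", method,          # scoring method (assumed value: LRT)
        "--mode", mode,              # CONACC: conservation and acceleration
        "--msa-format", msa_format,
    ]
    if per_base:
        cmd.append("--wig-scores")   # per-base scores, as mentioned in the text
    cmd += [model, alignment]        # positional arguments: model, then alignment
    return cmd


# Hypothetical inputs: a neutral model and a MAF alignment.
cmd = phylop_command("neutral.mod", "region.maf")
print(shlex.join(cmd))
```

phyloP writes its results to standard output, so in practice the output would be redirected to a file when the command is run.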

In the area of comparative gene finding (another branch of functional element identification), the exoniphy program is similar to the Shadower and EvoGene programs, which came out at about the same time. Despite their theoretical appeal, these programs (including exoniphy) have never performed quite as well as pairwise gene predictors such as TWINSCAN, or multispecies extensions such as N-SCAN. More recent discriminative approaches such as CONTRAST seem to perform even better. At this stage, we recommend using one of these alternatives (TWINSCAN, N-SCAN, or CONTRAST) instead of exoniphy for comparative gene prediction.