To run phyloP, users must download the PHAST binaries or compile PHAST from source.
PHAST binaries can be downloaded by clicking the appropriate Windows, MacOSX or Linux icon on the PHAST website.
The PHAST source can be downloaded by clicking the Source icon on the PHAST website or from the PHAST GitHub repository.
For complete instructions on how to compile PHAST from source, please visit Quick Start - Installing PHAST.
A phylogenetic model in .mod format is required by phyloP. This can be generated using phyloFit from the PHAST package.
For example, phyloFit can fit a phylogenetic model (neutralmodel.mod) given an alignment file and a tree topology. The --out-root argument sets the output file name without the .mod extension:
phyloFit --tree "((galGal2,((rn3,mm5),fr1)),hg17)" --subst-mod REV --out-root neutralmodel alignment.maf
Detailed instructions for using phyloFit and information on its various options can be found in the phyloFit Tutorial.
Required input files
● An alignment file in one of the following formats: MAF, FASTA, PHYLIP, MPM, SS
● A phylogenetic model produced by phyloFit in .mod format.
The program can be run with a command of the form:
phyloP [OPTIONS] neutralmodel.mod [alignment] > out
Here we present commonly used options for running phyloP.
Option | Description |
---|---|
--method | The method used to compute p-values or conservation/acceleration scores. Available methods: SPH, LRT, SCORE, GERP. The default method is SPH. LRT (likelihood ratio test) and SCORE (score test) compare an alternative model that has a free scale parameter with the given neutral model. The GERP-like method (GERP) estimates the number of "rejected substitutions" per base by comparing the per-site maximum likelihood expected number of substitutions with the expected number under the neutral model. LRT, SCORE and GERP can be used only with --base-by-base, --wig-scores or --features. |
--mode | The mode used to compute p-values. Can be used with --base-by-base, --wig-scores or --features. Available modes: CON, ACC, NNEUT, CONACC. CON (default) and ACC compute one-sided p-values such that small p (large -log p) indicates unexpected conservation (CON) or acceleration (ACC). NNEUT computes two-sided p-values such that small p indicates an unexpected departure from neutrality. CONACC uses positive values (p-values or scores) to indicate conservation and negative values to indicate acceleration. |
--wig-scores | Compute separate p-values per site, and then compute site-specific conservation (acceleration) scores as -log(p). Output base-by-base scores in fixed-step wig format, using the coordinate system of the reference sequence. |
--features | Read features from the specified file (e.g. features.bed) and output a table of p-values and related statistics with one row per feature. The features are assumed to use the coordinate frame of the first sequence in the alignment. |
--subtree | Partition the tree into the subtree beneath the node whose name is given and the complementary supertree, and consider conservation/acceleration in the subtree given the supertree. The branch above the specified node is included with the subtree. |
--branch | Like --subtree, but partitions the tree into the set of named branches (each named by its child node) and all the remaining branches. Then tests for conservation/acceleration in the set of named branches relative to the others. |
Using the likelihood ratio test (LRT) method, one can compute conservation scores for each site in the alignment and output them in fixed-step wig format. With the CONACC mode, these scores summarize both conservation and acceleration.
phyloP --mode CONACC --method LRT --wig-scores neutralmodel.mod alignment.maf > phyloPscores.wig
In the output wig file, the absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. Sites predicted to be conserved are assigned positive scores, while sites predicted to be accelerated are assigned negative scores.
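The sign convention above can be applied mechanically to a wig file. The following is a minimal sketch, not part of PHAST: the function name classify_wig and the default cutoff of 2 are hypothetical choices for illustration, and the code assumes the file contains only fixedStep declaration lines of the form "fixedStep chrom=... start=N step=N" followed by one score per line.

```shell
# Label each site in a fixed-step wig file written with --mode CONACC.
# Usage: classify_wig FILE [CUTOFF]
# CUTOFF is a hypothetical threshold on |score|; it defaults to 2.
classify_wig() {
  awk -v cutoff="${2:-2}" '
    /^fixedStep/ {
      # Read start= and step= from the declaration line.
      step = 1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "start") pos = kv[2]
        if (kv[1] == "step")  step = kv[2]
      }
      next
    }
    /^(track|#)/ { next }   # skip track and comment lines
    NF == 0      { next }   # skip blank lines
    {
      # Positive scores mean conservation, negative scores acceleration.
      label = "neutral"
      if ($1 >=  cutoff) label = "conserved"
      if ($1 <= -cutoff) label = "accelerated"
      print pos "\t" $1 "\t" label
      pos += step
    }
  ' "$1"
}

# Example (assumes phyloPscores.wig exists):
#   classify_wig phyloPscores.wig 2 > classified_sites.tsv
```

The output has three tab-separated columns: position in the reference sequence, score, and label. Any cutoff should be chosen with multiple testing in mind; the value used here is only a placeholder.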
If you are interested in a particular subset of regions in your alignment, such as ancestral repeats or intergenic regions, a features file can be provided using the --features option. phyloP then reports p-values per feature rather than per site.
phyloP --method LRT --mode CONACC --features features.bed model.mod alignment.fa > features.out
The output file is a table of p-values and related statistics with one row per feature. The features are assumed to use the coordinate frame of the first sequence of the alignment.
The option -g or --gff-scores can be used to output a GFF, instead of a table, assigning each feature a score equal to its -log p-value.
phyloP --method LRT --mode CONACC --features features.bed -g model.mod alignment.fa > features.gff
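Because each GFF score is a -log p-value, the p-value can be recovered from the score column. The sketch below is an illustration only: the function name gff_scores_to_p is hypothetical, it relies on the score being in column 6 as in standard GFF, and it assumes the logarithm is base 10, which should be checked against the phyloP help text for your PHAST version.

```shell
# Print seqname, start, end and the p-value implied by each GFF score.
# Assumption: score = -log10(p); with CONACC the sign marks the direction,
# so the absolute value is used.
gff_scores_to_p() {
  awk -F'\t' '
    /^#/ || NF < 6 { next }            # skip comments and short lines
    {
      s = ($6 < 0) ? -$6 : $6          # absolute value of the score
      printf "%s\t%s\t%s\t%.4g\n", $1, $4, $5, exp(-s * log(10))
    }
  ' "$1"
}

# Example (assumes features.gff was produced by the command above):
#   gff_scores_to_p features.gff > features_pvalues.tsv
```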
The --subtree option can be used to partition the tree into the subtree beneath the node provided and the complementary supertree, and consider conservation/acceleration in the subtree given the supertree. The branch above the specified node is included with the subtree.
First, we need to make sure that names are assigned to all ancestral nodes. The PHAST utility tree_doctor can be used to do that. If a node is unnamed, a name is created by concatenating the names of a leaf from its left subtree and a leaf from its right subtree.
tree_doctor --name-ancestors neutralmodel.mod > named_model.mod
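Before choosing a node for --subtree or --branch, it can help to list the ancestral node names that were assigned. The sketch below is not a PHAST tool: the function name list_ancestor_names is hypothetical, and it assumes the model file stores its Newick tree on a single line beginning with "TREE:" and that grep supports the -o option.

```shell
# List the labels that follow a closing parenthesis in the Newick tree,
# i.e. the names of internal (ancestral) nodes.
list_ancestor_names() {
  sed -n 's/^TREE:[[:space:]]*//p' "$1" |
    grep -oE '\)[^:,();]+' |
    tr -d ')'
}

# Example (assumes named_model.mod was produced by the command above):
#   list_ancestor_names named_model.mod
```

Any name printed this way, such as mm9-rn4 in the examples that follow, can be passed to --subtree or --branch.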
Scores describing lineage-specific conservation can then be computed using the --subtree option. Here we use the --base-by-base option, which outputs multiple values per site in a method-dependent way.
phyloP --method LRT --subtree mm9-rn4 --mode CONACC --base-by-base named_model.mod alignment.maf > subtree_basebybase
Similarly, a features file can be provided along with the --subtree option to get a table of p-values and related statistics with one row per feature.
phyloP --method LRT --subtree mm9-rn4 --mode CONACC --features features.bed named_model.mod alignment.maf > subtree_features
The --branch option is similar to --subtree, but it partitions the tree into the set of named branches and all the remaining branches, and then tests for conservation/acceleration in the set of named branches relative to the others. A comma-delimited list of child nodes can be provided as an argument.
phyloP --method LRT --branch mm9-rn4 --mode CONACC -w named_model.mod alignment.maf > branch.wig
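When several branches are tested together, the comma-delimited argument can be built from a file with one node name per line. This is a small convenience sketch; the function name branch_arg and the file name branches.txt are hypothetical.

```shell
# Join the non-blank lines of a file into a comma-delimited list.
branch_arg() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Example (assumes branches.txt lists one child node name per line):
#   phyloP --method LRT --branch "$(branch_arg branches.txt)" --mode CONACC -w named_model.mod alignment.maf > branch.wig
```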