phyloP Tutorial

Usage and options for phyloP

Required input files

● An alignment file in one of the following formats: MAF, FASTA, PHYLIP, MPM, or SS.

● A phylogenetic model produced by phyloFit in .mod format.

The program can be run with a command of the form:

phyloP [OPTIONS] neutralmodel.mod [alignment] > out
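
The command form above can also be assembled from a script. The following is a minimal Python sketch, assuming phyloP is on the PATH; the helper names `build_phylop_command` and `run_phylop` are hypothetical and not part of PHAST. The `stdout` redirection plays the role of `> out`.

```python
import subprocess

def build_phylop_command(model, alignment, options=()):
    """Return the argument list for: phyloP [OPTIONS] model alignment."""
    return ["phyloP", *options, model, alignment]

def run_phylop(model, alignment, out_path, options=()):
    """Run phyloP and write its standard output to out_path (the '> out' part)."""
    with open(out_path, "w") as out:
        subprocess.run(build_phylop_command(model, alignment, options),
                       stdout=out, check=True)

# Build (but do not run) a command using options described in this tutorial.
cmd = build_phylop_command(
    "neutralmodel.mod", "alignment.maf",
    ["--method", "LRT", "--mode", "CONACC", "--wig-scores"],
)
print(" ".join(cmd))
```

Keeping the options in a list avoids shell quoting problems when file names contain spaces.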

Here we present commonly used options for running phyloP.

Option	Description
--method	The method used to compute p-values or conservation/acceleration scores. Available methods: SPH, LRT, SCORE, GERP. The default method is SPH. LRT (likelihood ratio test) and SCORE (score test) compare an alternative model having a free scale parameter with the given neutral model. The GERP-like method (GERP) estimates the number of "rejected substitutions" per base by comparing the (per-site) maximum likelihood expected number of substitutions with the expected number under the neutral model. LRT, SCORE and GERP can be used only with --base-by-base, --wig-scores or --features.
--mode	The mode used to compute p-values. Can be used with --base-by-base, --wig-scores or --features. Available modes: CON, ACC, NNEUT, CONACC. CON (default) and ACC compute one-sided p-values so that small p (large -log p) indicates unexpected conservation (CON) or acceleration (ACC). NNEUT computes two-sided p-values such that small p indicates an unexpected departure from neutrality. CONACC uses positive values (p-values or scores) to indicate conservation and negative values to indicate acceleration.
--wig-scores	Compute separate p-values per site, and then compute site-specific conservation (acceleration) scores as -log(p). Output base-by-base scores in fixed-step wig format, using the coordinate system of the reference sequence.
--features	Read features from a file (GFF or BED format) and output a table of p-values and related statistics with one row per feature. The features are assumed to use the coordinate frame of the first sequence in the alignment.
--subtree	Partition the tree into the subtree beneath the node whose name is given and the complementary supertree, and consider conservation/acceleration in the subtree given the supertree. The branch above the specified node is included with the subtree.
--branch	Like --subtree, but partitions the tree into the set of named branches (each named by its child node) and all the remaining branches, then tests for conservation/acceleration in the set of named branches relative to the others.
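
The constraints in the table can be checked before launching a long run. The sketch below is illustrative only: `check_options` is a hypothetical helper, and it encodes just the rules stated in the table (known method and mode names, and the requirement that LRT, SCORE and GERP be combined with --base-by-base, --wig-scores or --features). phyloP itself may enforce further rules.

```python
METHODS = {"SPH", "LRT", "SCORE", "GERP"}
MODES = {"CON", "ACC", "NNEUT", "CONACC"}
OUTPUT_FLAGS = {"--base-by-base", "--wig-scores", "--features"}

def check_options(method="SPH", mode="CON", flags=()):
    """Return a list of problems with the option combination (empty if none)."""
    problems = []
    if method not in METHODS:
        problems.append(f"unknown method: {method}")
    if mode not in MODES:
        problems.append(f"unknown mode: {mode}")
    has_output_flag = bool(OUTPUT_FLAGS & set(flags))
    if method in {"LRT", "SCORE", "GERP"} and not has_output_flag:
        problems.append(
            f"{method} requires --base-by-base, --wig-scores or --features"
        )
    return problems

print(check_options("LRT", "CONACC", ["--wig-scores"]))  # no problems expected
print(check_options("LRT", "CONACC"))                    # missing output flag
```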

Examples

Example 1 - Run phyloP with an alignment and mod file using the CONACC mode and LRT method.

Using the likelihood ratio test (LRT) method, one can compute conservation scores for each site in the alignment and output them in the fixed-step wig format. With the CONACC mode, these scores summarize both conservation and acceleration.

phyloP --mode CONACC --method LRT --wig-scores neutralmodel.mod alignment.maf > phyloPscores.wig

Input files:

Alignment file (MAF)

Model file (.mod)

Output files:

Output wig file (.wig)

In the output wig file, the absolute values of the scores represent -log p-values under a null hypothesis of neutral evolution. Sites predicted to be conserved are assigned positive scores, while sites predicted to be accelerated are assigned negative scores.
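
To make the sign convention concrete, here is a small Python sketch that reads fixed-step wig lines and turns each CONACC score back into a direction and a p-value. It assumes the logarithm is base 10 (the text above only says "-log p"), and the sample values are made up for illustration rather than taken from real phyloP output.

```python
def parse_fixed_step_wig(lines):
    """Yield (chrom, position, score) from fixed-step wig lines."""
    chrom, pos, step = None, None, 1
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("track", "#")):
            continue
        if line.startswith("fixedStep"):
            # Header fields look like: chrom=chr1 start=100 step=1
            fields = dict(f.split("=", 1) for f in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields.get("step", 1))
            continue
        yield chrom, pos, float(line)
        pos += step

def interpret_conacc(score):
    """Return (call, p_value) for a CONACC score, assuming -log10 p."""
    if score > 0:
        call = "conserved"
    elif score < 0:
        call = "accelerated"
    else:
        call = "neither"
    return call, 10 ** -abs(score)

# Hypothetical sample data, not real phyloP output.
sample = [
    "fixedStep chrom=chr1 start=100 step=1",
    "2.0",
    "-3.0",
    "0.0",
]

for chrom, pos, score in parse_fixed_step_wig(sample):
    call, p = interpret_conacc(score)
    print(chrom, pos, score, call, p)
```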

Example 2 - Run phyloP with the features option.

If you want p-values for specific regions of your alignment rather than for individual sites, such as ancestral repeats or intergenic regions, a features file can be provided using the --features option.

phyloP --method LRT --mode CONACC --features features.bed model.mod alignment.fa > features.out

Input files:

Alignment file (FASTA)

Features file (BED)

Model file (.mod)

Output files:

Features Output Table file

Features Output GFF file

The output file is a table of p-values and related statistics with one row per feature. The features are assumed to use the coordinate frame of the first sequence of the alignment.

The option -g or --gff-scores can be used to output a GFF, instead of a table, assigning each feature a score equal to its -log p-value.

phyloP --method LRT --mode CONACC --features features.bed -g model.mod alignment.fa > features.gff
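
Once a GFF with per-feature scores has been written, the features with the strongest signal can be pulled out by sorting on the score column. The sketch below relies only on the standard nine-column, tab-separated GFF layout, in which the score is the sixth column. The sample lines, including the source and feature names, are hypothetical, and the code assumes every feature carries a numeric score.

```python
def read_gff_scores(lines):
    """Return (seqname, start, end, score) for each GFF feature line."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF columns: seqname, source, feature, start, end, score,
        # strand, frame, attributes
        rows.append((cols[0], int(cols[3]), int(cols[4]), float(cols[5])))
    return rows

def rank_by_abs_score(rows):
    """Sort features by absolute score, strongest signal first."""
    return sorted(rows, key=lambda r: abs(r[3]), reverse=True)

# Hypothetical sample lines, not real phyloP output.
sample_gff = [
    "chr1\tsrc\telement\t100\t200\t4.5\t+\t.\tid1\n",
    "chr1\tsrc\telement\t300\t400\t-7.2\t+\t.\tid2\n",
]

for seqname, start, end, score in rank_by_abs_score(read_gff_scores(sample_gff)):
    print(seqname, start, end, score)
```

Sorting on the absolute value keeps strongly accelerated features (negative scores in CONACC mode) next to strongly conserved ones.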

Example 3 - Run phyloP with the subtree and branch options.

The --subtree option can be used to partition the tree into the subtree beneath the node provided and the complementary supertree, and consider conservation/acceleration in the subtree given the supertree. The branch above the specified node is included with the subtree.

First, we need to make sure that names are assigned to all ancestral nodes. tree_doctor, a PHAST utility, can be used to do that. If a node is unnamed, a name is created by concatenating the names of a leaf from its left subtree and a leaf from its right subtree.

tree_doctor --name-ancestors neutralmodel.mod > named_model.mod
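
The naming rule can be illustrated with a toy tree. This Python sketch is not how tree_doctor is implemented: it assumes the leftmost leaf of each subtree is used and that the two names are joined with a hyphen (as in the node name mm9-rn4 used in the commands of this example). The species hg18 is added only to make the toy tree non-trivial.

```python
def first_leaf(tree):
    """Return the leftmost leaf name of a nested-tuple binary tree."""
    while not isinstance(tree, str):
        tree = tree[0]
    return tree

def ancestor_names(tree, names=None):
    """Collect names for internal nodes in post-order (children before parents)."""
    if names is None:
        names = []
    if isinstance(tree, str):
        return names  # leaves already have names
    left, right = tree
    ancestor_names(left, names)
    ancestor_names(right, names)
    # Assumed rule: one leaf from each side, joined with a hyphen.
    names.append(f"{first_leaf(left)}-{first_leaf(right)}")
    return names

# Toy tree: ((mm9, rn4), hg18)
toy_tree = (("mm9", "rn4"), "hg18")
print(ancestor_names(toy_tree))
```

In practice, inspect named_model.mod after running tree_doctor to see the names actually assigned before passing one to --subtree or --branch.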

Scores describing lineage specific conservation can then be computed using the --subtree option. Here we use the --base-by-base option which outputs multiple values per site, in a method-dependent way.

phyloP --method LRT --subtree mm9-rn4 --mode CONACC --base-by-base named_model.mod alignment.maf > subtree_basebybase

Similarly, a features file can be provided along with the --subtree option to get a table of p-values and related statistics with one row per feature.

phyloP --method LRT --subtree mm9-rn4 --mode CONACC --features features.bed named_model.mod alignment.maf > subtree_features

The --branch option is similar to --subtree, but it partitions the tree into the set of named branches and all the remaining branches, then tests for conservation/acceleration in the set of named branches relative to the others. A comma-delimited list of child nodes can be provided as an argument.

phyloP --method LRT --branch mm9-rn4 --mode CONACC -w named_model.mod alignment.maf > branch.wig
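
The difference between the two partitions is easiest to see on a small tree. The following Python sketch is purely illustrative and does not reflect phyloP's internals: the tree is a hypothetical child-to-parent map, each branch is named by its child node, and the node names other than mm9-rn4 are invented for the example.

```python
def subtree_branches(parent, node):
    """Branches beneath `node`, plus the branch above `node` itself."""
    selected = {node}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            if par in selected and child not in selected:
                selected.add(child)
                changed = True
    return selected

def partition(parent, chosen):
    """Split all branches into (chosen, remaining)."""
    chosen = set(chosen)
    return chosen, set(parent) - chosen

# Toy tree ((mm9, rn4)mm9-rn4, hg18)root as a child -> parent map.
parent = {
    "mm9": "mm9-rn4",
    "rn4": "mm9-rn4",
    "mm9-rn4": "root",
    "hg18": "root",
}

# --subtree mm9-rn4: the whole clade, including the branch leading to it.
sub, sup = partition(parent, subtree_branches(parent, "mm9-rn4"))
print(sorted(sub), sorted(sup))

# --branch mm9-rn4: only the single named branch.
named, rest = partition(parent, ["mm9-rn4"])
print(sorted(named), sorted(rest))
```

With --subtree the mouse and rat branches are tested together with their ancestral branch, whereas with --branch only the ancestral branch is tested against everything else.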

Input files:

Alignment file (MAF)

Features file (BED)

Model file with named ancestral nodes (.mod)

Output files:

Subtree basebybase Output file

Subtree features Output file

Branch Output wig file



A command-line program within PHAST that computes conservation or acceleration p-values based on an alignment and a model of neutral evolution using independent hypothesis tests (-log p-values) at individual nucleotides.
