PROGRAM: phastBias
USAGE: phastBias [OPTIONS] alignment neutral.mod foreground_branch > scores.wig
The alignment file can be in any of several file formats (see
--msa-format). The neutral model must be in the .mod format
produced by the phyloFit program. The foreground_branch should
identify a branch of the tree (internal branches can be named
with tree_doctor --name-ancestors).
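The usage line above can be assembled programmatically. The following is a minimal sketch, not part of phastBias itself: the helper name build_phastbias_command, the file names, and the branch name "human" are hypothetical placeholders, and only options documented in this help text are used. The command is built and printed, not run.

```python
import shlex


def build_phastbias_command(alignment, neutral_mod, foreground_branch,
                            bgc=None, estimate_bgc=None):
    """Return a phastBias argument list (hypothetical helper; nothing is executed)."""
    cmd = ["phastBias"]
    # Options come before the three positional arguments, as in the USAGE line.
    if bgc is not None:
        cmd += ["--bgc", str(bgc)]
    if estimate_bgc is not None:
        cmd += ["--estimate-bgc", "1" if estimate_bgc else "0"]
    cmd += [alignment, neutral_mod, foreground_branch]
    return cmd


# Placeholder inputs; substitute your own alignment, model, and branch name.
cmd = build_phastbias_command("alignment.fa", "neutral.mod", "human",
                              bgc=3, estimate_bgc=True)
# Scores go to stdout, so the shell form redirects them to a wig file.
print(shlex.join(cmd) + " > scores.wig")
```

To actually run the command, the list could be passed to subprocess.run with stdout redirected to a file.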
DESCRIPTION:
Identify regions of the alignment that are affected by GC-biased gene conversion (gBGC),
indicated by a cluster of weak-to-strong (A/T -> G/C) substitutions
amidst a deficit of strong-to-weak substitutions on a particular
branch of the tree. The regions are identified by a phylo-HMM
with four states: neutral, conserved, neutral with gBGC, and
conserved with gBGC.
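To make the weak-to-strong and strong-to-weak terminology concrete, here is a small illustrative sketch that classifies a single substitution. It is only meant to show the categories; phastBias itself scores regions with the phylo-HMM described above, not by counting. The function name and the treatment of non-ACGT characters are assumptions of this sketch.

```python
WEAK = {"A", "T"}    # weak (two hydrogen bonds) bases
STRONG = {"C", "G"}  # strong (three hydrogen bonds) bases


def classify_substitution(ancestral, derived):
    """Classify a substitution as 'W->S', 'S->W', 'other', or 'none'."""
    a, d = ancestral.upper(), derived.upper()
    # Assumption of this sketch: gaps and ambiguity codes are not counted.
    if a not in WEAK | STRONG or d not in WEAK | STRONG:
        return "none"
    if a == d:
        return "none"
    if a in WEAK and d in STRONG:
        return "W->S"
    if a in STRONG and d in WEAK:
        return "S->W"
    return "other"  # W->W (A<->T) or S->S (C<->G)


def count_classes(pairs):
    """Tally substitution classes over (ancestral, derived) base pairs."""
    counts = {"W->S": 0, "S->W": 0, "other": 0}
    for ancestral, derived in pairs:
        label = classify_substitution(ancestral, derived)
        if label != "none":
            counts[label] += 1
    return counts
```

A region with many "W->S" and few "S->W" changes on the foreground branch is the kind of pattern the description refers to.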
OUTPUT:
phastBias produces a wig file with scores for every position in the
alignment indicating the probability of being in one of the gBGC
states. It can also produce gBGC tracts by thresholding this
probability at 0.5, or a matrix of probabilities for all four states.
See OUTPUT OPTIONS below.
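The thresholding step mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the phastBias implementation: the function name call_tracts is hypothetical, coordinates are taken to be 1-based and inclusive, and a position belongs to a tract only when its probability is strictly greater than the threshold.

```python
def call_tracts(probs, threshold=0.5, first_coord=1):
    """Group consecutive positions with probability > threshold into tracts.

    Returns a list of (start, end) tuples, 1-based and inclusive by default.
    """
    tracts = []
    start = None  # start coordinate of the tract currently being extended
    for offset, p in enumerate(probs):
        pos = first_coord + offset
        if p > threshold:
            if start is None:
                start = pos
        elif start is not None:
            # Probability dropped to or below the threshold: close the tract.
            tracts.append((start, pos - 1))
            start = None
    if start is not None:
        # The last tract runs to the end of the scored region.
        tracts.append((start, first_coord + len(probs) - 1))
    return tracts
```

For example, the per-base probabilities [0.1, 0.6, 0.9, 0.5, 0.7] yield two tracts, covering positions 2 to 3 and position 5.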
GENERAL OPTIONS:
--help,-h
Print this help message.
TUNING PARAMETER OPTIONS:
gBGC PARAMETERS:
--bgc <B>
The B parameter describes the strength of gBGC. It must be > 0.
Too low a value may yield false positives, because the gBGC model
becomes indistinguishable from the non-gBGC model.
Default: 3
--estimate-bgc <0|1>
Use "--estimate-bgc 1" to estimate B by maximum likelihood.
Default: 0
--bgc-exp-length
Set the prior expected length of gBGC tracts. This is equivalent to
1/alpha in the parametrization defined by Capra et al. (see REFERENCES),
where alpha is the rate out of gBGC states.
Default: 1000
--estimate-bgc-exp-length <0|1>
Use "--estimate-bgc-exp-length 1" to estimate this parameter by an
expectation-maximization algorithm.
Default: 0
--bgc-target-coverage
Set the prior for gBGC tract coverage (as a fraction between 0 and 1).
This is represented in the model as beta/(alpha+beta), where beta
is the rate into the gBGC state, and alpha is the rate out of the
gBGC state.
Default: 0.01
--estimate-bgc-target-coverage <0|1>
Use "--estimate-bgc-target-coverage 0" to hold this parameter constant.
Default: 1 (This is the only parameter estimated by default.)
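The relationship between the two gBGC priors and the rates alpha and beta can be worked through numerically. The sketch below simply inverts the two formulas given above (expected length = 1/alpha, coverage = beta/(alpha+beta)); the function name is hypothetical, and how phastBias stores these rates internally is not described here.

```python
def gbgc_transition_rates(exp_length=1000.0, target_coverage=0.01):
    """Convert expected tract length and target coverage into (alpha, beta).

    alpha is the rate out of the gBGC states, beta the rate into them.
    Defaults mirror the documented defaults of --bgc-exp-length and
    --bgc-target-coverage.
    """
    if exp_length <= 0:
        raise ValueError("exp_length must be positive")
    if not 0 < target_coverage < 1:
        raise ValueError("target_coverage must be strictly between 0 and 1")
    alpha = 1.0 / exp_length
    # Solve coverage = beta / (alpha + beta) for beta.
    beta = alpha * target_coverage / (1.0 - target_coverage)
    return alpha, beta


alpha, beta = gbgc_transition_rates()
print(alpha, beta)
```

With the defaults, alpha is 0.001 and beta is roughly 1.01e-05, so entering a gBGC state is about a hundred times rarer than leaving one.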
CONSERVATION PARAMETERS:
Note: it is not recommended to tune these parameters with phastBias.
Rather, phastCons may be used to determine the best values for rho
and the transition rates into/out of conserved elements. See
phastCons --help and the phastCons HOWTO (available online) to learn
about tuning these parameters.
--rho
Set the scaling factor for branch lengths in conserved states. Rho should
be between 0 and 1.
Default: 0.31
--cons-exp-length
Set the prior expected length of conserved elements. This parameter is
held constant; if you want to tune it, it is recommended to do this
with the phastCons program under a non-gBGC model (see the
--expected-length option in phastCons).
Default: 45
--cons-target-coverage
Set the prior for coverage of conserved elements (as a fraction
between 0 and 1). Like the --cons-exp-length above, this parameter
is also held constant, but can be tuned with phastCons (see
phastCons --transitions).
Default: 0.3
OTHER PARAMETERS:
--scale
Set an overall scaling factor for the branch lengths in all states.
Default: 1
--estimate-scale <0|1>
Rescale the branches in all states by a scaling factor determined by
maximum likelihood (initialized by --scale above).
Default: 0
--eqfreqs-from-msa <0|1>
Reset equilibrium frequencies of A,C,G,T based on frequencies observed
in the alignment. Otherwise the frequencies in the input model are left unchanged.
Default: 1
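As an illustration of what "frequencies observed in the alignment" means for --eqfreqs-from-msa, the sketch below counts A, C, G, and T across a set of aligned sequences. It is an assumption-laden approximation: this help text does not say how phastBias treats gaps or ambiguity codes, so the sketch simply ignores every character other than A, C, G, and T. The function name is hypothetical.

```python
from collections import Counter


def observed_base_frequencies(sequences):
    """Return the relative frequencies of A, C, G, T over all sequences."""
    counts = Counter()
    for seq in sequences:
        for base in seq.upper():
            # Assumption: gaps, N, and other symbols are skipped entirely.
            if base in "ACGT":
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G, or T characters found")
    return {base: counts[base] / total for base in "ACGT"}


# Toy two-row alignment with one gap.
freqs = observed_base_frequencies(["AACG", "A-GT"])
print(freqs)
```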
OUTPUT OPTIONS:
--output-tracts
Print a GFF file identifying all regions with posterior probability of
being in a gBGC state > 0.5.
--posteriors <none|wig|full>
Use this option to control posterior probability output, which is
written to stdout. "none" suppresses this output; "wig" writes
a standard fixed-step wiggle file giving the probability that each
base is assigned to a gBGC state; "full" writes a table with five
columns. The first column is the coordinate (1-based relative to
the first sequence in the alignment), and the remaining four give the
probabilities of the four states, in this order: neutral, conserved,
gBGC neutral, gBGC conserved.
Default: wig
--output-mods
Print out the tree models for all four states to files whose names end in
.cons.mod, .neutral.mod, .gBGC_cons.mod, and .gBGC_neutral.mod.
--informative-fn,-i
Print a GFF containing regions of the alignment which are informative
for gBGC. Note: only works properly if foreground branch is a single
branch (not a group of branches).
--informative-only,-o
(To be used with --informative-fn). Print the informative regions, then
quit.
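The five-column table written by the "full" setting of --posteriors can be read with a few lines of code. The sketch below assumes whitespace-separated columns and skips any line that is not five parseable fields (for example a header line); both are assumptions, since the exact delimiter and header are not specified here. The column order follows the description above, and the sample values are made up for illustration.

```python
def parse_full_posteriors(lines):
    """Parse rows of: coord, neutral, conserved, gBGC neutral, gBGC conserved."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # not a data row
        try:
            coord = int(fields[0])
            probs = [float(x) for x in fields[1:]]
        except ValueError:
            continue  # header or malformed row
        neutral, conserved, gbgc_neutral, gbgc_conserved = probs
        rows.append({
            "coord": coord,
            "neutral": neutral,
            "conserved": conserved,
            "gbgc_neutral": gbgc_neutral,
            "gbgc_conserved": gbgc_conserved,
            # Probability of being in either gBGC state.
            "gbgc_total": gbgc_neutral + gbgc_conserved,
        })
    return rows


# Made-up example rows, not real phastBias output.
sample = [
    "#coord neutral conserved gBGC_neutral gBGC_conserved",
    "1 0.60 0.10 0.25 0.05",
    "2 0.20 0.10 0.60 0.10",
]
rows = parse_full_posteriors(sample)
print([r["coord"] for r in rows if r["gbgc_total"] > 0.5])
```

Summing the two gBGC columns gives the probability of being in one of the gBGC states, which is the quantity that the 0.5 tract threshold applies to.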
REFERENCES:
Capra JA, Hubisz MJ, Kostka D, Pollard KS, Siepel A: A Model-Based Analysis
of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes.
(Manuscript in submission).