PROGRAM: dmcondition

USAGE: dmcondition [OPTIONS] > out.dat

NOTE: At least one of the --cond-on-subs or --cond-on-species options
      is required. See below for details.

DESCRIPTION:

    Produce a plain-text file describing the states and positions in the
    emissions matrix that need to be zeroed out in order to condition
    gain and loss calls on site presence in a given species and/or on
    the presence of substitutions. This file can be supplied to dmsample
    within the alignments.lst file. See dmsample -h for details.

EXAMPLES:

OPTIONS:

    --msa-format, -i FASTA|PHYLIP|MPM|MAF|SS
        Alignment format (default FASTA). Note that the program msa_view
        can be used for conversion.

    --refseq, -M
        (For use with --msa-format MAF) Read the complete text of the
        reference sequence from the specified file (FASTA format) and
        combine it with the contents of the MAF file to produce a
        complete, ordered representation of the alignment. The reference
        sequence of the MAF file is assumed to be the one that appears
        first in each block.

    --cond-on-species, -x
        Condition gain and loss calls in the HMM on presence of a site
        in a given species. This is useful if ChIP-based regions are
        used as inputs for motif finding. The effect is to prohibit
        motif loss predictions on branches leading to a species believed
        to contain a binding site based on ChIP (or other) evidence, and
        to prohibit gain predictions on branches leading away from that
        species.

    --cond-on-subs, -X
        Condition gain and loss predictions on presence of at least one
        substitution within a window believed to represent a binding
        site. This prevents gain/loss predictions on branches that
        contain no observable substitutions in the multiple alignment.
        The underlying algorithm uses Fitch parsimony to partition the
        tree, for each motif window in the dataset, into branches that
        do and do not contain substitutions. It then zeroes out the
        chain of gain and loss states in the emissions matrix for every
        combination of branch and motif position that has no
        substitutions to support such a prediction. If used in
        conjunction with --cond-on-species, states that are already
        zeroed out because they are incompatible with site presence in
        the given species are skipped in this step.

    --nosubs, -n
        Toggle between modes for deciding whether to zero a branch given
        the character sets at its parent and child nodes. The default
        mode zeroes a branch if the parent and child nodes have
        identical sets of characters, regardless of set size. This
        option selects the alternate mode, in which a branch is zeroed
        only if no scenario allows any possibility of a substitution
        along it. That is, parent and child both have the same base,
        with only a single possibility at both nodes.

    --as-bed, -b
        Output the locations of individual zeroed states in BED format.
        This expects the filename of the sequence file to be in the
        format "chrom.genomic-start.genomic-end.format". Other filename
        formats will cause unpredictable behavior (read: segfaults).

    --refidx, -r
        Use the coordinate frame of the specified sequence in output.
        The default value is 1, the first sequence in the alignment;
        0 indicates the coordinate frame of the entire multiple
        alignment.

    --help, -h
        Show this help message and exit.
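
ILLUSTRATION (not part of the program):

    The branch-zeroing rule described under --cond-on-subs and --nosubs
    can be sketched in a few lines of Python. This is only a minimal
    illustration, not the dmcondition source: it handles a single
    alignment column rather than a whole motif window, it uses only the
    bottom-up Fitch pass, and the names fitch_sets, branch_is_zeroed,
    and zeroed_branches, as well as the dictionary-based tree encoding,
    are hypothetical choices made for this example.

```python
def fitch_sets(tree, leaf_chars):
    """Bottom-up Fitch pass for one alignment column.

    tree: dict mapping each internal node to its (left, right) children
          (hypothetical encoding chosen for this sketch).
    leaf_chars: dict mapping each leaf name to one observed character.
    Returns a dict mapping every node to a frozenset of candidate characters.
    """
    sets = {leaf: frozenset([char]) for leaf, char in leaf_chars.items()}

    def visit(node):
        if node in sets:
            return sets[node]
        left, right = (visit(child) for child in tree[node])
        common = left & right
        # Fitch rule: keep the intersection if non-empty, else the union.
        sets[node] = common if common else left | right
        return sets[node]

    children = {child for pair in tree.values() for child in pair}
    for node in tree:
        if node not in children:  # the root is nobody's child
            visit(node)
    return sets


def branch_is_zeroed(parent_set, child_set, nosubs=False):
    """Decide whether the branch parent -> child is zeroed.

    Default mode: identical character sets, regardless of size.
    nosubs mode: identical sets that each hold exactly one character.
    """
    if nosubs:
        return len(parent_set) == 1 and parent_set == child_set
    return parent_set == child_set


def zeroed_branches(tree, leaf_chars, nosubs=False):
    """Return the sorted child-node names of all zeroed branches."""
    sets = fitch_sets(tree, leaf_chars)
    return sorted(
        child
        for parent, kids in tree.items()
        for child in kids
        if branch_is_zeroed(sets[parent], sets[child], nosubs)
    )


# Four-species toy tree: ((sp1, sp2), (sp3, sp4)).
tree = {"root": ("n1", "n2"), "n1": ("sp1", "sp2"), "n2": ("sp3", "sp4")}
column = {"sp1": "A", "sp2": "G", "sp3": "A", "sp4": "G"}

default_mode = zeroed_branches(tree, column)
nosubs_mode = zeroed_branches(tree, column, nosubs=True)
```

    In this toy column every internal node ends up with the ambiguous
    set {A, G}, so the two internal branches are zeroed in the default
    mode but kept in the alternate mode, where a substitution along
    them remains possible.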