0 Manual
(c) F. Ronquist 2001
Tree fitting has important applications in historical biogeography, coevolution and gene tree-species tree
fitting [see recent reviews by Page, 1998 #1419; Ronquist, 1998 #760]. A general characteristic of these
problems is that two different kinds of trees are fitted to each other in order to infer how the lineages they
represent have been associated with each other during their evolution. In historical biogeography, the two
different kinds of trees are organism phylogenies and area cladograms, in coevolution they are parasite and host
phylogenies, and in gene tree-species tree problems they are gene trees and species phylogenies.
I and others have argued that parsimony methods for tree fitting should be based on models recognising
different types of events and associating each of these events with a cost inversely related to the likelihood of the
event [Ronquist, 1990 #552; Page, 1995 #551; Charleston, 1998 #]. Because many workers do not believe in
deriving parsimony methods from models with events, I have called this approach event-based parsimony. In
my view, however, event-based parsimony really represents the only logically defensible way in which
parsimony inference can be applied to the problem of tree fitting.
A number of different models can be used in parsimony-based tree fitting but most of the work has thus
far been concentrated on the four-event model first introduced by Page [1995 #551; Ronquist, 1998 #739] or
subsets and variants thereof [Ronquist, 1990 #552; Goodman, 1979 #751; Ronquist, 1996 #547; Ronquist, in
press #1591]. In this model, we recognize four different types of events: codivergence events, duplication events,
sorting events and switching events. Codivergence events correspond to geographical vicariance in historical
biogeography, simultaneous host and parasite speciation in coevolving species associations, and gene tree
divergence caused by speciation in gene tree-species tree studies. Duplication events correspond to sympatric or
allopatric speciation in response to a temporary barrier in biogeography, independent parasite speciation in
coevolution, and gene duplication in gene tree analysis. Sorting events correspond to (partial) extinction in
biogeography, and lineage sorting in coevolution and gene tree analysis. Finally, switches correspond to
dispersal between isolated areas in biogeography, host shifts in coevolving associations, and horizontal gene
transfer in gene tree analysis.
TreeFitter is a simple program for parsimony-based tree fitting. It can handle arbitrary cost assignments
fulfilling the requirements that duplication events, sorting events, and switches all have zero or positive cost
associated with them. Codivergence events can be associated with either positive or negative cost (or zero cost).
In TreeFitter terminology, one kind of tree is called a P-tree and the other an H-tree, by analogy with
parasite and host trees in coevolutionary analysis. In historical biogeography, the H-trees are the area cladograms
and the P-trees the organism phylogenies. In coevolution, the interpretation is self-evident. In gene tree-species
tree fitting, the P-trees are the gene trees and the H-trees are the species trees.
TreeFitter has a limited number of commands but still allows a number of useful inferences to be drawn
from the data sets. It fits any number of P-trees to a given H-tree, and it can search for the best H-tree given a set
of P-trees. It can calculate the events implied by the minimum-cost solutions and reconstructions can be saved in
TreeMap format (not yet implemented). Inferences about historical constraints or the number of events of a
particular type can be tested against inferences drawn from random data sets. These random data sets are drawn
from the original data either by random permutation of the terminals in the P-tree, the H-tree, or both.
Alternatively, either the P-trees, the H-tree or both are replaced by trees drawn at random from a tree universe
generated by the Markov process (all labelled histories equally probable) or from the tree universe where all
labelled distinct cladograms are equally probable. Finally, TreeFitter can examine portions of parameter space to
find the combination of cost assignments giving the best chances of finding historically constrained patterns,
given a set of P-trees and an H-tree. By default, TreeFitter works with the following cost assignments:
codivergence and duplication events have zero cost, sorting events have a cost of 1.0, and switches a cost of 2.0.
This combination of cost assignments works well for a wide variety of problems but not for all cases where it is
possible to retrieve phylogenetically conserved association patterns [Ronquist, in press #1591].
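
As a rough illustration of how the event costs combine, the sketch below sums the cost of a single reconstruction from its event counts, using the default assignments stated above. The function name reconstruction_cost, the dictionary layout and the example counts are assumptions made for this illustration; they are not part of TreeFitter.

```python
# Default TreeFitter event costs as stated in the text above.
DEFAULT_COSTS = {
    "codivergence": 0.0,
    "duplication": 0.0,
    "sorting": 1.0,
    "switch": 2.0,
}


def reconstruction_cost(event_counts, costs=DEFAULT_COSTS):
    """Return the total cost of one reconstruction.

    event_counts maps an event type to the number of times it occurs.
    The dictionary layout is hypothetical, chosen for this sketch.
    """
    # Enforce the stated requirement: only codivergence may be negative.
    for event in ("duplication", "sorting", "switch"):
        if costs[event] < 0:
            raise ValueError(f"{event} cost must be zero or positive")
    return sum(costs[event] * n for event, n in event_counts.items())


# Hypothetical reconstruction: 5 codivergences, 1 duplication,
# 3 sorting events and 2 switches.
example = {"codivergence": 5, "duplication": 1, "sorting": 3, "switch": 2}
total = reconstruction_cost(example)
print(total)
```

With the default costs only the sorting events and switches contribute to the total, which is why reconstructions rich in codivergence and duplication events are favoured.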
The commands available in TreeFitter are summarized below. Please remember that this software has
been developed mainly for my own research needs and is not being maintained as a commercial software
package. It is provided for free on the understanding that there are no guarantees that the software will not crash
your system, destroy your files or fail to perform as you expect. Always keep backup copies of your files. Any
suggestions for improvements or detailed bug reports are welcome and should be addressed to me
(fredrik.ronquist@ebc.uu.se).
TreeFitter commands
The TreeFitter commands are described with a syntax similar to that used in the PAUP manual. A line fed to
TreeFitter should contain a command, followed by some options with corresponding settings. In describing the
syntax, items that are optional are given within square brackets [ ]. A setting can be a floating point
value (specified by floatval), an integer value (specified by intval), or any of a set of alternative keyword settings
(given within curly brackets and separated by vertical lines, as in {setting1|setting2|setting3}). The commands can
either be typed in from the keyboard or entered in a batch file. The batch file can then be processed by using the
execute command. The format of the batch files is similar to the NEXUS format with different blocks of
commands. TreeFitter commands can also be issued outside blocks. TreeFitter is case-insensitive except for the
labels of the H-tree and P-tree terminals.
Note that the range statement can be divided into several lines. TreeFitter uses the semicolon to find the
end of a statement in a datafile.
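
The statement syntax described above (a command, then option = setting pairs, ended by a semicolon) can be pictured with the following minimal sketch. It is not TreeFitter's own parser: the function names are hypothetical, only the option = setting form is handled, and everything is lower-cased, so the case-sensitive terminal labels and the file-name arguments of log and execute are outside its scope. The example input is composed from options documented in this manual.

```python
def split_statements(text):
    """Split input into statements, using the semicolon as terminator."""
    # Line breaks inside a statement are treated as ordinary whitespace,
    # so a statement may be divided over several lines.
    parts = text.split(";")
    return [" ".join(part.split()) for part in parts if part.strip()]


def parse_statement(statement):
    """Parse 'command option = setting ...' into (command, {option: setting})."""
    # Pad '=' with spaces so 'nperm=100' and 'nperm = 100' tokenise alike.
    tokens = statement.replace("=", " = ").split()
    command = tokens[0].lower()
    rest = tokens[1:]
    if len(rest) % 3 != 0:
        raise ValueError(f"malformed options in: {statement!r}")
    options = {}
    for j in range(0, len(rest), 3):
        name, equals, value = rest[j:j + 3]
        if equals != "=":
            raise ValueError(f"expected '=' after option {name!r}")
        # Commands, option names and keyword settings are case-insensitive.
        options[name.lower()] = value.lower()
    return command, options


text = "set algorithm = LB;\nfit output = SUMMARY\n    nperm = 100;"
parsed = [parse_statement(s) for s in split_statements(text)]
print(parsed)
```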
set <options>;
This command is used to change the settings of a number of different parameters, as follows.
algorithm = {LB | UB}
This option determines whether TreeFitter will be using a lower-bound or an upper-bound
algorithm to fit H-trees and P-trees. The lower-bound algorithm is recommended for general usage but
can occasionally give reconstructions with incompatible switches [Ronquist, 1996 #547; Ronquist, in
press #1591]. The upper-bound algorithm is slower but gives exact solutions without incompatible
switches for ordered H-trees. Unless you have many P-terminals per H-terminal and many switches, there
is not likely to be much information in the P-trees about the order of the nodes in the H-tree, and all or
most of the orderings of the nodes of the H-tree will have the same cost and imply the same set of events.
Therefore, if you use the upper-bound algorithm in H-tree searches you are likely to obtain a large set of
equally optimal H-trees that are identical in topology but differ only in the order of the splitting events
(nodes).
To check whether you have problems with incompatible switches, you can compare the lengths of
the H-trees fitted with the upper-bound algorithm to the lengths of the same trees fitted with the lower-bound algorithm.
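
The comparison just described can be sketched as follows, assuming that the costs reported by the two algorithms have been collected by hand into a dictionary. The function name, the data layout, the tree labels and the example costs are all hypothetical. The sketch also assumes that an upper-bound cost exceeding the lower-bound cost is the signal of interest, marking a tree whose lower-bound reconstruction may contain incompatible switches.

```python
def flag_incompatible_switches(costs, tolerance=1e-9):
    """Return labels of H-trees whose upper-bound cost exceeds the lower-bound cost.

    costs maps a tree label to a (lower_bound_cost, upper_bound_cost) pair.
    """
    return [
        label
        for label, (lower, upper) in costs.items()
        if upper - lower > tolerance
    ]


# Hypothetical costs copied from two runs, one per algorithm.
costs = {
    "H1": (12.0, 12.0),  # same cost under both algorithms
    "H2": (9.0, 11.0),   # upper-bound fit is more expensive
}
suspect = flag_incompatible_switches(costs)
print(suspect)
```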
estimate <options>;
This command explores different event-cost assignments and their effect on the chances of finding
phylogenetically conserved association patterns. The p values obtained with different event-cost assignments are
reported. It is then up to the user to evaluate the results and to set the event-cost assignments accordingly. (The
parameter space tested is currently hard-coded.)
cmin = <floatval>
Determines the minimum codivergence cost. Default setting is 0.0.
cmax = <floatval>
Determines the maximum codivergence cost. Default setting is 0.0.
cstep = <floatval>
Determines the interval between successive codivergence costs tried. Default setting is 0.2.
umin = <floatval>
Determines the minimum duplication cost. Default setting is 0.0.
umax = <floatval>
Determines the maximum duplication cost. Default setting is 0.0.
ustep = <floatval>
Determines the interval between successive duplication costs tried. Default setting is 0.5.
smin = <floatval>
Determines the minimum sorting cost. Default setting is 1.0.
smax = <floatval>
Determines the maximum sorting cost. Default setting is 1.0.
sstep = <floatval>
Determines the interval between successive sorting costs tried. Default setting is 0.5.
imin = <floatval>
Determines the minimum switching cost. Default setting is 0.0.
imax = <floatval>
Determines the maximum switching cost. Default setting is 10.0.
istep = <floatval>
Determines the interval between successive switching costs tried. Default setting is 0.5.
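
To make the size of the explored parameter space concrete, the sketch below enumerates every cost combination implied by a set of min, max and step values, using the default settings listed above. It is only an illustration: the function names are hypothetical, and whether TreeFitter includes the maximum value itself in each range is an assumption made here.

```python
from itertools import product


def cost_range(minimum, maximum, step):
    """Return evenly spaced costs from minimum to maximum (assumed inclusive)."""
    if step <= 0:
        raise ValueError("step must be positive")
    if maximum < minimum:
        raise ValueError("maximum must not be smaller than minimum")
    # Count the steps with a small tolerance against floating point drift.
    n_steps = int((maximum - minimum) / step + 1e-9)
    return [round(minimum + k * step, 10) for k in range(n_steps + 1)]


# Default (min, max, step) settings listed above for each event type.
DEFAULT_GRID = {
    "codivergence": (0.0, 0.0, 0.2),
    "duplication": (0.0, 0.0, 0.5),
    "sorting": (1.0, 1.0, 0.5),
    "switch": (0.0, 10.0, 0.5),
}


def cost_grid(settings=DEFAULT_GRID):
    """Return one dictionary of event costs per combination in the grid."""
    names = list(settings)
    axes = [cost_range(*settings[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*axes)]


grid = cost_grid()
print(len(grid))
print(grid[0], grid[-1])
```

Under these assumptions the defaults vary only the switching cost, giving 21 combinations from 0.0 to 10.0 in steps of 0.5.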
fit <options>;
This command will fit the selected H-trees onto the selected P-trees using the currently chosen event-cost
assignments (altered with the set command). Available options:
output = {SUMMARY | STANDARD | DETAILED}
The setting of this option determines the type of report produced by the fit command. If
SUMMARY is chosen, only the cost (and p value, if relevant) is printed for each H-tree. If STANDARD
is chosen, then a more detailed report is printed for each H-tree. If DETAILED is chosen, results are
printed separately for each P-tree.
perm = {HTERM | PTERM | HPTERM | HTREE | PTREE | HPTREE}
The setting of this option determines the type of permutation used to test the significance of
results. If HTERM is chosen, H-tree terminals are permuted; if PTERM is chosen, P-tree terminals are
permuted instead; and if HPTERM is selected, both H-tree and P-tree terminals are permuted. If HTREE
is chosen, then a random H-tree is drawn for each permutation; if PTREE is chosen, then a random P-tree
is drawn instead. Finally, if HPTREE is chosen, both the H-tree and the P-tree are replaced by random
trees. The tree universe used for the random trees is set by the treespace option.
nperm = <intval>
Sets the number of permutations used in permutation tests of the fit. If 0 is chosen, no
permutations will be performed.
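
The logic of a terminal permutation test can be sketched as below. This is a minimal illustration, not TreeFitter's implementation: the cost function is a placeholder standing in for the real tree-fitting cost, the function names are hypothetical, and defining the p value as the fraction of permuted data sets with a cost no higher than the observed one is an assumption.

```python
import random


def permutation_p_value(observed_cost, hosts, cost_function, nperm, seed=0):
    """Estimate how often permuted data fit at least as well as the original.

    hosts lists the H-terminal associated with each P-terminal; shuffling
    this list corresponds to permuting the P-tree terminals.
    """
    if nperm == 0:
        return None  # no permutations requested, so no p value
    rng = random.Random(seed)
    as_good = 0
    for _ in range(nperm):
        shuffled = hosts[:]
        rng.shuffle(shuffled)
        if cost_function(shuffled) <= observed_cost:
            as_good += 1
    return as_good / nperm


def toy_cost(hosts):
    """Placeholder cost: number of neighbouring P-terminals on different hosts."""
    return sum(1 for a, b in zip(hosts, hosts[1:]) if a != b)


# Hypothetical associations: nine P-terminals on three H-terminals.
hosts = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
observed = toy_cost(hosts)
p = permutation_p_value(observed, hosts, toy_cost, nperm=1000)
print(observed, p)
```

A small p value indicates that the observed associations fit better than most randomised ones, which is the sense in which a pattern is called historically constrained.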
calcevents = {YES | NO }
Determines whether the program will calculate the frequency of different types of events
(codivergences, duplications, sortings and switches) when fitting H-trees and P-trees. The reported frequency is
the range (minimum and maximum) over the equally optimal reconstructions.
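
The reported range can be pictured with the following sketch, which takes the event counts of several equally optimal reconstructions and returns the minimum and maximum for each event type. The function name, the data layout and the two example reconstructions are hypothetical; both examples cost 4.0 under the default cost assignments.

```python
def event_ranges(reconstructions):
    """Return {event: (min_count, max_count)} over the given reconstructions."""
    if not reconstructions:
        raise ValueError("need at least one reconstruction")
    events = reconstructions[0].keys()
    return {
        event: (
            min(r[event] for r in reconstructions),
            max(r[event] for r in reconstructions),
        )
        for event in events
    }


# Two hypothetical reconstructions of equal cost (sorting = 1.0, switch = 2.0).
optimal = [
    {"codivergence": 4, "duplication": 2, "sorting": 2, "switch": 1},
    {"codivergence": 5, "duplication": 1, "sorting": 4, "switch": 0},
]
ranges = event_ranges(optimal)
print(ranges)
```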
showancstates = {YES | NO }
Determines whether the ancestral states (the ancestral hosts) are output for each P-tree. Ignored
unless output = DETAILED. (not yet available).
showreconstructions = {YES | NO }
Determines whether the optimal reconstructions are output for each P-tree. Ignored unless output =
DETAILED. (not yet available).
order <options>;
Determines the order of the nodes in the currently selected H-trees. (not yet available).
search <options>;
Searches for the best H-tree given the selected P-trees. Available options:
type = {EXHAUSTIVE | HEURISTIC}
Determines whether an exhaustive or a heuristic search will be used.
keep = {ONE | MIN | BOUND}
Determines whether the search should keep only one tree of minimum cost for each starting tree,
all trees of minimum cost, or all trees with a maximum cost set by the bound option.
bound = <floatval>
Determines the maximum cost of the H-trees to be kept.
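
The effect of the keep and bound options of the search command can be sketched as a filter over scored trees. This is an illustration under assumptions: the function name and the (tree, cost) pair layout are hypothetical, the list is treated as the result from a single starting tree, and the tree descriptions and costs are invented for the example.

```python
def select_trees(scored_trees, keep="MIN", bound=None, tolerance=1e-9):
    """Filter (tree, cost) pairs in the way the keep option is described."""
    if not scored_trees:
        return []
    best = min(cost for _, cost in scored_trees)
    keep = keep.upper()
    if keep == "ONE":
        # Keep only the first tree found with the minimum cost.
        return [next(t for t in scored_trees if t[1] - best <= tolerance)]
    if keep == "MIN":
        # Keep every tree that ties for the minimum cost.
        return [t for t in scored_trees if t[1] - best <= tolerance]
    if keep == "BOUND":
        if bound is None:
            raise ValueError("keep = BOUND requires a bound value")
        # Keep every tree whose cost does not exceed the bound.
        return [t for t in scored_trees if t[1] <= bound + tolerance]
    raise ValueError(f"unknown keep setting: {keep}")


# Hypothetical H-trees with invented costs.
trees = [("((A,B),C)", 4.0), ("((A,C),B)", 4.0), ("((B,C),A)", 6.0)]
print(select_trees(trees, keep="ONE"))
print(select_trees(trees, keep="MIN"))
print(select_trees(trees, keep="BOUND", bound=6.0))
```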
log <file-name>;
This command is used to log the results to a file with the specified name. The file will be stored in the
same directory as the TreeFitter program.
execute <file-name>;
This command will execute the file with the specified name. The file must be in the same directory as the
TreeFitter program, unless the correct path is given as part of the file name.