
Note regarding 2007 republication: Source and binary code referenced in note 34 are no longer at that URL. The same assets are now hosted at http://www.selfmummy.com/mss2dna

Adam Breindel
Department of Classics, Brown University
May 1998

The Application of a Discrete-Character Parsimony
Phylogeny-Inference Algorithm to Classical Text Stemmata

The purpose of this paper is to present two interdisciplinary observations, a new technique for stemmatic analysis, and preliminary results from an application of this technique.

The first interdisciplinary observation is that the methods and purpose of stemmatics overlap substantially with the methods and purpose of the biological sub-discipline of cladistic analysis. While this fact is rarely emphasized or exploited, it is not a new discovery, and its history will be discussed. The second interdisciplinary observation is that computer software which has been developed for biologists in order to solve problems in cladistic analysis now offers us the possibility of advances in the construction of textual stemmata, through a non-traditional use of traditional manuscript collations.

The analytic technique contained herein – which does not appear ever to have been attempted heretofore – is the application of an existing cladistic analysis software package to the stemmatic analysis of a manuscript collation. The use of this technique to analyze part of the Sallustian corpus is thoroughly documented in this study. Preliminary results indicate that the technique produces a stemma nearly identical to that published by L.D. Reynolds, the editor of the Oxford text. Hence, this method appears to offer an effective new approach to evaluating the relationships among extant versions of a text.

The interplay between the disciplines of biological systematics, genetics, and textual criticism, which makes this paper possible, has a somewhat Byzantine history spanning the last thirty years. I ask the reader to consider with charity my exposition of this history. For it seems that the relative uniqueness of this paper demands an unusually large amount of background information.1

Background

In 1968, John G. Griffith published a paper entitled “A Taxonomic Study of the Manuscript Tradition of Juvenal.”2 In this study, Griffith applied methods of numerical taxonomy to the classification of Juvenal manuscripts. The taxonomic methods, as he explains in a similar article the following year,3 he had in turn learned from biologist Robert Sokal’s 1966 Scientific American article on that topic.

Griffith describes the biological advances which he exploits in analyzing the texts of Juvenal:

1 I have found it necessary in the course of this paper to refer to some technical aspects of systematics and genetics. I have attempted to restrict to an elementary level the familiarity required with these disciplines, in order to make this work accessible to a broad audience. Nonetheless, readers seeking an introductory exposition may find appropriate sections of the following textbooks useful:
Gamblin, Linda and Gail Vines, eds. (1991) The Evolution of Life, Oxford, chapter 3.
Maxson, Linda R. and Charles H. Daugherty (1992) Genetics: A Human Perspective, Dubuque, Iowa, chapters 8, 10.
Minkoff, Eli C. (1983) Evolutionary Biology, Reading, Mass., chapter 22.
2 Griffith, John G. (1968) “A Taxonomic Study of the Manuscript Tradition of Juvenal” Museum Helveticum 25:101-38.
3 Griffith, John G. (1969) “Numerical Taxonomy and Some Primary Manuscripts of the Gospels,” Journal of Theological Studies 20:389-406.



Scientists have long been aware of the limitations of the traditional
methods of classifying specimens; biologists in particular have laboured
under this handicap. Within the last 10-15 years considerable advances
have been made, largely because techniques developed for computer use
have enabled specialists in this activity, who style themselves numerical
taxonomists, to sift with speed and precision large masses of
unpromisingly heterogeneous material, and thereby to isolate groups or
‘taxa’ of related specimens, on the basis of which further inquiry may be
conducted. ...4

Thus, Griffith identifies a requirement which textual criticism has in common with taxonomy: in both disciplines, objects must be grouped based on small numbers of distinctions among vast amounts of similarity. The numeric taxonomy methods appear, he says, to offer new quantitative approaches applicable to both problems. He then expresses the hope that we might find associations between specimens by evaluating large amounts of data with machine assistance. In light of the existing resources, though, he remarks that “for a textual critic operating with only a few thousand lines of text it is simply not worth the trouble of programming the data for machine-processing...”5

The limitations to Griffith’s pioneering approach were unfortunately several. His procedure was, first, extraordinarily laborious: for the fourteen Gospel manuscripts analyzed in his article of 1969, up to fifty-six manual recording acts were required for every variant among one or more of the manuscripts. Thus he was constrained to look at only small samples of the data. Moreover, if he had had access to more data, he would likely have lacked access to the technology to evaluate it.

4 Griffith, op. cit. 1968, pp. 113-14.
5 ibid.

Griffith’s procedure (and, in all fairness, the biological methods with which he worked) had a more troublesome limitation in that they resulted only in associations of objects. Griffith could assert the distribution of manuscripts into various sub-groups with statistically-argued accuracy, but the mere grouping of the manuscripts does not seem to have accomplished much. His methods said nothing about the genealogical relationships of the manuscripts. For example, if manuscripts A, B, and C are found to be in a single taxon, we have only formalized their external similarity. As useful as such formalization might be, little is indicated about the genealogical relationships likely to inhere between the manuscripts.

Thus, Griffith succeeded in bringing numerical taxonomy into the arena of textual criticism, but the biological approach upon which he depended was not ambitious enough to describe the relationships among the specimens and so his textual techniques appear to have fallen into desuetude.

In 1973, Martin West published a short work on textual criticism, Textual Criticism and Editorial Technique.6 In this work, West explains that computers might theoretically hold some promise for stemma construction, because, under the best possible circumstances, building a stemma demands only simple logic. Such a stemma would naturally be an advance over Griffith’s taxonomic manuscript associations. West is, however, skeptical about the idea and holds out some theoretical reservations:

If provided with suitably prepared transcriptions of the manuscripts,
purged of coincidental errors, a computer could draw up a clumsy and
unselective critical apparatus; and it could in principle – where there was
no contamination! – work out an ‘unoriented’ stemma. That means ... that
it could work out a scheme simply by comparing the variants, without
regard to whether they were right or wrong; but this scheme would be
capable of suspension from any point [i.e., the scheme could not
distinguish the subarchetypes] ... The correct orientation could only be
determined by evaluating the quality of the variants, which no machine is
capable of doing.7

6 West, Martin L. (1973) Textual Criticism and Editorial Technique, Stuttgart.

West’s objections will be considered in detail later, as they are important to the present investigation. But it is worth noting for now that even if West had wanted to test a computerized construction of a stemma, there would have been obstacles to his progress.

First, there would not have been readily available technology for his purpose. But more importantly, outside of theoretical computer science or mathematical graph theory, there had not been practical research on automating the construction of stemmata when the data for the specimens is inconsistent or underdetermined. That is, if the variants in a set of manuscripts were completely compatible with a unique stemma, we would need only make the right inferences to generate it. In reality, though, every candidate stemma is usually inconsistent with at least one locus in the manuscripts; conversely, if a degree of latitude is allowed so as to overcome such strict inconsistencies, we find a multitude of possible stemmata. These stemmata we must distinguish on the basis of some criterion capable of evaluating the likelihood that each would give rise to the manuscripts as they exist.

7 West, op. cit., pp. 71-2.

Thus, a variety of difficult problems, theoretical and computational, inhere in the task of mechanically constructing a stemma – and they are not problems which classicists were likely to attack on their own. Fortuitously, however, development had simultaneously been taking place within the biological disciplines of taxonomy and systematics so as to motivate biologists to attempt these same problems. For derivation of the evolutionary relationships of a group of extant specimens was a key part of the emergent study now called cladistics.

Biologist Willi Hennig had begun to develop and advocate a strictly phylogenetic approach to arranging organisms.8 Hennig’s view, that the evolutionary relationships of organisms formed the best foundation for classifying and systematizing them, was and remains the object of debate.9 Parts of his theory, however, seem to have been adopted or adapted by increasing numbers of systematists throughout the 1970s.

The cladistic approach seems intuitively obvious, and G.D.C. Griffiths (along with many defenders) insisted that it alone had the advantage of relying on objective fact about the organisms in question (rather than deploying the organisms into classes invented by humans). Griffiths writes, “[Hennig’s method] provides the only theoretically sound basis for achieving an objective equivalence between the taxa assigned to particular categories in a phylogenetic system.”10 Unfortunately, what seems intuitively obvious can also be deceptively fallacious, and cladistics does have a disingenuous side. It is worth pointing out two objections to the system here, largely so that the reader may see that they do not apply to a textual application of the theory.

8 Hennig might be called the father of modern cladistics; his work was developed and debated in various publications including
(1950) Grundzüge einer Theorie der Phylogenetischen Systematik, Berlin.
(1966) Phylogenetic Systematics, Urbana, Illinois.
(1971) “Zur Situation der biologischen Systematik,” Erlanger Forschungen, R. Siewing ed., Erlangen.
9 For views on the early intellectual positions in the debates, see Ernst Mayr (1976) Evolution and the Diversity of Life: Selected Essays, Cambridge, Mass., pp. 435-41.
10 Griffiths, G.D.C. (1972) “The Phylogenetic Classification of Diptera Cyclorrhapha with Special Reference to the Structure of the Male Postabdomen,” W. Junk, N.V., The Hague.

First, even if we are granted a thorough knowledge of the evolutionary interrelationships of the specimens in question, no method is thereby presented for determining the level of descent at which class divisions should be made. We are only shown that, having made a choice, we are bound to include and exclude certain specimens.

Second, given three organisms, A, B, and C, suppose that A and B are similar in form, while C differs greatly from both A and B. Suppose further that A and C are closer evolutionarily to one another than either is to B. In this situation – which is not uncommon in nature – we would be forced under Hennig’s system either to class A, B and C all together, or else to class A and C together against B (Figure 1).

Figure 1

Neither of these options appeals to our intuition the way that the system at first did. For A and B appear to form a group as against C, and yet this is precisely the classification which we are prohibited from making.

These two objections, while having much practical import for the classifying of organisms, will clearly be irrelevant when we come to apply this method to manuscripts. First, we needn’t classify manuscripts by name (and if we do, we accept that classification as our own production); second, we have no sympathy for similarity of appearance between manuscripts if we have hard evidence that they are unrelated in origin (since it is the origin that is the object of the textual quest).

These cladistic methods of analysis and classification, even if controversial, prompted research into the creation and evaluation of stemmata (or cladograms) from incomplete and incompatible data. The cladistic approach depends for a starting point on determining the evolutionary relationships of the specimens – and these relationships must be assembled from lists of variations among the specimens. Hence, in a sense, biologists set to work on the problems which had stood in front of Martin West.

But debate about the philosophical underpinnings of the cladistic methodology did not subside. In 1977, the methodology attracted a defender in University of Michigan classicist and zoologist H. Don Cameron, due to cladistics’ evident similarity to established techniques in traditional (i.e., not mechanical) textual criticism.11 Cameron, along with Norman I. Platnick, describes the debate, and situates the two of them in it, thus:

Recent years have seen an increasing awareness and use among zoological
systematists of the theory and methods of phylogenetic analysis
(cladistics) developed by Hennig. These methods have been well
defended by [E.O.] Wiley from the point of view of Popperian
“hypothetico-deductive” science. Critics, both of the methods themselves
and of their application to classification, have not been silent... The
purpose of this paper is to point out a fact overlooked during the
controversy, namely, that methods analogous to those of Hennig are
accepted as the standard tools of analysis in two other fields that resemble
phylogenetic systematics in being primarily concerned with constructing
and testing hypotheses about the interrelationships of taxa connected by
ancestor-descendant sequences.
The fields referred to are ... textual criticism ... and ... linguistic
reconstruction.12

11 Platnick, Norman I. and H. Don Cameron (1977) “Cladistic Methods in Textual, Linguistic, and Phylogenetic Analysis,” Systematic Zoology 26:380-85.

Cameron and Platnick, writing for an audience of biologists, next summarize the techniques of textual criticism put forth by Paul Maas.13 Differences of technique between biological and textual stemmatics – which Cameron and Platnick view as subordinate to an overarching similarity – are described in moderate detail.14 The paper is intended to provide a critique of a situation within a discipline of biology, but it serves also to indicate that these scholars can recognize and make precise the correspondence between stemma construction and cladistic analysis.

In a conference concluded in 1983, Cameron again presented his view of textual criticism. The conference had been organized to investigate the biological and cladistic metaphor in other intellectual fields.15 Cameron treated stemmatics, but he did not discuss stemmata as a metaphor from biology, since, as he points out, the stemmatic methods as used in both fields “were developed by classical scholars systematically in the nineteenth century and ... the origins of the method can be found as early as the sixteenth century...”16 Beyond merely recounting the techniques of Maas, Cameron explores the distinction – as far as it impacts his cladistics-stemmatics comparison – between “vertical” or uncontaminated traditions and “horizontal” transmissions, those “full of Byzantine, and even ancient, editing and conjecture.”17 In the latter cases, “cladistic methods give little aid.” But in the former, he concludes:

[V]ertical transmission and uncontaminated text tradition make the
mechanical application of cladistic methods to reconstruct a single
archetype a workable and successful method, with a claim to being
scientific...18

12 Platnick, op. cit., p. 380.
13 Maas, P. (1958) Textual Criticism, Oxford; Platnick, op. cit., pp. 381-3.
14 Platnick, op. cit., p. 384.
15 Biological Metaphor Outside Biology (1982) and Interdisciplinary Round-Table on Cladistics and Other Graph Theoretical Representations (1983) symposia at the University of Pennsylvania. Proceedings in Hoenigswald, Henry M. and Linda F. Wiener, eds. (1987) Biological Metaphor and Cladistic Classification, Philadelphia.
16 Cameron, H.D. (1987) “The Upside-Down Cladogram: Problems in Manuscript Affiliation,” in Hoenigswald, op. cit.

Thus, Cameron argues that, at least in a vertical textual tradition, we ought to be able to use methods from cladistics to derive a stemma and even an archetype.

At this point, the next move for a textual critic might have appeared obvious: mate West’s insight about mechanical production of stemmata with Cameron’s insight that cladistics provides the theoretical and algorithmic underpinning for West’s operation. That is, use cladistic techniques to attack thorny problems of textual transmission. It is unclear why this approach was not exploited in the 1980s. We might, however, hypothesize a paucity of tools to support such research.

In the 1980s, three further developments came about which made the project presented herein more practicable.19 One breakthrough was improved DNA sequencing:20 it became possible to put genetic material from various species into an automated process and receive, as output, essentially a collation showing every genetic difference between the samples.21 More abundant data was now available with which cladistic analysis could work.

17 Cameron, op. cit., p. 238.
18 ibid.
19 It is important to note that none of these three developments sprang fully formed from the head of Zeus in the 1980s. It is convenient to describe them here, as their confluence seems to change the research environment at the time, but research on DNA sequencing, parsimony algorithms, and of course computers had a long prior history.
20 In particular the development of polymerase chain reaction (PCR) duplication of DNA segments.
21 That is, in the sequenced strands of DNA.

The second development of this time period was the availability of computers sophisticated enough to compare and evaluate the thousands or tens of thousands of possible cladograms (stemmata) which might result from comparing large numbers of species. That is, computers allowed biologists to overcome that challenge which Maas had identified for textual critics, when he observed that a large number of specimens or witnesses would produce an astronomical number of possible stemmata.22
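The growth that Maas describes can be made concrete with a standard combinatorial result: if only fully resolved (strictly bifurcating) trees with all specimens at the tips are counted, there are (2n-5)!! unrooted and (2n-3)!! rooted trees for n specimens. The sketch below is illustrative only. The function names are hypothetical, and the counts are not comparable to Maas's own figures, which presumably also admit witnesses standing in ancestral positions.

```python
def double_factorial(k: int) -> int:
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2; returns 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_binary_trees(n: int) -> int:
    """Fully resolved unrooted trees on n labelled tips (n >= 3)."""
    return double_factorial(2 * n - 5)


def rooted_binary_trees(n: int) -> int:
    """Fully resolved rooted trees on n labelled tips (n >= 2)."""
    return double_factorial(2 * n - 3)


if __name__ == "__main__":
    for n in (4, 5, 9, 11):
        print(n, unrooted_binary_trees(n), rooted_binary_trees(n))
```

Under these assumptions, nine manuscripts already admit 135,135 unrooted arrangements, which is why exhaustive comparison by hand is impractical.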

The last pre-requisite development was software systems to put large quantities of data (whether from DNA or elsewhere) together with the computers. Software to compute likely stemmata involves, at its core, algorithms which have been topics in computer science and mathematics for a half-century or more. Hence, strictly speaking, appropriate software had probably been “in development” in research universities and corporate labs for some time. But the early 1980s saw the release of packages designed specifically for cladistics, tailored to the needs of practicing biologists, and ready to run on existing microcomputers.

The present experimental study, described below, is an attempt to establish a stemma for the textual tradition of Sallust’s De Coniuratione Catilinae using one such software package, the freely-distributable Phylogeny Inference Package (or, as henceforth, PHYLIP).23

22 Maas, op. cit., p. 47: “If we have four witnesses, the number of possible types of stemma amounts to 250, if we have five, to approximately 4,000, and so on in quasi-geometrical progression.”
23 Felsenstein, J. (1993) PHYLIP (Phylogeny Inference Package) version 3.5c, distributed by the author, Dept. of Genetics, Univ. of Washington, Seattle. See http://evolution.genetics.washington.edu

Before proceeding to describe the method and outcome of the experiment, it is appropriate to consider two technical objections which textual critics have put forward concerning stemma construction.

The first objection is one of M.L. West, printed above. West correctly pointed out that any stemma derived by algorithm would be an unoriented stemma (or, as the cladists say, an ‘unrooted cladogram’).24 That is, the algorithm could determine the branchings of the stemma but could not ascertain which branching belongs “at the top” (in practice, this amounts to identifying the nodes representing the subarchetypes). An unrooted cladogram (Figure 2) can represent several distinct rooted versions (Figure 3). Each rooted cladogram can, in turn, represent several distinct possible phylogenies (Figure 4).25

Figure 2. Unrooted cladogram. This cladogram shows the relationships of the specimens relative to one another, but does not indicate their relationship to ancestors from which they descend.

Figure 3. Rooted cladograms. Each of these five rooted cladograms is consistent with the unrooted cladogram above (Figure 2). By postulating the first branching in the descent, the known relationships specify the remainder of the tree. Note, however, that the lengths of branches, and the specimens which might lie on the nodes of the tree, are not indicated.
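The count of five rooted versions follows from a general fact: a fully resolved unrooted tree on n tips has 2n-3 branches, and placing the root on each branch in turn yields a distinct rooted tree. The sketch below enumerates the rootings for a four-tip tree. The tip labels W, X, Y, Z, the internal node names u and v, and the assumption that Figure 2 shows four specimens are all hypothetical, since the figure itself is not reproduced here.

```python
from collections import defaultdict


def rootings(edges):
    """Return one rooted tree (as nested tuples) for each branch of an unrooted tree."""
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)

    def subtree(node, parent):
        # Everything reachable from `node` without passing back through `parent`.
        children = sorted(adjacent[node] - {parent})
        if not children:  # a tip is represented by its label
            return node
        return tuple(subtree(child, node) for child in children)

    # Rooting on branch (a, b) splits the tree into the two subtrees on either side.
    return [(subtree(a, b), subtree(b, a)) for a, b in edges]


# Hypothetical unrooted tree: tips W, X join at u; tips Y, Z join at v; u and v are linked.
unrooted = [("W", "u"), ("X", "u"), ("u", "v"), ("Y", "v"), ("Z", "v")]

if __name__ == "__main__":
    for tree in rootings(unrooted):
        print(tree)
```

Rooting on the central branch gives the balanced tree (("W", "X"), ("Y", "Z")); rooting on any of the four terminal branches sets one tip against the other three.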

24 West, op. cit., pp. 71-2.
25 Humphries, C.J. and P.H. Williams (1994) “Cladograms and Trees in Biodiversity,” Models in Phylogeny Reconstruction, Robert W. Scotland, Darrell J. Siebert, and David M. Williams, eds., Oxford, pp. 336-7.

Figure 4. Phylogenetic Trees. All four of these phylogenetic trees are compatible with a
single cladogram above (Figure 3.ii). Note that schemata involving direct descent are
included.

West’s objection is legitimate. It should not, though, prevent us from pursuing automated stemma construction, for several reasons. First, the unrooted cladogram is, if accurate, a great advance over no stemma and an even greater advance over an incorrect stemma. Second, it may in many cases be tolerably easy to properly root the cladogram, thus producing a traditional stemma, based on our knowledge of the dates and locales of origin for the various manuscripts. Third, computer methods are particularly useful in the frequent circumstance that the collation is not uniquely compatible with any single proposed stemma. In such cases, we shall be happy to have an analysis of the entire collation, a most-likely stemma, and a mathematical justification for excluding many other stemmata.

The second objection is one advanced by Roger David Dawe in studies of the traditions of Aeschylus and Sophocles.26 Dawe’s contention is that there is so much horizontal transmission in the traditions for these authors, as indicated by numerous true readings appearing in dependent manuscripts though absent in other manuscripts, as to invalidate the stemmatic approach.27 Dawe confronts the methodology of Pasquali – and consequently confronts my method, which derives partly through Pasquali, Maas, and West – at least in the case of individual authors such as Aeschylus. He writes:

We believe that the fact of unique preservation has been demonstrated [in
the Aeschylean case]; consequently the fault must lie with the theory of
descent, and we conclude that the ... stemma does not after all represent,
even in the simplest form, the true character of the tradition. ...

It seems clear that the picture presented by the manuscripts is one of a
recension so entangled that it is utterly impossible for us to unravel the
threads.28

26 Dawe, R.D. (1964) The Collation and Investigation of Manuscripts of Aeschylus, Cambridge and (1973) Studies on the Text of Sophocles, 2 vols., Leiden.
27 Cameron, op. cit., p. 237.

Cameron summarizes the problems which Dawe’s assertion poses to any method such as the one employed in the present study:

Dawe denies radically that archetypes can be reconstructed, but he
necessarily pays a theoretical price for his conclusion...

If there are no archetypes or stemmata, and if true readings are uniquely
preserved in any manuscript regardless of its stemmatic position, we are
then thrown back to a procedure of evaluating readings which is unaided
by considerations of outgroup comparison, reconstruction of an
archetype, or to push the concept to its logical conclusion, without the
consideration of manuscript authority of any kind.29

In order that we may avoid an imbroglio in Aeschylean Textkritik, we might concede Dawe’s assertion to hold true in certain specific textual traditions. But we need not suppose that any particular number of such traditions invalidates the deductive stemmatic method in general. Hence, in the absence of any argument against stemmatic representation of the Sallustian tradition, we can proceed to analyze it via the cladistic approach.

Experimental Procedure

28 Dawe, op. cit. 1964, pp. 157-8.
29 Cameron, op. cit., pp. 237-8.

In this study, the manuscripts containing the De Coniuratione Catilinae and the De Bello Iugurthino were examined, as these two works are found together in one set of manuscripts. Absent access to a complete collation, an adapted collation was formed by the following method. Eleven manuscripts were selected from those included in L.D. Reynolds’ Oxford text of 1991 (Table 1).

Siglum  Manuscript
A       Parisinus 16025
B       Basileensis
C       Parisinus 6085
D       Parisinus 10195
F       Hauniensis Fabricianus
H       Berolinensis Phillippsianus 1902
K       Vaticanus Palatinus 887
N       Vaticanus Palatinus 889
P       Parisinus 16024
Q       Parisinus 5748
V       Vaticanus 3864 (Florilegium Vaticanum)

Table 1

Beginning at Catilina 1.1, the first 300 loci were selected which contain variants in one or more of the above eleven manuscripts.30 The adapted collation was then formed by listing, for each locus, the groups of manuscripts which exhibited the same reading. The collation then consisted of a sequence of rows such as appear in Table 2.

Locus   Group 1   Group 2   Group 3   Group 4
[rows 1-11]
12      ABCDFNP   HK        V
13      C         N         A         BDFHKPV
[rows 14-300]

Table 2

30 To be more precise, in keeping with the biological metaphor, only the latest markings in the manuscripts were collated. Thus, as corrected markings were ignored, loci containing variants in earlier hands are not included in the 300. The selected loci do, however, include every variant in the last hand (at each locus) of the appropriate manuscript from Catilina 1.1 to 52.35.

To analyze the collation, the DNAPARS component of the PHYLIP package was employed, because it is the only component of PHYLIP which can process multi-state discrete characters (albeit by marking the states with DNA labels).31 DNAPARS is a program which compares DNA base sequences for a set of specimens and evaluates various possible cladograms on the basis of a parsimony criterion.

A parsimony criterion favors arrangements of the specimens which require the fewest character state changes in the course of the specimens’ evolution. For example, a phylogeny which requires a specimen possessing a DNA sequence of AAA to give rise to one possessing ACT and, thereafter, requires the specimen possessing ACT to give rise to one possessing the sequence AAA again would not be favored. This proposed phylogeny requires two bases to change state (AA to CT) and later to change again (back to AA), involving four base changes overall. Instead, a parsimony criterion might favor an arrangement where one specimen featuring the AAA sequence gives rise to the other with the AAA sequence, and the latter gives rise to that possessing the ACT sequence.32 This latter phylogeny requires only a single change of two bases, or two character state changes overall, and is thus more parsimonious than the former.
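The arithmetic of this example can be checked with a few lines of code. The sketch below simply counts position-by-position differences along a proposed line of descent; the function names are hypothetical, and it illustrates the counting only, not the search procedure of any particular program.

```python
def state_changes(ancestor: str, descendant: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must have the same length")
    return sum(a != d for a, d in zip(ancestor, descendant))


def chain_cost(chain) -> int:
    """Total character state changes along a line of direct descent."""
    return sum(state_changes(a, d) for a, d in zip(chain, chain[1:]))


if __name__ == "__main__":
    print(chain_cost(["AAA", "ACT", "AAA"]))  # the disfavored arrangement
    print(chain_cost(["AAA", "AAA", "ACT"]))  # the more parsimonious arrangement
```

The first arrangement costs four changes and the second costs two, matching the figures given in the text.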

Further assumptions involved in the parsimony method, and differing views about them, are listed (or references provided) by Felsenstein.33

31 See “Frequently Asked Questions,” Felsenstein, op. cit.
32 This phylogeny “might” be favored because one can observe other possible phylogenies with only two character state changes. Such phylogenies would be equally parsimonious with the one given, and hence would be judged equally desirable by a parsimony criterion.
33 “DNAPARS – DNA Parsimony Program” (documentation) in Felsenstein, op. cit.

In order to evaluate the collation using DNAPARS, the collation data had to be converted from the form illustrated in Table 2 to a form wherein manuscripts grouped by a shared reading were each assigned a particular DNA base abbreviation (A, C, G, T, or “-”, which indicates a fifth state to DNAPARS). The DNA base label assigned to a manuscript at a particular locus would correspond to the group in which that manuscript resided at that locus.
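The conversion just described can be sketched as follows. This is not the MSS2DNA source, only an illustration of the idea under stated assumptions: that Group 1 receives the label A, Group 2 the label C, and so on in that order, and that a manuscript absent from every group at a locus (as Q is in the two rows shown in Table 2) is marked "?". Neither convention is stated in the text, and all names in the code are hypothetical.

```python
BASE_LABELS = ["A", "C", "G", "T", "-"]  # assumed order: Group 1 -> A, Group 2 -> C, ...


def encode_locus(groups, sigla, missing="?"):
    """Map each siglum to the base label of the reading group it belongs to."""
    if len(groups) > len(BASE_LABELS):
        raise ValueError("more reading groups than available state labels")
    label_of = {}
    for label, group in zip(BASE_LABELS, groups):
        for siglum in group:
            label_of[siglum] = label
    # Assumption: a manuscript listed in no group gets the `missing` marker.
    return {siglum: label_of.get(siglum, missing) for siglum in sigla}


def encode_collation(rows, sigla):
    """Turn collation rows (lists of groups) into one label string per manuscript."""
    strands = {siglum: [] for siglum in sigla}
    for groups in rows:
        encoded = encode_locus(groups, sigla)
        for siglum in sigla:
            strands[siglum].append(encoded[siglum])
    return {siglum: "".join(labels) for siglum, labels in strands.items()}


# Rows 12 and 13 of Table 2; each group is written as a string of sigla.
table_2_rows = [["ABCDFNP", "HK", "V"], ["C", "N", "A", "BDFHKPV"]]
strands = encode_collation(table_2_rows, "ABCDFHKNPQV")

if __name__ == "__main__":
    for siglum, strand in strands.items():
        print(siglum, strand)
```

Applied to all 300 rows, the same loop would yield one 300-character string per manuscript.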

Each row of the collation would yield one DNA base label for each manuscript; thus the 300 loci in the collation would produce a 300-base “DNA strand” for each of the eleven manuscripts. The creation and data entry of these 3,300 base labels was beyond what could easily be accomplished manually. To perform the task, a custom application program was written (MSS2DNA) which allows the entry of the collation in table form, performs the translation to sequences of DNA base labels for the various manuscripts, and places the results on the Microsoft Windows clipboard (Figure 5).34

From the clipboard, the DNA data for the various manuscripts were assembled with a text
editor into the file format required by DNAPARS, as documented by Felsenstein.35 In order to
facilitate comparison to Reynolds’ stemmatic work on the Sallust manuscripts, and because
they represent only parts of the text, data for manuscripts V (a florilegium) and Q were
removed from the data file, leaving the nine manuscripts for which Reynolds had published a
stemma. In removing V and Q, some 27 (i.e., 9%) of the loci were rendered irrelevant,
although they remain in the set.36
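The assembly done here by hand in a text editor could also be scripted. The sketch below writes the layout seen in Appendix A: a header line giving the number of manuscripts and loci, then one line per manuscript consisting of a ten-character name followed directly by its strand. The function name `phylip_infile` and the V and Q entries (names and bases) are hypothetical; the P and A entries are the first ten bases from Appendix A.

```python
def phylip_infile(strands, exclude=()):
    """Format strands as sequential PHYLIP input, skipping excluded names."""
    kept = {name: seq for name, seq in strands.items() if name not in exclude}
    lengths = {len(seq) for seq in kept.values()}
    if len(lengths) != 1:
        raise ValueError("all strands must have the same number of loci")
    n_loci = lengths.pop()
    lines = [f"{len(kept)} {n_loci}"]
    for name, seq in kept.items():
        # PHYLIP expects the name to occupy exactly ten characters.
        lines.append(name[:10].ljust(10) + seq)
    return "\n".join(lines)


strands = {
    "P.Par16024": "AAACCCCCCC",
    "A.Par16025": "CCACACCCCA",
    "V.example": "AAACCCCCCA",  # hypothetical entry standing in for V
    "Q.example": "CAACCCCCCC",  # hypothetical entry standing in for Q
}
infile_text = phylip_infile(strands, exclude=("V.example", "Q.example"))
print(infile_text)
```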

34 This program, while not elegant, is publicly available (with source code) so that others
may independently conduct investigations or repeat and verify the present investigation.
The program, MSS2DNA, runs on 32-bit Microsoft Windows platforms (Windows 95, Windows 98,
Windows NT) and may be downloaded in archived (ZIP) form at
http://homer.bus.miami.edu/~adbreind/mss2dna.zip
35 “Molecular Sequence Programs” in Felsenstein, op. cit.
36 These data points represent loci at which only Q and/or V differed from the consensus of
the remaining manuscripts. These sites can be identified from Appendix B, in the table marked
“steps in each site,” as sites where the table shows 0 steps. That is, the remaining
manuscripts show consensus at the site, so no character state changes are required for any
phylogenetic arrangement of the manuscripts.
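The count of irrelevant loci can be checked against the “steps in each site” table of Appendix B. The sketch below transcribes that table and tallies it; the parsing function `parse_steps` is an assumption about how one might read the table, not part of PHYLIP.

```python
# "steps in each site" from Appendix B; the row label gives the tens of the
# site number, and the first row covers sites 1-9 only.
STEPS_TABLE = """
  0! 3 2 2 1 1 1 1 2 2
 10! 2 3 2 3 2 1 2 3 1 1
 20! 2 1 4 1 2 1 2 1 2 2
 30! 2 1 1 1 1 1 2 1 2 3
 40! 1 4 1 1 2 1 1 2 2 2
 50! 4 2 2 2 2 2 1 1 3 1
 60! 1 3 3 4 2 1 1 1 2 0
 70! 5 1 3 3 1 1 1 0 2 0
 80! 2 1 1 2 1 0 2 1 3 0
 90! 2 4 4 2 1 1 1 4 4 1
100! 3 2 2 2 1 2 2 2 2 1
110! 1 2 2 2 3 1 2 1 2 1
120! 1 3 2 1 3 2 2 3 2 2
130! 4 3 4 1 0 5 2 1 2 1
140! 1 2 1 0 2 3 0 1 1 5
150! 0 3 1 5 0 0 3 1 1 1
160! 2 1 2 2 2 1 2 1 1 2
170! 2 2 1 1 1 1 4 3 1 0
180! 2 4 1 1 2 1 1 1 2 2
190! 2 1 1 2 3 2 3 1 1 1
200! 1 1 1 1 1 2 3 0 4 2
210! 2 1 1 2 2 3 4 1 2 1
220! 2 4 0 2 1 1 1 1 1 1
230! 0 1 1 2 2 1 1 3 1 2
240! 2 2 2 4 4 0 2 1 0 0
250! 2 0 1 2 1 1 1 2 2 0
260! 2 0 1 2 0 0 1 1 0 1
270! 3 1 3 1 1 3 2 1 0 2
280! 1 1 1 1 1 1 2 1 2 1
290! 1 0 1 4 1 1 4 1 0 2
300! 2
"""


def parse_steps(table):
    """Flatten the table into a list whose index i holds the steps at site i+1."""
    steps = []
    for line in table.strip().splitlines():
        _, _, values = line.partition("!")
        steps.extend(int(v) for v in values.split())
    return steps


steps = parse_steps(STEPS_TABLE)
zero_sites = [site for site, n in enumerate(steps, start=1) if n == 0]
print(len(steps), sum(steps), len(zero_sites))  # prints: 300 500 27
```

The 27 zero-step sites out of 300 correspond to the 9% figure in the text, and the column total agrees with the 500 steps reported for the tree in Appendix B.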

The completed DNAPARS file appears in this report as “Appendix A: Infile.” The DNAPARS
program was then run, using this file as its data source.37

Figure 5. MSS2DNA. The columns collect the manuscripts which share a reading at each locus.
The column headings indicate the DNA base labels which will be attached to the manuscript
groups.

DNAPARS produced the output file which appears in this report as “Appendix B: Outfile,” and
which includes the preliminary phylogenetic tree (Figure 6). DNAPARS was then run on the
input data several more times in order that other possible most parsimonious trees might be
discovered. No other most parsimonious trees were found.

37 The 386-Windows precompiled PHYLIP executables were used throughout. The program options
selected for DNAPARS were all defaults with the following exceptions: Randomize order was
selected, with a seed of 69 (=4*17+1) and 100 permutations of the input rows; terminal type
was set to (none); input sequences interleaved was set to No; and all printing options for
the output were selected.
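The idea behind randomizing the input order can be sketched as below: the same manuscripts are presented to the tree search in many different orders, so that a result which depends on input order has a chance to surface. This sketch uses Python's own random number generator, which is not the generator PHYLIP uses, so the particular orders it yields will not match those of the runs reported here; the function name `jumbled_orders` is hypothetical.

```python
import random

MANUSCRIPTS = ["P", "A", "B", "C", "N", "K", "H", "D", "F"]
SEED = 69  # the seed quoted in note 37, which has the form 4n+1 (n = 17)


def jumbled_orders(names, seed, n_orders):
    """Return n_orders shuffled copies of names, reproducible from seed."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = list(names)
        rng.shuffle(order)
        orders.append(order)
    return orders


orders = jumbled_orders(MANUSCRIPTS, SEED, 100)
print(SEED % 4)      # remainder 1 confirms the 4n+1 form
print(len(orders))   # one input order per permutation requested
```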

One most parsimonious tree found:

                         +--F.Hauniens
                      +--8
                   +--7  +--D.Par10195
                   !  !
                +--6  +-----H.Beroline
                !  !
       +--------5  +--------K.VatP_887
       !        !
       !        +-----------N.VatP_889
    +--4
    !  !                 +--C.Par_6085
    !  !              +--3
  --1  +--------------2  +--B.Basileen
    !                 !
    !                 +-----A.Par16025
    !
    +-----------------------P.Par16024

remember: this is an unrooted tree!

Figure 6

In order that the output from this program might be compared to Reynolds’ published stemma
for Sallust, and in recognition of Reynolds’ judgments about the quality of the textual
variants, the tree was re-oriented using PHYLIP’s RETREE program. Since manuscripts F, D, H,
K, and N formed a monophyletic group and because they had been collected in Reynolds’
presentation of the Sallust stemma, the node representing their common ancestor was selected
for the outgroup (or subarchetype). Note that although the tree was re-oriented, no changes
were made to the genealogical relationships inferred between the manuscripts by DNAPARS.38
The transcript of the RETREE session appears in this report as “Appendix C: RETREE
Session.”39 The session also produced as output a new tree file. This tree file was used as
input to PHYLIP’s DRAWGRAM program, which constructed a graphical representation of the
stemma (Figure 7).

38 Re-orientation in effect asserts likely positions for the subarchetypes. As described
above, West had indicated that such a step would be required, and that it should be
conducted using a critic’s evaluation of the variants.
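That re-orientation leaves the inferred relationships untouched can be illustrated by comparing the two trees as sets of splits, that is, the ways each internal branch divides the manuscripts into two groups. The nested-tuple encoding and the function names below are assumptions made for this sketch; the two topologies are transcribed from the DNAPARS tree (Figure 6) and from the re-oriented tree in Appendix C.

```python
def leaf_set(node):
    """Return the set of manuscript sigla below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree):
    """Collect the non-trivial splits (two or more manuscripts on each side)."""
    all_leaves = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaf_set(node)
        if 2 <= len(clade) <= len(all_leaves) - 2:
            found.add(frozenset([clade, all_leaves - clade]))
        for child in node:
            visit(child)

    visit(tree)
    return found


fdhkn = (((("F", "D"), "H"), "K"), "N")
cba = (("C", "B"), "A")
dnapars_tree = ((fdhkn, cba), "P")   # as drawn by DNAPARS
retree_tree = (fdhkn, (cba, "P"))    # after choosing the F-D-H-K-N node as outgroup

same = splits(dnapars_tree) == splits(retree_tree)
print(same, len(splits(dnapars_tree)))  # prints: True 6
```

Only the position of the root differs between the two encodings; every grouping of manuscripts implied by an internal branch is present in both.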

Figure 7

For the sake of comparison, Reynolds’ stemma is reproduced (Figure 8).40

Figure 8

As a comparison of the computer-generated tree with Reynolds’ tree (Figures 7 and 8) shows,
the two are nearly identical modulo inversion. There are, however, two differences. First,
Reynolds associates N and K more closely with each other than with H, D, or F, while DNAPARS
detected no such difference in proximity. Second, Reynolds associates A more closely with P
than with B or C, while DNAPARS indicated no such closer affiliation. This latter
distinction can in fact be attributed to differences in the text being collated, rather than
to differences between the analyses of Reynolds and DNAPARS (see below).

39 The program options selected for RETREE were all defaults with the following exception:
“no graphics” was selected.
40 Reynolds, L.D., ed. (1991) C. Sallusti Crispi: Catilina, Iugurtha, Historiarum Fragmenta
Selecta, Appendix Sallustiana, Oxford, p. xi.

Analysis

Since several hundred rearrangements of the order of the “DNA strands” produced no further
most parsimonious trees, it seems reasonable to suppose that the manuscript collation data
specify a unique most parsimonious tree.41 The existence of a unique most parsimonious tree
is itself an indication that the present method may be productive, as it obviates the need
for a human to insert prejudices into the analysis by selecting one cladogram from a list of
many. The similarity of the results derived through Reynolds’ analysis to those derived
through the parsimony analysis can, in light of the novelty of the approach, only be called
stunning.

This similarity is further strengthened when we account for one of the two indicated
differences between the stemmata. As described above (see n. 30), in keeping with the
metaphor of biological evolution, only the latest extant markings (corrections, not
including deletions) on each manuscript were collated. Thus, where the first and second
hands of A differed, the second hand was read for the collation instead of the first.
Reynolds naturally constructs his stemma indicating the position of the original A text. But
he notes that “Secunda manus (A2) librum lectionibus instruxit ex aliquo stirpis [= B, C]
codice petitis.” That is, where readings exist in A2, they come from the B-C branch, a fact
which DNAPARS appears to have recognized in asserting the A-A2 manuscript to descend both
from an ancestor of P and also from a closer ancestor of B and C. To test this hypothesis,
we would merely need to modify the collation to reflect only A-A1 readings, and then see
where DNAPARS places the manuscript.

41 This supposition is based on Felsenstein’s implicit assumption that a relatively small
number of rearrangements of the input data ought to yield multiple most parsimonious trees
if they exist. Such an assertion seems mathematically suspect, considering the large number
of possible permutations of, say, nine manuscripts (over 360,000). On this matter, however, I
defer to Felsenstein’s knowledge as a specialist.
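The permutation count cited in note 41 is simply 9 factorial, as the short calculation below confirms. For scale, the sketch also evaluates the standard counting formula for distinct unrooted bifurcating trees on n labelled tips, (2n-5)!!; that figure is offered only as context and is not part of the original argument. The function name is hypothetical.

```python
import math


def unrooted_tree_count(n_tips):
    """Number of distinct unrooted bifurcating trees: 1 * 3 * 5 * ... * (2n - 5)."""
    count = 1
    for odd in range(3, 2 * n_tips - 4, 2):
        count *= odd
    return count


input_orders = math.factorial(9)
print(input_orders)              # prints: 362880
print(unrooted_tree_count(9))    # prints: 135135
```

Several hundred jumbled runs therefore sample only a small fraction of the possible input orders, which is the concern the note raises.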

Having taken the discrepancies into account, it seems that both the human and the
machine-assisted analysis derive results from the same underlying pattern among the
manuscript readings. This study, then, preliminarily suggests that the parsimony analysis
technique could substantively advance knowledge of textual transmission.

Furthermore, the parsimony analysis can indicate the readings likely to appear in the
archetype and subarchetypes, such that they most efficiently give rise to the extant
manuscripts. A detailed examination of such archetype reconstruction is beyond the scope of
this study. But ambitious readers should note that Appendix B to this paper (i.e., the
DNAPARS output) provides the readings likely to appear at various nodes in the cladogram for
every locus studied. On Reynolds’ view of the transmission, the archetype (his ) ought to
bear the readings given for node 4.
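How readings at internal nodes and the number of changes per locus follow from the tree can be sketched with a small Fitch-style count, assuming unordered character states. The code below is an illustration, not DNAPARS itself: it takes the topology of Figure 6 and the first ten bases of each manuscript from Appendix A, and the tuple encoding and function name are assumptions.

```python
# First ten bases per manuscript, copied from Appendix A.
SEQS = {
    "P": "AAACCCCCCC", "A": "CCACACCCCA", "B": "AACCCCCCCA",
    "C": "AAACCCCCCA", "N": "CAACCCAACC", "K": "ACCCCCCACC",
    "H": "AACCCACCAC", "D": "CACACACCAC", "F": "CACACACCCA",
}

# Topology of Figure 6, written with the DNAPARS node numbers.
node8 = ("F", "D")
node7 = (node8, "H")
node6 = (node7, "K")
node5 = (node6, "N")
node3 = ("C", "B")
node2 = (node3, "A")
node4 = (node5, node2)
node1 = (node4, "P")


def fitch(node, site):
    """Return (candidate states at this node, changes needed below it)."""
    if isinstance(node, str):
        return {SEQS[node][site]}, 0
    left_states, left_steps = fitch(node[0], site)
    right_states, right_steps = fitch(node[1], site)
    shared = left_states & right_states
    if shared:
        # The two branches can agree, so no extra change is needed here.
        return shared, left_steps + right_steps
    # The branches disagree: keep both options and count one change.
    return left_states | right_states, left_steps + right_steps + 1


site_steps = [fitch(node1, i)[1] for i in range(10)]
print(site_steps)  # prints: [3, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

These counts agree with the first ten entries of the “steps in each site” table in Appendix B. The sets of candidate states returned for each node play the role of the ambiguous readings that DNAPARS reports at internal nodes.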

Future Research

The future presents a number of immediate challenges and possibilities for the cladistic
analysis of texts using parsimony techniques. The obvious methods through which the
procedure may be tested include examining a variety of texts, as well as using full
collations (in place of collations built from apparatus critici) so as to avoid dependence
on one editor’s opinion of what may be viable manuscript readings.42

If positive results are indicated, parsimony analysis might be deployed to assist the
textual critic in determining the relationships of texts, and in reconstructing archetypes,
for new publications. Perspectives may also be presented for re-evaluating existing dogma
about traditions which have not been recently examined.43 In the classroom, the use of
graphical interactive parsimony programs, which allow one to manipulate stemmata on-screen
and immediately to observe the consistencies or inconsistencies thus fostered, may
facilitate integration of stemmatics into the standard classics curriculum.44 Lastly,
literary theorists may wish to ponder the existence of deeper metaphors connecting the
enzymes and mutations of DNA replication with the corresponding verbal agents and scribal
errors giving rise to many of our textual variants.

42 “Readings which must quite certainly be eliminated have no place under the text,” writes
Maas (p. 23), thus giving editors license to omit even from the app. crit. those readings
deemed eliminanda.
43 We may suppose that parsimony analysis will be effective in evaluating relationships
between manuscripts of texts in modern, as well as ancient, languages.
44 MacClade (distributed by Sinauer Associates) is one such program. Many candidates which
might be useful for heavy-duty analysis as well as pedagogy are described by Felsenstein at
http://evolution.genetics.washington.edu/phylip/software.html
Appendix A: Infile

9 300

P.Par16024AAACCCCCCCAATACCCCCCAACGCCCACACCCAACCACCACCCCCACGACGGAAACCCCCCGCCCCCCACCACAC
CACACCCCCCACCCAACCCATACCACCAAACACACGCCCCCGCCACACGACGGCACACACCCCACACAACAC-
CTCACCCACAACCCCCCCCACCACCCAACAAACCGCCCAAACCCACACCCCCACCACCCAACCCCCACCCCACACCCCCACAAACC
CCCCACCTACCCCCCCCCACCAGCCCCACCAACCAAACCAACCCCCGAACCACCCCCACCCCCCA
A.Par16025CCACACCCCACAGCCCCCCCACCGCACCCCCCACCCCCCCCCCCCCCCATCGAACCCGCCACGCCCCCCGCAACCC
CCCCCCCCCCCACACTCCCCACCACCCACACCCCCGACCCAAACCAAAAACGGCCCCCCCCCCACACCCCCCCCAC-
CCACCCAAAACACCCACCCCCCCACACCCAACCAACAAAACCACCCCCCCCAACACCCCCCACCACCCCCCCACACCCCCCAAACA
ACCCACCCCCACCCCCCCGCCCCACCCACCACCCCACACCCCGACCCCACCCCGACACCGC
B.BasileenAACCCCCCCAAATCACCCCCAGCGCCCACACAACCCCCACACCCCCGCCCCGGAACCCCCCCCCACCACCCCCCCC
CCCCCCCCCCCACCATACCCAACGAACAAACCCCCGAACCACACACAACCCGGACACCGCCCCACACCCCCCACCACCCACCCCCA
ACCCCCACACCCCACCAGACAACCACCCCCCCCACCCCCCCCAACCCCCCCCCCCCAACCCCCCCCCCCCCCCCGCCCACCCCCCC
CACCACCCCGCACCACCCACCGCCCCACCCCCCGACCCCCCCCCCACACCGA
C.Par_6085AAACCCCCCAAAACACCCCCAGCGCACCCCCAACCCCCCCCCCCCCGCACCGGACCCCCCACACACCACTCAACCC
CACACCCCCCCACACTCCCCTACGCCCACACCCCCGCACCACACAAAACCCGGCCACCCCCCCACACCCCCCCCCACCCACCCCCA
ACCCCCCCCCCCCCCCAGCCACCCACCCCCAACACCCCCCCCACCACCCCCCACCCCAACACACCCCCCCCCCCGCCCCCCCCCCC
CACCCCCCCGCCCCACCCACCACCCCACCCCCCGACCCCACCCCGACACCGC
N.VatP_889CAACCCAACCCACACCACAAACCGCCCCACCCCCCAAAACACCCCCAAGGCAGCCCCACACAAACCCCCACCACCC
CCCCCAACCCCACCCGCCCCACAAAGAACCCACCCGCCCCCGCCCCCAGCAGGCGGCCGACACCACCCCCCAGCGCCCCCCCCCCC
AACCCCCCCCCCCCCCAGACAGCACACACCCAACACCCCCCCAACACCCCCACACCCCACCCCCCCCCCCAACCACAACAACCACC
CCCCCCCCAGCCCCCCCAACCACCCCCCCCCCCCCCCACCCCCCGCAGCCGC
K.VatP_887ACCCCCCACCCCTCCAACCAAGCGCCCCCCCCCCCAAAACACCACAAAGGAGGCCCCCCCACGACCCCCGCCGCCC
CCCCCCCCCACCCCGGACCCCACAAGACCCACCCCACACACGCCACCCGCAGAGGGCCCACACCACCCACCATCCCTCCACCCCCC
ACCACACACCCCCAACAGCCAGCCCCCCCCCAACCCCACCACAACCCACAAACCCCCCCCCCCCCCCCCCCACCACCCAACACACC
CCCACCCAAGCCCCACCACCCCCACCCCCCCCCGCCCACCACCCACCACCGC
H.BerolineAACCCACCACACTCCCCCCACGACCCCCACCCCCCCACACGCCCCAAAGCCGGACCCCACCGGCCCCACCACCCCC
CCCCCCCCCACACCCTACCCGCCTAGCCCAACCCACCCACCGCCCCCAGCCGGACTCCCCAACCCCCCGCCATCCC-
CCACCCCCAACCACACCCCACCACAAGACAGACCCCCCCCCCACCCCCCAAGCCCAAAAACAGACCAACCCCCACCCCCCCCCCCC
CAC-GCCCCCCCACCCCCACACCCCCACCAGCCCCACCCACCGCCCCCCCCCAACCTCCGC
D.Par10195CACACACCACCATCCACACAATCACCACCCACCCCCAAACGACAAAACGGACCCCCCCCCCAGACAAACACCCACC
CCCCACAACACCCCTCAAACGCCAAGCCCCACACAACCAACGCCCCCCGCACCGGGCCACAAACCCCCCCCAGCCCGCCCCACCCA
CCCCAACAACAACGAAAGCAAGCCCCCCCCCACCCAACCCAACCCAAAAAACCGACCAACCACCACCCCCCCCCCCCCACGCCCAC
CCCACCACCCCACACCCACCAGCAAACACCCAAGCCCCCCCACCCCCACCAC
F.HauniensCACACACCCACATCCAAACCAGCAACACCCACCCCCAACAAAACAAACGGACCCCCATCCCACACAAACACACACA
CCCCACAACCCACCTGAACCGGCAAGCCCCCCACCACAAACGCACCCCGCCAGTGTCCGCACACCCCACCCAGCCCGCCCCACCCA
CACAAACACCAAAGAAAGCAAGCCCCACCCCACGCAACACAACCCCAAAACCCCAACACCCACCACCCCCCCCCCACCACGCCCAC
CACAACACCCCACCCCCACCAGCCAAGACCAAAGCCACCCCACCCCC-ACAC

Appendix B: Outfile

DNA parsimony algorithm, version 3.572c

Name Sequences
---- ---------

P.Par16024 AAACCCCCCC AATACCCCCC AACGCCCACA CCCAACCACC ACCCCCACGA CGGAAACCCC


A.Par16025 CC..A....A C.GC...... .C...A.C.C ..ACC..C.. C.....C.AT ..A.CC.G..
B.Basileen ..C......A ...CA..... .G........ .AACC..CA. ......G.CC .....C....
C.Par_6085 .........A ..ACA..... .G...A.C.C .AACC..C.. C.....G.AC ....CC....
N.VatP_889 C.....AA.. C.C...A.AA .C.....CAC ...CCAA.A. .......A.G .A.CCC.A.A
K.VatP_887 .CC....A.. CC.C.AA..A .G.....C.C ...CCAA.A. ...A.A.A.G A..CCC....
H.Beroline ..C..A..A. .C.C.....A CGAC...CAC ...CC.ACA. G....A.A.C ....CC..A.
D.Par10195 C.CA.A..A. C..C.A.A.A .T.A..AC.C A..CC.A.A. GA.AAA...G ACCCCC....
F.Hauniens C.CA.A...A C..C.AAA.. .G.AA.AC.C A..CC.A..A .AA.AA...G ACCCCCAT..

P.Par16024 CCGCCCCCCA CCACACCACA CCCCCCACCC AACCCATACC ACCAAACACA CGCCCCCGCC


A.Par16025 A........G .A..C..C.C ......CA.A CT...CAC.A C...C..C.C ..A...AAA.
B.Basileen ..C.A..A.C ..C.C..C.C ......CA.. .TA..CA..G .A.....C.C ..AA..ACA.
C.Par_6085 A.A.A..A.T .A..C..... ......CA.A CT...C...G C...C..C.C ...A..ACA.
N.VatP_889 .AAA...... ....C..C.C .AA...CA.. CG...CACAA .GA.CC...C ..........
K.VatP_887 A..A.....G ..G.C..C.C .....AC... GGA..CC..A .GACCCAC.C .A.A.A....
H.Beroline .G.....A.C A.C.C..C.C .....ACA.. CTA..CGC.T .G.CC.AC.C AC..A.....
D.Par10195 .A.A.AAA.. ..CAC..C.C A.AA.AC... TCAAACGC.A .G.CCCACAC AA..AA....
F.Hauniens .ACA.AAA.. .ACACA.C.C A.AA..CA.. TGAA.CGG.A .G.CCC.CAC .A.AAA...A

P.Par16024 ACACGACGGC ACACACCCCA CACAACAC-C TCACCCACAA CCCCCCCCAC CACCCAACAA


A.Par16025 CA.AA..... C.C.C..... ...CC.C.C. A.-..AC.C. AAA.A..... .C...C...C
B.Basileen ...ACC...A CAC.G..... ...CC.C.A. CAC..AC.CC .AA....... AC....C..G
C.Par_6085 .A.ACC.... CAC.C..... ...CC.C.C. CAC..AC.CC .AA.....C. .C...CC..G
N.VatP_889 C.CA.CA... GGC.GA.A.C AC.CC.CAG. G.C...C.CC ..AA....C. .C...CC..G
K.VatP_887 ..C..CA.AG GGC.CA.A.C AC.C..CAT. C.T..AC.CC ..A..A.ACA .C.......G
H.Beroline C.CA.C...A CTC.C.AA.C .C.CG.CAT. C.-..AC.CC .AA..A.AC. .CA...CA.G
D.Par10195 C.C..CACCG GGC...AAAC .C.CC.CAG. C.G...CACC .A....AACA ACAA.G.A.G
F.Hauniens C.C..C.A.T GTC.G.A.AC .C..C.CAG. C.G...CACC .A.A.AAACA .CAAAG.A.G

P.Par16024 ACCGCCCAAA CCCACACCCC CACCACCCAA CCCCCACCCC ACACCCCCAC AAACCCCCCA


A.Par16025 C.AA..A.C. AAAC...... .C...A.ACC ....AC.A.. C.C..A.AC. CCC.AAA.A.
B.Basileen ..AA..ACCC ...C...... .C...A..CC .....C..AA C.C.....C. CCC...G..C
C.Par_6085 C.AC..ACCC ..A....... .C.....ACC ....AC...A ....A...C. CCC...G..C
N.VatP_889 ..A..A..C. ....ACA... .C...A.ACC ...A...... ..C.....C. CC.A..A.A.
K.VatP_887 C.A....CCC ....AC...A .CA..A..C. .AAA.C.... C.C.....C. CCCA..A..C
H.Beroline ..A.A..CCC ...C...... .CAAG..... AAA.AGA..A ..C...A.C. CCC......C
D.Par10195 CAA....CCC .....C.AA. .CAAC..A.. AAA..GA..A ..CA..A.C. CCC......C
F.Hauniens CAA....C.C .....G.AA. ACAAC..... AA...CAA.A C.CA..A.C. CCC....A.C

P.Par16024 CCTACCCCCC CCCACCAGCC CCACCAACCA AACCAACCCC CGAACCACCC CCACCCCCCA


A.Par16025 ..C......A ...C..C... .....C.... CC...CA... ...C..CA.. ..GA.A..GC
B.Basileen A.CC.....A ..AC..C..A .....C...G CC...C.... ...C..C... ..CA.A..G.
C.Par_6085 ..CC.....A ...C..C... .....C.... CC...C.... ...C..CA.. ..GA.A..GC
N.VatP_889 .AAC.A.... ...C...... ..C....... CC..CC.... .CCC.AC... ..G.AG..GC
K.VatP_887 AAC..A.... .A.C.A.... ......C..C C...CC.... ..CC.AC.A. .....A..GC
H.Beroline A.-G...... .A.C..CA.A ..C...C.AG CC...C..A. ..CC..C... .A...T..GC
D.Par10195 A.GC..A... .A.CA.CC.A .AC...C.AG C.AAC....A A.CC..C..A ..C..A..AC
F.Hauniens A.GC..A..A .AACA.CC.A ..C...C.AG CCAAG...AA A.CCA.C..A ..C..-A.AC

One most parsimonious tree found:

                         +--F.Hauniens
                      +--8
                   +--7  +--D.Par10195
                   !  !
                +--6  +-----H.Beroline
                !  !
       +--------5  +--------K.VatP_887
       !        !
       !        +-----------N.VatP_889
    +--4
    !  !                 +--C.Par_6085
    !  !              +--3
  --1  +--------------2  +--B.Basileen
    !                 !
    !                 +-----A.Par16025
    !
    +-----------------------P.Par16024

remember: this is an unrooted tree!

requires a total of 500.000

steps in each site:


         0   1   2   3   4   5   6   7   8   9
     *-----------------------------------------
    0!       3   2   2   1   1   1   1   2   2
   10!   2   3   2   3   2   1   2   3   1   1
   20!   2   1   4   1   2   1   2   1   2   2
   30!   2   1   1   1   1   1   2   1   2   3
   40!   1   4   1   1   2   1   1   2   2   2
   50!   4   2   2   2   2   2   1   1   3   1
   60!   1   3   3   4   2   1   1   1   2   0
   70!   5   1   3   3   1   1   1   0   2   0
   80!   2   1   1   2   1   0   2   1   3   0
   90!   2   4   4   2   1   1   1   4   4   1
  100!   3   2   2   2   1   2   2   2   2   1
  110!   1   2   2   2   3   1   2   1   2   1
  120!   1   3   2   1   3   2   2   3   2   2
  130!   4   3   4   1   0   5   2   1   2   1
  140!   1   2   1   0   2   3   0   1   1   5
  150!   0   3   1   5   0   0   3   1   1   1
  160!   2   1   2   2   2   1   2   1   1   2
  170!   2   2   1   1   1   1   4   3   1   0
  180!   2   4   1   1   2   1   1   1   2   2
  190!   2   1   1   2   3   2   3   1   1   1
  200!   1   1   1   1   1   2   3   0   4   2
  210!   2   1   1   2   2   3   4   1   2   1
  220!   2   4   0   2   1   1   1   1   1   1
  230!   0   1   1   2   2   1   1   3   1   2
  240!   2   2   2   4   4   0   2   1   0   0
  250!   2   0   1   2   1   1   1   2   2   0
  260!   2   0   1   2   0   0   1   1   0   1
  270!   3   1   3   1   1   3   2   1   0   2
  280!   1   1   1   1   1   1   2   1   2   1
  290!   1   0   1   4   1   1   4   1   0   2
  300!   2

From To Any Steps? State at upper node


( . means same as in the node below it on tree)

1 AAACCCCCCC MATMCCCCCC AVCGCCCMCM CCCMMCCACC


1 4 maybe .......... .......... .S.....C.C ...CC.....
4 5 yes .......M.. C.....M..A .......... .....MA.A.
5 6 yes ..C....... .M.C.M.... .G........ ..........
6 7 yes .....A.CM. .......... ...V...... .....C....
7 8 yes C..A...... .A...A.A.. ...A..A... A.........
8 F.Hauniens yes ........CA ......A..C ....A..... ........CA
8 D.Par10195 yes ........A. ......C... .T........ ..........
7 H.Beroline yes ........A. AC...CC... C.AC....A. .......C..
6 K.VatP_887 yes .C.....A.. .C...AA... .......... .....A....
5 N.VatP_889 yes C.....AA.. ..CA..A.A. .C......A. .....A....
4 2 yes .........A ...C...... .....M.... ..A....C..
2 3 yes .......... A...A..... .G........ .A........
3 C.Par_6085 yes .......... ..A....... .....A.... ..........
3 B.Basileen yes ..C....... .......... .....C.A.A ........A.
2 A.Par16025 yes CC..A..... C.G....... .C...A.... ..........
1 P.Par16024 maybe .......... A..A...... .A.....A.A ...AA.....

1 ACCCCCACGN CGGAMMCCCC CCGCCCCCCA CCACMCCMCM


1 4 maybe .......... ....CC.... .......... ....C..C.C
4 5 yes .......A.G ...C...... .M.A...... ..........
5 6 yes .....A.... M......... .......... ..V.......
6 7 yes R......... .......... .V.....A.. ..C.......
7 8 yes .A..A..C.. ACC....... .A...AA... ...A......
8 F.Hauniens yes A.A....... ......AT.. ..C....... .A...A....
8 D.Par10195 yes G..A...... .......... .......... ..........
7 H.Beroline yes G........C C..A....A. .G.C.....C A.........
6 K.VatP_887 yes ...A...... A......... AC.......G ..G.......
5 N.VatP_889 yes .......... .A.....A.A .AA....... ..........
4 2 yes M.....V.A. .......... M........N .M........
2 3 yes ......G..C .......... ..V.A..A.. ..........
3 C.Par_6085 yes C......... .......... A.A......T .A.....A.A
3 B.Basileen yes A.......C. ....A..... C.C......C .CC.......
2 A.Par16025 yes C.....C..T ..A....G.. A........G .A........
1 P.Par16024 maybe .........A ....AA.... .......... ....A..A.A

1 CCCCCCMMCC MDCCCMWMCM ACCAMACACM CGCCCCCGCC


1 4 maybe ......CA.. C....CA..A ....C..M.C ..........
4 5 yes .......... .G........ .GM..C.... ..........
5 6 yes .....A.... ..A...V... ...C..AC.. .A...M....
6 7 yes .......... ......GC.. ..C....... M...A.....
7 8 yes A.AA...... T..A...... ........A. .....A....
8 F.Hauniens yes .....C.... .......G.. ......C... C..A.....A
8 D.Par10195 yes .......C.. .C..A..... .......... A.........
7 H.Beroline yes .......... .T.......T .....A.... AC...C....
6 K.VatP_887 yes .......C.. G.....CA.. ..A....... ...A.A....
5 N.VatP_889 yes .AA....... .......CA. ..A....A.. ..........
4 2 yes .........M .T........ M......C.. ..M...AVA.
2 3 yes .......... .......A.G .......... ...A...C..
3 C.Par_6085 yes .........A ......T... C......... ..C.......
3 B.Basileen yes .........C A.A....... AA..A..... ..A.......
2 A.Par16025 maybe .........A .......C.. C......... ..A....A..
1 P.Par16024 maybe ......AC.. AA...ATA.C ....A....A ..........

1 MCAMGMCGGC VCMCMCCCCA CACMMCMC?C YC?CCMMCMM


1 4 maybe .......... ..C.C..... ...CC.C... C.?...C.C.
4 5 yes ..C..CM... GG...M.A.C MC.....A.. .........C
5 6 yes .........G .......... ........K. ..?.......
6 7 yes C......... .K...CA... C......... ..........
7 8 yes ...C...V.. ....V...A. ........G. ..G..C.A..
8 F.Hauniens yes ......CA.T .T..G..C.. ...A...... ..........
8 D.Par10195 yes ......ACC. .G..A..... .......... ..........
7 H.Beroline yes ...A..C..A CT........ ....G...T. ..-..A....
6 K.VatP_887 yes A..C..A.A. .....A.... A...A...T. ..T..A....
5 N.VatP_889 yes C..A..A... ....GA.... A.......G. G.C..C....
4 2 maybe .M.AV..... C......... ........C. .....A....
2 3 yes A...CC.... .A........ .......... .AC......C
3 C.Par_6085 maybe .A........ .......... .......... ..........
3 B.Basileen yes .C.......A ....G..... ........A. ..........
2 A.Par16025 yes CA..AA.... .......... .......... A.-......A
1 P.Par16024 maybe A..C.A.... A.A.A..... ...AA.A.-. T.A..CA.AA

1 CCMCCCCCAC CMCCCMACAR MCMGCCCAMA CCCACACCCC


1 4 maybe ..A....... .C.......G ..A.....C. ..........
4 5 yes ........C. .......... .......... ....MC....
5 6 yes .....A.A.M .....A.... .......C.C ..........
6 7 yes .A........ ..A....A.. .......... ....C.....
7 8 yes ..C...A..A ...A.G.... CA........ .......AA.
8 F.Hauniens yes ...A...... ....A..... ........A. .....G....
8 D.Par10195 yes .....C.... A......... .......... ..........
7 H.Beroline yes .........C ......C... A...A..... ...C.A....
6 K.VatP_887 yes .........A .......... C......... ....A....A
5 N.VatP_889 yes ...A...... .....CC... A....A.... ....A.A...
4 2 yes .A........ .......... ...A..A... ..MM......
2 3 yes .......... ......C... .......C.C ..........
3 C.Par_6085 yes ........C. .....C.... C..C...... ..AA......
3 B.Basileen yes .......... A....A.... A......... ..CC......
2 A.Par16025 yes A...A..... .....C...C C......... AAAC......
1 P.Par16024 maybe ..C....... .A...A...A A.C.....A. ..........

1 CMCCAMCMMM CCCCCMCCCC ACMCCCCCMC MMACCCMCCA


1 4 maybe .C...A..C. .......... ..C.....C. CCM...A...
4 5 maybe .......... ...M...... .......... ...M......
5 6 yes ..A....C.A .AA..V.... .......... ..C......C
6 7 yes ...AVC..A. A..C.SA..A ......A... ...C..C...
7 8 yes ....C..... .......... ...A...... ..........
8 F.Hauniens yes A......... ..C..C.A.. C......... .......A..
8 D.Par10195 yes .......A.. .....G.... .......... ..........
7 H.Beroline yes ....G..... ....AG.... .......... ..........
6 K.VatP_887 yes .......... ...A.C.... C......... ...A......
5 N.VatP_889 yes .......A.C ...A.A.... .......... ..AA....A.
4 2 maybe .........C ....MC.... M......... ..C.......
2 3 yes .......... .........A .......... ......G..C
3 C.Par_6085 yes .....C.A.. ....A..... A.A.A..... ..........
3 B.Basileen yes .......C.. ....C...A. C......... ..........
2 A.Par16025 yes .......A.. ....A..A.. C....A.A.. ....AA..A.
1 P.Par16024 maybe .A...C.CAA .....A.... ..A.....A. AA....C...

1 CCYMCCCCCC CCCMCCAGCC CCACCAACCA MMCCAMCCCC


1 4 maybe ..C....... ...C...... .......... CC...C....
4 5 yes .M...M.... .......... ..M....... ....C.....
5 6 yes A......... .A........ ......C..V ..........
6 7 yes .C?V.C.... ......CV.A ..C.....AG ........M.
7 8 yes ..GC..A... ....A..C.. .......... ..AA.A...A
8 F.Hauniens yes .........A ..A....... .......... ....G...A.
8 D.Par10195 yes .......... .......... .A........ .A......C.
7 H.Beroline yes ..-G...... .......A.. .......... ....A...A.
6 K.VatP_887 yes .A.A.A.... .....A.... ..A......C .A........
5 N.VatP_889 yes .AAC.A.... .......... ..C....... ..........
4 2 yes .........A ......C... .....C.... ..........
2 3 maybe ...C...... .......... .......... ..........
3 C.Par_6085 no .......... .......... .......... ..........
3 B.Basileen yes A......... ..A......A .........G ..........
2 A.Par16025 yes ...A...... .......... .......... ......A...
1 P.Par16024 maybe ..TA...... ...A...... .......... AA...A....

1 CGAMCCMCCC CCRCCMCCSM
1 4 maybe ...C..C... .....A..GC
4 5 yes ..C..M.... ..........
5 6 maybe .......... ..A.......
6 7 maybe .....C.... ..........
7 8 yes A........A ..C.....A.
8 F.Hauniens yes ....A..... .....-A...
8 D.Par10195 no .......... ..........
7 H.Beroline yes .......... .A...T....
6 K.VatP_887 yes .....A..A. ..........
5 N.VatP_889 yes .C...A.... ..G.AG....
4 2 yes .......M.. ..GA......
2 3 no .......... ..........
3 C.Par_6085 maybe .......A.. ..........
3 B.Basileen yes .......C.. ..C......A
2 A.Par16025 maybe .......A.. ..........
1 P.Par16024 maybe ...A..A... ..A..C..CA

Appendix C: RETREE Session

Tree Rearrangement, version 3.572c

Settings for this run:


U Initial tree (arbitrary, user, specify)? User tree from tree file
N Use the Nexus format to write out trees? No
0 Graphics type (IBM PC, VT52, ANSI)? ANSI
W Width of terminal screen, of plotting area? 80, 80
L Number of lines on screen? 24

Are these settings correct? (type Y or the letter for one to change)
0

Tree Rearrangement, version 3.572c

Settings for this run:


U Initial tree (arbitrary, user, specify)? User tree from tree file
N Use the Nexus format to write out trees? No
0 Graphics type (IBM PC, VT52, ANSI)? (none)
W Width of terminal screen, of plotting area? 80, 80
L Number of lines on screen? 24

Are these settings correct? (type Y or the letter for one to change)
y

Reading tree file ...

retree: can't read intree


Please enter a new filename>treefile

                         ,>>1:F.Hauniens
                      ,>15
                   ,>14  `>>2:D.Par10195
                   !  !
                ,>13  `>>>>>3:H.Beroline
                !  !
       ,>>>>>>>12  `>>>>>>>>4:K.VatP 887
       !        !
       !        `>>>>>>>>>>>5:N.VatP 889
    ,>11
    !  !                 ,>>6:C.Par 6085
    !  !              ,>17
  -10  `>>>>>>>>>>>>>16  `>>7:B.Basileen
    !                 !
    !                 `>>>>>8:A.Par16025
    !
    `>>>>>>>>>>>>>>>>>>>>>>>9:P.Par16024

NEXT? (Options: R . U W O T F B N H J K L C + ? X Q) (? for Help) o


Which node should be the new outgroup? 12

                         ,>>1:F.Hauniens
                      ,>15
                   ,>14  `>>2:D.Par10195
                   !  !
                ,>13  `>>>>>3:H.Beroline
                !  !
    ,>>>>>>>>>>12  `>>>>>>>>4:K.VatP 887
    !           !
    !           `>>>>>>>>>>>5:N.VatP 889
    !
  -10                    ,>>6:C.Par 6085
    !                 ,>17
    !              ,>16  `>>7:B.Basileen
    !              !  !
    `>>>>>>>>>>>>>11  `>>>>>8:A.Par16025
                   !
                   `>>>>>>>>9:P.Par16024

NEXT? (Options: R . U W O T F B N H J K L C + ? X Q) (? for Help) w


Enter R if the tree is to be rooted
OR enter U if the tree is to be unrooted: r

Tree written to file