


Biochimica et Biophysica Acta xxx (2010) xxxxxx


Biochimica et Biophysica Acta


journal homepage: www.elsevier.com/locate/bbapap

Review

Arthropod CYPomes illustrate the tempo and mode in P450 evolution


R. Feyereisen
UMR1301 INRA CNRS Université de Nice Sophia Antipolis, 400 route des Chappes, 06903 Sophia Antipolis, France

Abstract
The great diversity of P450 genes in a variety of organisms is well documented but not well explained. The number of CYP genes in each species is highly variable and this is shown here for arthropod, mainly insect, CYPomes. Pairs of recognizable orthologs are but a small portion of the CYPome, but species- or lineage-specific expansions of CYP subfamilies are consistently observed. These "blooms" of CYP genes have their origin in multiple gene duplications, although some subfamilies expand and others do not. Stochastic birth and death models of CYP gene proliferation are sufficient to explain blooms, and speciation events may play important roles in CYPome diversity between lineages. Mitochondrial clan P450s are a monophyletic group of genes that has seen several blooms in insects, but apparently not in vertebrates. © 2010 Elsevier B.V. All rights reserved.

Article history: Received 11 May 2010; received in revised form 3 June 2010; accepted 16 June 2010; available online xxxx

Keywords: P450 evolution; Gene duplication; Gene family; Arthropods

Contents
1. Introduction
2. P450 diversity in arthropods
2.1. Approximations of P450 numbers
2.2. Diversity at the CYP clan, family and subfamily level: CYP blooms
3. CYPome diversity: Past and present
3.1. P450 birth and death
3.2. P450 phylogeny
3.3. Past diversity vs. present diversity
3.4. Asymmetry in P450 diversity
4. Fate of duplicated genes
4.1. Neofunctionalization and subfunctionalization
4.2. Blooms and clusters
4.3. Are blooms restricted to some CYP subfamilies?
5. Mitochondrial P450s
5.1. Diversity of the mitochondrial CYP clan: Arthropods vs. vertebrates
5.2. The origins of the mitochondrial P450 clan
References

1. Introduction

The P450 gene superfamily is large enough that after "transcriptome" and "proteome" became the logical followers of "genome" in the biological discourse, the term "CYPome" was introduced in the literature [1]. As this issue is dedicated to Klaus Ruckpaul and his contributions to the P450 field, it seems appropriate to celebrate by joining in the use of this evocative term. The explosion of new knowledge on CYPomes brought

E-mail address: rfeyer@sophia.inra.fr.
1570-9639/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.bbapap.2010.06.012

about by the sequencing of various genomes is impacting the field of P450 research in many ways. The sheer number of P450 genes, the size of the CYPomes, is baffling (Nelson, this issue). Humans have a CYPome of 57 P450 genes and 58 pseudogenes, distributed in 18 CYP families, but the mouse has a CYPome of 102 P450 genes and nearly 90 pseudogenes. The human CYP2D6 debrisoquine hydroxylase gene has nine CYP2D paralogs in the mouse [2]. The number of P450 genes is therefore highly variable over evolutionary time, even over the very short 90 million years (MY) that separate human and mouse. This highly dynamic nature of the P450 superfamily shown by the difference between humans and mouse is neither just curious nor just

Please cite this article as: R. Feyereisen, Arthropod CYPomes illustrate the tempo and mode in P450 evolution, Biochim. Biophys. Acta (2010), doi:10.1016/j.bbapap.2010.06.012

R. Feyereisen / Biochimica et Biophysica Acta xxx (2010) xxxxxx

obvious. Beyond the implications of P450 diversity for our reliance on model species in research, particularly risk assessment in toxicology, the origins of P450 diversity are of great interest intrinsically. Because it is so large, the P450 superfamily can serve as a model for gene family evolution, perhaps one where knowledge of specific functions of P450s can bring deeper insights into the mechanism of gene family evolution. Therefore, while it is certainly presumptuous to paraphrase the title of G. G. Simpson's famous book on evolution for this paper, both the tempo (rates) and the mode (patterns and mechanisms) of evolution of the CYPomes are fascinating and still poorly understood.

Here, I will examine the dynamics of P450 evolution in the light of current concepts of gene family evolution. I will first describe the diversity of CYPomes using arthropods (mainly insects) as examples, then attempt to describe how the processes of gene duplication in the P450 family have shaped its diversity. I close with a discussion of mitochondrial P450s, which are surprisingly diverse in their own right, with insects and vertebrates having found different ways to metabolize xenobiotics within this organelle.

2. P450 diversity in arthropods

By all estimates, the number of arthropod species and of insects in particular is larger than that of other animals, fungi and plants combined. Coleoptera (beetles) and Lepidoptera (moths and butterflies) together make up more than half of all insect species [3]. The diversity of P450 genes in arthropods has been documented gradually over the last 10 years, with an increasing number of genomes available for study. This rich dataset allows us to study the patterns of evolution of the P450 superfamily over a 450 million year (MY) timespan, which is about equivalent to the timespan of vertebrate diversification from bony fishes to mammals.
What can be learned from arthropod P450 evolution is not only of value to our understanding of P450 evolution in general, but of course of great value as well to our understanding of arthropod physiology, and arthropod adaptation to very diverse environments and lifestyles. The arthropod vectors of disease and pests of agriculture are major impediments to sustained development because of their combined

effect on health and food security. P450 diversity and polymorphisms in humans can determine adverse effects of drugs, and this causes many deaths and increased medical costs [4]. However, for the vast majority of humans with little access to drugs, this is less critical than P450 diversity and polymorphisms in arthropods that can determine the success of their crop or the survival of their livestock. Indeed, P450s play a major role in the adaptation of insect herbivores to their host plant (see Schuler, in this issue), so that an insect can become a devastating pest if it can defeat the crop plant's defenses by detoxification through an appropriate P450 from its diverse arsenal. P450 polymorphisms in arthropods also provide the variation upon which selection for insecticide or acaricide resistance can act, with negative effects on malaria control programs and crop protection programs alike.

Rapid progress in genome sequencing makes a recent review of insect P450 diversity [5] outdated with reference to the number of species but the main conclusions remain valid. We now have access to the genomes of Drosophila melanogaster and 11 related Drosophila species that have diverged over the last 60 MY. Three species of mosquitoes (also Diptera), and representatives of other major Orders of insects, Lepidoptera, Coleoptera, Hymenoptera, Hemiptera and Phthiraptera (the human body louse) are also available. In addition, genomes from one crustacean species (Daphnia pulex) and representative Acari (the deer tick Ixodes scapularis and the two-spotted spider mite, Tetranychus urticae) have been sequenced. These genomes are not all complete or published, and several others are nearing completion.

2.1. Approximations of P450 numbers

Fig. 1 shows a simplified phylogenetic relationship between arthropod species and the approximate number of P450 genes carried by their genome. The numbers are approximations for two reasons, one trivial, one less so.
The trivial reason is that coverage and quality of assembly of newly sequenced genomes are quite variable, so that numbers of P450 genes can change with new releases of the genomes. In some cases this means that new sequence brings new genes, but more often it means that a gene fragment or putative pseudogene is promoted to

Fig. 1. Approximate number of CYP genes in fully sequenced genomes of several arthropod species shown with their phylogenetic relationships.


bona fide gene. An example is Cyp307a2 in D. melanogaster, first thought to be a pseudogene [6], then obtained as a full gene when more DNA sequence from heterochromatin, not available in the initial release of the Drosophila genome, was annotated. Similarly, a sequence gap adjacent to Cyp12d1, when filled, revealed a second Cyp12d gene with just three (non-silent) nucleotide substitutions. Presumably, the very high identity of these two adjacent genes caused a problem in the initial assembly resulting in a gap. Such problems are found for all genes and all genomes, and only for very few organisms can the CYPome annotation be considered finished. New generation sequencing technologies may increase this problem because few genomes of higher eukaryotes will deserve the close attention and depth of coverage that major model species are receiving.

The less trivial reason is that there is polymorphism in gene copy number, so that the number of genes in the CYPome of an organism depends on one's viewpoint. The reference genomes may have more or fewer CYP genes than are found in a natural population. In the two reference human genomes (Celera and public), Craig Venter has two copies of CYP1A2 and the anonymous public genome is deficient in CYP2D6 (CYP2D6*5 deletion allele, [7]). Some Drosophila populations have two copies of the Cyp6g1 gene [8]. An allele of this gene is responsible for the global spread of DDT and neonicotinoid insecticide resistance [9]. A more difficult type of polymorphism is copy number polymorphism involving large (>100 kb) segments of genome. In the initial sequence of the malaria vector mosquito Anopheles gambiae, three allelic clusters of CYP genes differed in the number of genes they contained [10].

2.2. Diversity at the CYP clan, family and subfamily level: CYP blooms

As shown in Fig. 1, the number of P450 genes in arthropod species varies widely (36 in the human body louse Pediculus humanus to 180 in the Northern house mosquito Culex pipiens), but so far all the CYP genes can be assigned to one of four clans: CYP2, CYP3, CYP4 and the mitochondrial CYP clan [6]. Less than a dozen of these genes in insects, and maybe just a handful in arthropods, are true orthologs or very close paralogs. Orthologs include the five P450s involved in the biosynthesis of the steroid molting hormone (Fig. 2). In D. melanogaster these are products of the Halloween genes. Two belong to the CYP2 clan and three to the mitochondrial clan [11]. Additional orthologs include CYP15, a member of the CYP2 clan that stereospecifically epoxidizes the precursor of juvenile hormone in insects [12], and CYP18, an enzyme involved in ecdysteroid inactivation. Compared to the human complement of CYP genes, arthropod genomes

apparently do not carry members of the CYP51 and CYP7/8/39 clans or of the CYP19, CYP20, CYP26 and CYP46 clans.

Not apparent from Fig. 1, but conspicuous in all CYPome annotations, is the presence of one or more CYP subfamilies that appear both abundant and lineage-specific. This is shown in Fig. 3 for the honey bee CYPome that, despite a comparatively low number (46) of CYP genes, has 15 members of the CYP6AS subfamily [13]. I would call this a "CYP bloom", as a more evocative term than "recent, phylogenetically independent proliferation of close paralogs" or "lineage-specific gene family expansion" [14]. Other examples of such blooms in a diversity of species are the 15 CYP2C genes in the mouse, 12 CYP6A genes in the fruit fly, 13 CYP6BQ genes in the red flour beetle Tribolium castaneum, and 19 CYP4AB genes in the jewel wasp Nasonia vitripennis. These CYP blooms are one of the most striking features of P450 evolution, and their origin will now be discussed in more detail.

Fig. 2. The molting hormone, 20-hydroxyecdysone, with the sites of hydroxylations of the P450s (encoded in Drosophila by the Halloween genes) that are conserved between insects and crustaceans. The exact function of the CYP307A genes is still unresolved. The sequence of oxidations from precursor cholesterol is also indicated; steps #3 to #5 are catalyzed by mitochondrial clan P450s.

Fig. 3. An example of CYP subfamily bloom: The CYP6AS subfamily in the CYPome of the honeybee. The deduced amino acid sequences of the 46 CYP genes of Apis mellifera [13] have been aligned and analyzed by Phyml. The CYP6AS are shown in red. Genes with orthologs or very close paralogs in most insect species are marked with an asterisk.


Fig. 4. Scheme of CYP family diversity through evolutionary time. The vertical axis represents time and the horizontal dimension is proportional to diversity (number of genes per family). CYP families are related to each other and have a distant common ancestor (root, bottom). The horizontal line of Tpresent represents the diversity of CYP genes among diverse families in any given species as obtained in genome projects. The horizontal line of Tpast represents the diversity of CYP genes of two lineages (e.g. human/mouse; fruit fly/mosquito) at the time of their divergence. The diversity of CYP genes was identical for the two lineages up to that time (below Tpast), then is specific for each lineage (above Tpast). The scheme shows birth of new families from existing ones, and extinction of families, or maintenance of families with few members. See text for details.

3. CYPome diversity: Past and present

In the theoretical scheme shown in Fig. 4, the diversity of CYP families in a typical organism is represented over evolutionary time. In this scheme, many will recognize the typical way that the fossil record is illustrated. The vertical axis represents time, an arrow pointing up to the present, and the horizontal axis represents diversity, such as the prevalence of a certain type of fossil at a particular time in the past. Typically, each group of fossils or clade is represented by an oval-shaped diagram, with the bottom representing the first and oldest fossils, and the tip representing the most recent representative before extinction. Gould et al. [15] observed that this graphical representation was sometimes used in fields other than paleontology, and felt that it would be useful in the study of pathways of change in general. Here, we can readily adopt what Gould et al. called a "fruitful isomorphism", and represent diversity of the CYPome by such diversity diagrams, with each CYP family or clan represented by its own oval diagram. Of course, we do not have a historical or geological record to document past diversity of the P450 gene superfamily, but we can make several reasonable assumptions and deductions:

3.1. P450 birth and death

Just like speciation events and extinctions in the case of species diversity, there is a process of gene birth and death for gene diversity. Many mechanisms can generate paralogous genes in the CYPome. These include tandem duplications including unequal crossing over of various lengths, transposable element-induced non-allelic recombinations, chromosome or genome duplications (particularly polyploidy in plants) as well as retropositions. Gene duplication was well known in genetics before Susumu Ohno gave it prominence [16], but this process has taken great importance in the literature over the last 10 years with attempts to quantify rates of gene birth by duplication and gene death [17].
In their landmark paper Lynch and Conery, who notably excluded large gene families from their estimates, reported rates of 0.0023 duplications per gene per MY (million years) for D. melanogaster, with a half-life of a gene duplicate of about 2.9 MY [17]. Hahn et al. [18], using data from gene families in the 12 Drosophila genomes, calculated a rate of duplication of 0.0012/gene/MY. Similarly, Osada and Innan [19], in a detailed study of the Drosophila genus taking into account possible gene conversions, found a duplication rate of 0.001/gene/MY. This rate of duplication thus appears quite robust and is comparable to the spontaneous mutation rate at the nucleotide level, an observation that was certainly unexpected a decade ago. As a result, using Drosophila as an example, we can estimate the duplication rate in the P450 family (about 85 genes) to generate a new P450 every 5 to 12 MY. The dataset of the 12 species in the Drosophila genus allows a direct verification of this estimate, with multiple gains and losses over 60 MY. In a detailed study of Drosophila P450s, Chung et al. [20] found that only one third of the 53 genes they studied had a 1:1 ortholog in the 11 other Drosophila species. This figure is consistent with the rates of gene duplication estimated for the whole genome.

The high rate of gene duplication is not specific to insects but is a general feature in eukaryotes and is generally considered to be much higher in vertebrates, so that birth rates are hovering about 0.01/gene/MY [21]. It may be higher still in plants, where polyploidization is not uncommon. As mentioned in the Introduction, a comparison of mouse and human P450 genes shows that the number of genes has changed in the 90 MY since the divergence of their common mammalian ancestor, so that there are 45 more P450 genes in the mouse and only 34 putatively orthologous pairs remain between the two species (from a total of 57 human and 102 mouse genes) [2].

Gene death rate is much higher than birth rate, with the most likely fate of a newly duplicated gene being its rapid loss by mutational degeneration to a pseudogene or by deletion. The death rate is generally estimated or modeled to follow an exponential decay, with a half-life of duplicated genes in the order of 3–8 MY.
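The arithmetic behind the "new P450 every 5 to 12 MY" estimate, and the exponential decay assumed for duplicate loss, can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original analysis: the function names are hypothetical, the family-wide rate is assumed to be simply the per-gene rate multiplied by the number of genes, and the half-life used in the example is the 2.9 MY value quoted above for D. melanogaster.

```python
def waiting_time_my(n_genes, rate_per_gene_per_my):
    """Expected time (in MY) between duplications anywhere in a gene family,
    assuming every gene duplicates independently at the same rate."""
    return 1.0 / (n_genes * rate_per_gene_per_my)


def surviving_fraction(t_my, half_life_my):
    """Fraction of gene duplicates still present after t_my,
    assuming loss follows a simple exponential decay."""
    return 0.5 ** (t_my / half_life_my)


if __name__ == "__main__":
    n_p450 = 85  # approximate size of the Drosophila CYPome quoted in the text
    # Published per-gene duplication rates quoted in the text [17-19]
    for rate in (0.0023, 0.0012, 0.001):
        # 0.0023 gives about 5.1 MY, 0.0012 about 9.8 MY, 0.001 about 11.8 MY
        print(f"rate {rate}/gene/MY: one new P450 every "
              f"{waiting_time_my(n_p450, rate):.1f} MY")
    # Illustrative only: share of duplicates expected to persist for 10 MY
    print(f"surviving after 10 MY: {surviving_fraction(10, 2.9):.3f}")
```

Under these assumptions the three published rates bracket the 5 to 12 MY interval given above; the estimate says nothing about how many of the new copies survive, which is governed by the decay term.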
Evidence from whole genome duplications that have been dated shows between 70% and 90% gene loss (for one of the copies) in the 60–80 MY since genome duplication in plants and yeasts, and a similar number for a more ancient duplication in teleost fishes. Most stochastic birth and death models can explain the numbers of genes in gene families and they fit the paradigm of neutral evolution because selection is not an explicit mechanism of these models [22].

3.2. P450 phylogeny

Diagrams of individual CYP families in Fig. 4 are related to each other by lines of descent. For instance, the insect CYP6 and CYP9 families shared a common ancestor, and so did the vertebrate CYP17 and CYP21 families, the CYP7 and CYP8 families and the CYP3 and CYP5 families. The sequences have diverged sufficiently to be no longer recognized as members of the same family. The CYP1 and CYP3 families can each be traced back to early deuterostomes as shown convincingly by analysis of tunicate sequences [23,24]. When present-day diversity is represented by cladograms or phylograms, these are usually called phylogenetic trees, and they are best guesses generated by various algorithms (neighbor-joining, maximum likelihood, Bayesian method, maximum parsimony, etc.) from stacks of aligned sequences. However, these trees do not and cannot represent the true phylogenetic history of the gene family. In species trees, the nodes or branching points represent now extinct species for which fossils can be found and objectively described. In gene trees, by contrast, each node represents an ancestral sequence that remains hypothetical and whose reconstruction is based not on the actual sequence of all its immediate descendants, but only on the sequences of those descendants that have escaped gene death and have made it to the present. Thus, because sampling of present sequence diversity does not include the CYP genes that have been lost, accuracy of the gene tree is biased by the present diversity.

3.3. Past diversity vs. present diversity

As a consequence of common descent, the bottom part of each diagram will be identical for related taxa. Up until 420 MYA (million years ago), the date of the last common ancestor of human and fish, the


diagram will be identical for both, but after this time (the top part), each diagram will be different because evolution of the CYP gene superfamily will be lineage- and species-specific. Similarly, the diagram will be identical for Drosophila and for Anopheles up until 250 MYA, and will diverge after that. Speciation events have thus sampled and split diversity of the P450 superfamily in the past. At the time of speciation, the complement of P450 genes in each species was virtually identical (except perhaps for a polymorphic P450 gene that contributed to the speciation event). After that time, evolution went its own way in each branch. Therefore, present day P450 diversity is species-, genus-, family-, order-, etc. specific.

The number and identity of common CYP ancestors is unknown, and in fact, unknowable, and only an estimate of the minimal number can be made, this estimate being itself biased by the difficulty of recognizing true orthologs as one goes back further into the past. If a gene is lost in both species A and B, nothing in the descendants of species A and B that are present today can tell us that the gene had been present in the past. The half-life of a gene being smaller on average than the half-life of a species or genus, this is by no means an unlikely scenario. Stochastic models of gene birth and death can lead to estimates of gene family sizes in ancestral states [25] and with an assumption of unequal rates of gain and loss in different branches of the phylogeny these estimates become more imprecise as one moves deeper into the past.

A few years ago, the presence of CYP51 as the only (then known) common P450 in plants, animals and fungi had been interpreted as showing that all eukaryotic P450s descended from CYP51 [26,27]. With more genomes now available, it appears that CYP74-like P450 genes are shared between plants (mosses and angiosperms) and animals (the placozoan Trichoplax adhaerens, Cnidaria and Branchiostoma floridae a.k.a. amphioxus, a chordate) [28]. So it appears that the lineage to humans lost all CYP74s, just as the lineage to Drosophila lost CYP51. The number of P450 genes in the hypothetical LECA (last eukaryotic common ancestor) genome will remain a mystery. Moreover, all present day P450s are derived from a common ancestor, but no present day P450, neither CYP51 nor CYP74, can be described as being most representative of the ancestral form of P450 and both in their own way are highly evolved enzymes.

Can CYP51 or CYP74 be considered "living fossil" genes in the same way that the present day horseshoe crab Limulus polyphemus (Xiphosura) is virtually identical, morphologically, to organisms found as fossils in the Triassic? Evolutionary constraints, or the conservation of structural and functional aspects, are evident for CYP51 and CYP74, indicating strong purifying selection (elimination of nonsynonymous mutations) over hundreds of millions of years. It is generally admitted that single copy genes with essential functions are more stable. So sterol auxotrophy and the loss of CYP51 in the arthropod lineage must have resulted from a previous selection of efficient mechanisms of sterol scavenging from food that destabilized CYP51 in that lineage. Nonetheless, even the stable

CYP51 and CYP74 genes are potential starting points for diversification. The amphioxus has 20 CYP74-related genes, of which one at least encodes a new epoxyalcohol synthase activity instead of the allene oxide synthase or hydroperoxide lyase activity of other CYP74s [28]. Some Poaceae have additional CYP51 genes, nine in rice and two in oats, where they are involved in the synthesis of triterpenoid defense compounds [29].

3.4. Asymmetry in P450 diversity

Another, and less obvious, observation can be made from the shape of the diagrams in the scheme (Fig. 4). Gould et al. [15] noted that the diagrams for fossil diversity were asymmetric in the time scale, i.e. they were bottom-heavy. They further suggested that this temporal asymmetry (greater diversity at the beginning of a clade's history) was typical in studies on innovation. "Clades diversify rapidly in ephemeral times of unusual opportunity and peter out in a world at or closer to equilibrium" [15]. I would argue that such asymmetry also occurs in the evolution of CYP clade diversity at the family or subfamily level. More specifically, in every genome studied to date, there are just a few CYP families with many members and many CYP families with one or a few members. This is shown by the frequency distribution of numbers of genes per CYP family in Fig. 5. Although we are deprived of the time scale, we see in Fig. 5 the present time snapshot of diversity (i.e. line Tpresent of Fig. 4) where abundant families are at the beginning of their evolutionary history, whereas families with few or just one member represent either pioneers or lonely survivors.

It was recognized early on that the occurrence of gene families of different sizes in complete genomes followed a power-law behavior, with the frequency of large families being very small and the frequency of smaller families being much larger [30].
This power-law behavior was confirmed for a wide variety of genomic characteristics, such as the frequency of protein folds or even short DNA sequences [31]. It is perhaps not surprising then that CYP families representing P450 sequences grouped by percentage identities (more or less the 40% cutoff rule) would also follow this behavior. When the data in Fig. 5 are plotted in log vs. log, the size of each CYP family versus their frequency shows a clear power-law distribution. This is true irrespective of the total number of genes, which is quite variable, or the type of organism (animals, fungi, plants, although the pattern for moss is dwarfed by the high number of genes in rice). The Arabidopsis thaliana CYPome is the only biological example of this power-law behavior of gene families in a theoretical treatment of the birth and death model [32].

There is one conspicuous exception to the frequency distribution of CYP families shown in Fig. 5. In Neurospora crassa, there are 41 CYP genes in 39 families with only two families that are represented by two members each. N. crassa is therefore a clear exception to the pattern of CYP gene duplication and divergence seen in other organisms. The reason for this is most likely the process of repeat-induced point mutation (aptly

Fig. 5. Frequency distribution of the number of genes per CYP family in different genomes (families ranked by increasing number of genes to distinguish species more easily). In each case, most families have low numbers of genes and very few families are numerous. The exceptional case of N. crassa is explained in the text.


named RIP) that detects and inactivates repeated DNA by accumulating C:G to T:A mutations and by silencing through methylation [33]. Although considered a defense against the spread of transposable elements, RIP also affects recently duplicated genes and the paucity of duplicated genes in the N. crassa genome is well documented. It is hypothesized that the RIP mechanism evolved after the divergence of the Neurospora and Magnaporthe lineages (there are three times as many genes in Magnaporthe grisea). There are therefore two possible explanations for the diversity of CYP genes in this fungus. Most likely, the 39 families represent the minimum number of CYP families in the Neurospora/Magnaporthe ancestor (i.e. about 200 MYA). Less likely, the current CYP genes are duplicated genes in the Neurospora lineage that diverged so rapidly to escape RIP that they are now beyond the family cutoff of sequence identity.

On average, a duplicated gene has twice the probability of a singleton of being duplicated again and indeed, as noted in previous studies, as the size of the family grows, the probability of further growth by duplication is increased [31,34]. Therefore blooms as seen for CYP families or subfamilies can be the result of an initial trigger and do not require the long term maintenance of this trigger, as the bloom has been unleashed.

The observation of CYP families with small gene numbers between species, and families with larger, variable numbers between species is well known in the P450 community. It is generally explained by a dichotomy between the few P450s involved in fundamental and specific, essential physiological functions and the many other P450s involved in drug metabolism (mammals), or environmental response (insects), sometimes at the evolutionary scale (plant and fungal synthesis of defensive metabolites). The trigger for selective expansion into blooms is therefore often assumed to be an adaptation to the environment favored by natural selection.
However, adaptation is not a necessary explanation for the maintenance of the blooming behavior of CYP subfamilies because of the self-sustained increase in numbers due to successive duplications as noted above. Moreover, population genetic considerations indicate that increases in gene numbers may be a passive response to reduced population size that allows duplicate preservation by subfunctionalization (see below), rather than being driven by adaptive processes [35]. If the effective population size is large, fixation of the duplication, as of any other mutation, is less likely. The trigger for P450 blooms may therefore be associated with speciation events, when founder effects cause genetic bottlenecks. Thus, whether multiple P450s can be adaptive (in that they may benefit fitness) is probably beyond doubt, but it is much more doubtful that duplications of P450 genes are adaptations (in that there was selection for duplicating P450 genes).
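The argument that blooms can arise without sustained selection can be illustrated with a toy simulation. The sketch below is not one of the published birth and death models [22,32]; it is a deliberately minimal neutral process in which the only assumption is that each gene has the same chance of being duplicated or lost, so that larger families are hit more often. All names and parameter values (40 starting families, 400 events, equal birth and death probabilities) are hypothetical and chosen only for illustration. A small helper estimates the slope of a log vs. log plot of family size against frequency, the kind of plot discussed for Fig. 5.

```python
import math
import random
from collections import Counter


def simulate_cypome(n_families=40, steps=400, p_birth=0.5, seed=0):
    """Toy neutral birth-death process on CYP family sizes.

    Each step picks one gene uniformly at random from the whole CYPome, so a
    family is chosen with probability proportional to its current size. The
    chosen gene is then duplicated (birth) or lost (death). No selection and
    no origin of new families are modeled.
    """
    rng = random.Random(seed)
    sizes = [1] * n_families  # hypothetical start: every family is a singleton
    births = deaths = 0
    for _ in range(steps):
        if sum(sizes) == 0:  # the whole CYPome went extinct
            break
        family = rng.choices(range(n_families), weights=sizes)[0]
        if rng.random() < p_birth:
            sizes[family] += 1
            births += 1
        else:
            sizes[family] -= 1
            deaths += 1
    return sizes, births, deaths


def loglog_slope(size_to_count):
    """Least-squares slope of log(number of families) vs. log(family size).

    Needs at least two distinct family sizes."""
    xs = [math.log(size) for size in size_to_count]
    ys = [math.log(count) for count in size_to_count.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    sizes, births, deaths = simulate_cypome()
    surviving = Counter(size for size in sizes if size > 0)
    for size in sorted(surviving):
        print(f"{surviving[size]:3d} families with {size} gene(s)")
    if len(surviving) > 1:
        print("log-log slope:", round(loglog_slope(surviving), 2))
```

Running the sketch with different seeds gives different outcomes, which is the point of a stochastic model: which family blooms is a matter of chance in this setting, not of any property assigned to that family.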

4. Fate of duplicated genes

4.1. Neofunctionalization and subfunctionalization

The fate of duplicated genes has been extensively studied [21]. Although one would assume that duplicated genes are initially identical and redundant, this is rarely exactly true (size of the duplicated segment, position effect, allelic sampling) [36], so that the two members of a duplicated pair do not formally have the same probability of meeting the same fate. However, it is statistically very difficult to detect asymmetric rates of duplicate gene divergence [36]. Gene death (nonfunctionalization) of one copy is the most likely outcome of gene duplication as noted above, because both copies are initially almost redundant and this allows mutational silencing of one copy without fitness cost. The number of recognizable CYP pseudogenes is highly variable, not just because there is a time-dependent accumulation of mutations, but also because some species eliminate junk DNA more efficiently. There are proportionally many more CYP pseudogenes in the mouse than in Drosophila. Pseudogenes are well annotated in both species, though in many other genomes the pseudogenes have not (yet?) been comprehensively annotated.

The proposition that newly duplicated genes are almost redundant is supported by the example of single or multiple tandem duplications of the human CYP2D6 gene that lead to a high metabolizer phenotype [4]. In this case it is clear that more copies of the gene mean more active enzyme, and that dosage imbalance is not selected against. This case illustrates well the difference between the short-term fate of a duplication, which is a copy number polymorphism in human populations, and its long-term survival and fixation, which may or may not occur. In Drosophila, copy number polymorphism affects 2% of the genome [8]. The virtual sequence identity of duplicated genes is initially favorable to gene conversion for short tandem duplications, until sequence divergence increases [37]. There are relatively few examples of CYP gene conversion events (rat CYP2D [38], human CYP2D6 [39]), but more cases will undoubtedly be uncovered as detailed studies of closely related genes are undertaken, despite the difficulty of distinguishing functional conservation from gene conversion [23].

Mutations in either coding regions or non-transcribed regulatory regions can tip the balance between life and death of the duplicate, with neofunctionalization and subfunctionalization as the recognized alternatives to nonfunctionalization. Neofunctionalization is understood as the acquisition of a new function not shared by the parent single gene. Subfunctionalization is understood as the reciprocal loss of a portion of the original gene's function, so that the two duplicates now partition between them the functions of the original gene [21]. For P450s the concepts of neo- and subfunctionalization can be understood at the transcription level and/or at the level of catalytic competence.
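
The three fates defined above can be made explicit with a toy classifier. The sketch below is only an illustration of the definitions, assuming that the functions of a gene (for instance its tissues of expression) can be listed as a set and that the ancestral set is known, which is rarely the case for real CYP genes. The function name `classify_fate` and the example tissue sets are hypothetical.

```python
def classify_fate(ancestral, copy_a, copy_b):
    """Label a duplicate pair from sets of functions (illustrative only).

    ancestral: functions of the single parent gene (assumed known here).
    copy_a, copy_b: functions retained or gained by each duplicate.
    """
    ancestral, copy_a, copy_b = set(ancestral), set(copy_a), set(copy_b)
    if not copy_a or not copy_b:
        # One copy has no function left: gene death.
        return "nonfunctionalization"
    if (copy_a | copy_b) - ancestral:
        # At least one function is not shared by the parent gene.
        return "neofunctionalization"
    if copy_a != ancestral and copy_b != ancestral and (copy_a | copy_b) == ancestral:
        # Reciprocal loss: the parent's functions are partitioned.
        return "subfunctionalization"
    return "redundant or unresolved"


if __name__ == "__main__":
    # Hypothetical expression sets, loosely modeled on a brain/ovary split.
    print(classify_fate({"brain", "ovary"}, {"brain"}, {"ovary"}))
    # Hypothetical case where one copy gains a tissue absent from the parent.
    print(classify_fate({"midgut"}, {"midgut"}, {"corpus allatum"}))
```

The classifier is only as good as the assumed ancestral set. When the parent pattern must be inferred from a related species, the same pair of duplicates can be labeled differently depending on that inference.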
Precise knowledge of the tissue, developmental, and induction pattern of expression on one hand, and of catalytic competence of the P450 on the other hand is rarely available for recently duplicated CYP genes and for their parent gene. Reliance on the comparison between closely related species is often the only available tool. Comparison of duplicated genes with their pro-orthologs in other species (i.e. 2 to 1 comparisons) is made very difficult in the P450 superfamily, because in few cases can these pro-orthologs be identified with a reasonable degree of certainty. Comparison of very closely related species should shed some light on this problem.

There are many examples of subfunctionalization at the transcriptional level for P450 genes. In the case of zebrafish CYP19 aromatase, there are two copies of the gene following a chromosomal duplication event. One copy is now strictly expressed in the brain, whereas the other is expressed predominantly in the ovary, a probable case of subfunctionalization [40]. Restricted expression of Cyp6g2 in the corpus allatum portion of the Drosophila ring gland [20] is probably a case of transcriptional neofunctionalization, because its tandem duplicate Cyp6g1 is more widely expressed in other tissues. If the parent gene had a broad expression pattern, then the new pattern for Cyp6g2 and its loss of ancestral pattern would qualify as neofunctionalization. Indeed loss of the ancestral function is a corollary of neofunctionalization because one of the two paralogs needs to remain unchanged in this model.

The case for neofunctionalization of P450 genes at the level of catalytic competence is somewhat more difficult to make, because it is well known that P450 substrate specificity can be affected by a single amino acid substitution. Semantic distinctions take over and it is therefore difficult to distinguish tweaking a low preexisting activity into prominence (is this neofunctionalization or not?)
from generating a novel catalytic activity ex nihilo (can this formally be proven?).

4.2. Blooms and clusters

The processes of sequential tandem gene duplication events can lead to large clusters of CYP genes on chromosomes and these are often striking landmarks of the CYPome [6,13,41]. The comparison of syntenic regions of the genome in closely related species reveals how rapidly these clusters can evolve. The comparison of the CYP2A/T cluster in mouse and human genomes is a striking example of the

Please cite this article as: R. Feyereisen, Arthropod CYPomes illustrate the tempo and mode in P450 evolution, Biochim. Biophys. Acta (2010), doi:10.1016/j.bbapap.2010.06.012

R. Feyereisen / Biochimica et Biophysica Acta xxx (2010) xxxxxx

different evolutionary path taken after the split from a common ancestor [42]. The CYP9A cluster in Lepidoptera is another example where three species, one distant (Bombyx mori) and two more closely related (Helicoverpa armigera and Spodoptera frugiperda) have been carefully analyzed [43]. At least eight gene duplications have taken place in about 80 MY, to yield 4, 5 and 9 genes with only 3 ortholog pairs still recognizable in the two more closely (20 MY) related species (Fig. 6).

Clusters favor the maintenance of duplicated CYP genes when one of the paralogs in the cluster becomes subject to natural selection even when the others are not, because of the hitchhiking effect. Indeed, selection of a favorable allele in a cluster will lead to an equivalent increase in frequency of the neighboring alleles until sufficient recombination events have separated the selected paralog from its neighbors. Some clusters of duplicated genes can be maintained over very long times without being spread out by recombination. The head to tail pair of close paralogs CYP306A1 and CYP18A1 is conserved as a cluster in all insects studied so far (except in A. gambiae that has lost the CYP18A1 gene), and is even found in the crustacean D. pulex [44], thus dating this cluster to well over 500 MY.

Tandem and segmental duplications, by various mechanisms, are not the only way that CYP gene number can increase. Retroposition is another mechanism that rapidly disperses new genes, but it is particular in that the original gene needs to be transcribed in the germ line and that the resulting new gene is intronless. A spectacular example of retroposition is provided by the clustered CYP98A8 and A9 genes of A. thaliana [45]. These genes are oddly distant on CYP98A phylogenies, and result from a single retroposition event followed by tandem duplication of these intronless paralogs.
Positive (adaptive) selection was shown for several residues that explain both the rapid evolution of the genes and their neofunctionalization into tricoumaroylspermidine hydroxylases, presumably from an ancestral p-coumaroyl shikimate hydroxylase. Their expression in anthers is thought to modify the pollen coat and contribute to fertilization barriers in this plant lineage [45]. Two independent retroposition events in the CYP51 gene have occurred in the human lineage, one about 9.5 MYA and one about 11.7 MYA [46]. In both cases, the retrotranscribed processed gene has

been inactivated as a recognizable pseudogene. In the Drosophila genus, two sequential retropositions of CYP307A paralogs have been hypothesized [47]. The first event, over 60 MYA, led to the pair CYP307A2/(new, intronless) CYP307A1. The second, about 35 MYA, led to the pair CYP307A2/(new, intronless) CYP307A3 in the lineage leading to the mojavensis, grimshawi and virilis species of Drosophila. In this lineage, the CYP307A1 copy was lost at about the same time. Subfunctionalization at the transcriptional level is probable for these genes, each having a specific tissue and developmental expression pattern [47]. Interestingly, CYP51 and CYP307 are both enzymes that are the first P450s in the organisms' sterol modification, and at least for CYP51, this contributes to the synthesis of signaling sterols in gametogenesis.

4.3. Are blooms restricted to some CYP subfamilies?

As mentioned above, CYP families with small gene numbers between species, and families with larger, variable numbers between species are well known in the P450 community, and ortholog pairs are relatively easy to find in small CYP families and subfamilies. If mechanisms of gene duplication, birth and death are stochastic and fit the paradigm of neutral evolution, then why don't we observe blooms randomly in all CYP subfamilies? What does the dichotomy between endogenous function and environmental response mean and can it be explained? A detailed study of 10 vertebrate species confirmed the well-known observation that rapid birth-death evolution is characteristic of xenobiotic-metabolizing P450s, whereas those P450s with endogenous functions are more stable, with fewer recent duplications [48]. However, this dichotomy between unstable and stable genes does not explain why stable genes would remain stable or where unstable genes come from. The stable CYP17 and CYP21 families have a common ancestor with the unstable CYP2 family. Within the CYP2 family, some genes are stable, some unstable.
Positive selection in amino acid sequence is said to be restricted to unstable genes. A detailed study of mammalian CYP paralogs showed a good match between Gotoh's SRSs [49] and the localization of sites under positive (adaptive) selection, particularly rabbit and rat CYP2Cs (7 and 4 paralogs), human CYP3As (4 paralogs) and rat CYP2Ds (6 paralogs) [50].

Fig. 6. The syntenic CYP9A cluster in three lepidopteran genomes. The CYP9A genes in three species are shown in red and numbered in their correct orientation and order, but not relative distance. From top to bottom: the noctuids S. frugiperda (Sf) and H. armigera (Ha) and the silkworm B. mori (Bm). The phylogenetic tree based on alignments of the CYP proteins is superimposed with its correct topology, but with branch lengths modified for clarity of the figure. Filled circles indicate gene duplication events; filled squares indicate the B. mori/noctuid split (i.e. ancient speciation rather than duplication event) and the filled triangle indicates the S. frugiperda/H. armigera split. The sequence of events for the CYP9A15, 26, 27 genes is unresolved. The relative orientation of the genes indicates at least two inversions in addition to the duplication events. ADH and BmETS are syntenic non-P450 genes (reproduced from [43] with permission).


This was interpreted as showing an increased capacity of these paralogs to bind a variety of substrates. Conversely, adaptive evolution of rabbit CYP4As (4 paralogs) was not concentrated on SRS residues. No evidence of positive selection was observed for rat CYP3As, CYP2Bs or human CYP2Cs [50]. In the case of ortholog comparisons, positive selection was observed at a few sites in CYP1A2, 1B1 and 2E1, but not 1A1 or 1C1 [50]. The CYP1 family is generally considered stable with 3–4 members in organisms as distant as fish and mammals [24]. The relatively low number of sites under positive selection, and the high number of available sites and genes studied would indicate that positive (adaptive) selection is not the major driver in the diversity of CYP sequences, perhaps against common assumptions. In Papilio butterflies, there is evidence for purifying selection (opposite of positive selection) in SRS1 of CYP6B enzymes to maintain the ability to metabolize host plant furanocoumarins [51,52]. Yet the CYP6B subfamily is notoriously unstable.

Birth-death models that account for the fact that some families grow faster than others must incorporate interactions between paralogs, and such interactions have been ascribed to selection [22]. Strong purifying selection of slowly evolving genes favoring fixation of duplicates is one possible explanation [22], though the opposite explanation has been advanced as well [53]. More work is needed to fully understand the dynamics of blooms. Lineage-specific expansions or blooms have been suggested to provide the raw material for generating the diversity required to respond to environmental challenges and may also point to the existence of an as yet undiscovered diversity of small molecule metabolites in various lineages [14]. Both explanations may hold for P450s, but both would need to be rigorously tested. The blooming behavior of P450 families probably changes over time and so it is unevenly distributed between lineages.
Moreover, genome-wide rates of evolution differ (fast evolving Drosophila and Caenorhabditis elegans being classic examples) and such differences must also cascade down to the level of CYPome evolution. Mitochondrial clan P450s are an example where some appear stable in vertebrates, but are unstable in insects.

5. Mitochondrial P450s

It is somewhat paradoxical that during the early historical development of P450 research until the late 1970s, the number of mammalian mitochondrial P450s involved in specific physiological functions was greater than that of microsomal xenobiotic metabolizing P450s (the 3MC-inducible and phenobarbital-inducible types). This view of P450 diversity has changed dramatically, but the idea that mitochondrial P450s are restricted to specific physiological or endocrine functions is still widely held. However, not all mitochondrial clan P450s are involved in essential physiological functions, and not all mitochondrial P450s belong to the mitochondrial CYP clan. In fungi, the nitric oxide reductase P450nor (CYP55) is found either as a cytoplasmic enzyme, or as a mitochondrial enzyme. In Fusarium oxysporum the two forms are products of the same gene, with alternate N-terminal processing, whereas in Cylindrocarpon tonkinense there are two CYP55 genes, CYP55A2 and A3, one coding the mitochondrial form, the other coding the cytoplasmic form [54]. The origin of CYP55 is unclear, but it may have resulted from horizontal gene transfer from bacteria, early during evolution [55].

5.1. Diversity of the mitochondrial CYP clan: Arthropods vs. vertebrates

The mitochondrial clan is of particular interest because in vertebrates it comprises only P450s involved in essential physiological functions. Only three families are represented, CYP11, CYP24 and CYP27, and the members of all three families metabolize sterols, steroids or secosteroids (vitamin D). Human CYP27C1 is an orphan P450 of unknown function [56].
In contrast, there are to date over a dozen named families of mitochondrial clan P450s in arthropods. Insects have between 6 and 12 mitochondrial P450s, of which 3 are the orthologs of the C22, C2 and C20 hydroxylases of the ecdysteroid

pathway [11] (see Fig. 2). Interestingly, some insect mitochondrial P450s are typical xenobiotic metabolizing enzymes, and this is a clear difference between vertebrates and arthropods. Indeed, evidence for xenobiotic metabolism by mitochondrial clan P450s in vertebrates is scarce [57]. The evidence that insect mitochondrial P450s of the CYP12 family are involved in xenobiotic metabolism is threefold. First, heterologously expressed CYP12A1 from the house fly metabolizes pesticides such as diazinon and other xenobiotics, but not ecdysteroids. This gene is constitutively overexpressed in strains that are resistant to diazinon [58]. Second, overexpression of the CYP12A4 gene in some natural populations of D. melanogaster is responsible for lufenuron resistance in those flies [59]. Third, Drosophila CYP12D1 is regularly observed to be overexpressed in resistant strains or to be induced by xenobiotics [60]. It confers resistance to DDT or dicyclanil when overexpressed in transgenic flies [61]. Although evidence from flies is currently limited to these three CYP12 family P450s, it is probable that generalization to the other P450s that are not 1:1 orthologs in various arthropod lineages can be made safely. Also, evidence for the mitochondrial localization of CYP12A1 is incontrovertible [58] and can also be generalized to other members of this family in insects.

The comparison of the first two complete insect CYPomes, the fruit fly and mosquito [10], showed clearly that the mitochondrial clan P450s behaved differently in a phylogenetic analysis, with five pairs of clear orthologs (three Halloween genes and the highly conserved CYP301A1 and CYP49A1) and the remaining 10 genes distributed as species-specific paralogs. Moreover, the CYP12F1 gene of A. gambiae is one of few CYP genes seen to be constitutively overexpressed in a DDT-resistant strain (ZAN/U) and in a permethrin-resistant strain (RSP) [62].

Mitochondrial clan P450s are also found in other animal groups. The cephalochordate B.
floridae (a.k.a. amphioxus) may have up to 28 mitochondrial clan P450s. In molluscs, CYP10 of the pond snail Lymnaea stagnalis [63] is a long-standing member of the mitochondrial clan, and its selective expression in the female dorsal bodies that produce a gonadotropic hormone is suggestive of a physiological role. Molluscs have more mitochondrial clan P450s however, with at least seven mitochondrial clan P450s in Lottia gigantea (unpublished results). The nematode C. elegans has a single P450 belonging to the mitochondrial clan, CYP44A1. Its gene has no visible developmental phenotype when disrupted [64], nor is it inducible by any of 18 xenobiotics [65]. Its function is unknown. In the basal metazoa, the placozoan Trichoplax adhaerens apparently has 21 mitochondrial clan P450s of unknown function.

5.2. The origins of the mitochondrial P450 clan

The low number of available sequences and the limited sampling of diversity in the late 1980s led to erroneous conclusions on the origin of mitochondrial P450s based more on analogy than phylogeny [66]. Mitochondrial P450s have a ferredoxin (adrenodoxin, see Ewen et al., this issue) and adrenodoxin reductase (a ferredoxin NADP+ reductase flavoprotein) as their redox partners. Most bacterial P450s have a structurally analogous system, then best represented by putidaredoxin and putidaredoxin reductase in the P450cam system. Despite the fact that the ferredoxins belong to the same gene family (cd00159, fer2 superfamily), the reductases are more divergent (adrenodoxin reductase PLN02852 and putidaredoxin reductase PRK09754). The assembly of the three proteins into an analogous functional monooxygenase system is now recognized as a homoplasy (convergent similarity) between bacteria and mitochondria, and not a synapomorphy (evidence of common descent).
There are now at least 10 known classes of electron transfer pathways to P450 enzymes involving different redox partners and domain fusion [67], so the convergent similarity of mitochondrial and some bacterial P450 systems is no longer a temptation to make evolutionary assumptions. Currently, sequence analyses show that the CYP2 clan is most related to the mitochondrial CYP clan, and it is therefore probable that


mitochondrial clan P450s originated from a CYP2 clan ancestor. It is probable that this occurred by a mistargeting of a microsomal P450 [68]. Even though there is no NADPH cytochrome P450 reductase inside mitochondria, a microsomal P450 may function with other redox partners [69,70] so that some degree of catalytic functionality would be maintained for the mistargeted P450. In one possible mistargeting scenario, an exon containing the typical membrane-spanning, hydrophobic N-terminal is lost or replaced by an exon from a mitochondrially targeted protein encoded by a neighboring gene. Another scenario would be an intermediate state, wherein the encoded P450 can be both microsomal and mitochondrial, as suggested by observations from Avadhani's group [71]. Indeed this situation is presently seen with a number of mammalian P450s of the CYP1, 2 and 3 families that can be found in mitochondria, where they have a longer half-life than on the ER. Several mechanisms for this dual localization have been proposed, all involving post-translational modifications. These include endoprotease cleavage of an N-terminal sequence that reveals a cryptic mitochondrial targeting sequence (CYP1A1/2), or internal PKA phosphorylation that unmasks such a sequence (CYP2A5, 2B1, 2C11). Other microsomal P450s are also directed to mitochondria by unclear mechanisms [72–74].

It is easy to imagine how mutations in a CYP gene subject to such dual localization could lead to a loss of the microsomal targeting while retaining the mitochondrial targeting. Further mutations may then have optimized the interaction of the P450 with preexisting mitochondrial redox partners, and three sites have been identified that fit that description [75]. Whether microsomal P450s from other organisms are also found in mitochondria by similar processes is currently unknown. It is also currently unknown whether members of the mitochondrial CYP clan can under some circumstances be mistargeted to the endoplasmic reticulum.
References
[1] D.C. Lamb, T. Skaug, H.L. Song, C.J. Jackson, L.M. Podust, M.R. Waterman, D.B. Kell, D.E. Kelly, S.L. Kelly, The cytochrome P450 complement (CYPome) of Streptomyces coelicolor A3(2), J Biol Chem 277 (2002) 24000–24005.
[2] D.R. Nelson, D.C. Zeldin, S.M. Hoffman, L.J. Maltais, H.M. Wain, D.W. Nebert, Comparison of cytochrome P450 (CYP) genes from the mouse and human genomes, including nomenclature recommendations for genes, pseudogenes and alternative-splice variants, Pharmacogenetics 14 (2004) 1–18.
[3] D. Grimaldi, M.S. Engel, Evolution of the Insects, Cambridge University Press, Cambridge, 2005.
[4] M. Ingelman-Sundberg, Pharmacogenetics of cytochrome P450 and its applications in drug therapy: the past, present and future, Trends Pharmacol Sci 25 (2004) 193–200.
[5] R. Feyereisen, Evolution of insect P450, Biochem Soc Trans 34 (2006) 1252–1255.
[6] N. Tijet, C. Helvig, R. Feyereisen, The cytochrome P450 gene superfamily in Drosophila melanogaster: annotation, intron–exon organization and phylogeny, Gene 262 (2001) 189–198.
[7] J.R. Idle, J. Corchero, F.J. Gonzalez, Medical implications of HGP's sequence of chromosome 22, Lancet 355 (2000) 319.
[8] J.J. Emerson, M. Cardoso-Moreira, J.O. Borevitz, M. Long, Natural selection shapes genome-wide patterns of copy-number polymorphism in Drosophila melanogaster, Science 320 (2008) 1629–1631.
[9] P.J. Daborn, J.L. Yen, M.R. Bogwitz, G. Le Goff, E. Feil, S. Jeffers, N. Tijet, T. Perry, D. Heckel, P. Batterham, R. Feyereisen, T.G. Wilson, R.H. ffrench-Constant, A single P450 allele associated with insecticide resistance in Drosophila, Science 297 (2002) 2253–2256.
[10] H. Ranson, C. Claudianos, F. Ortelli, C. Abgrall, J. Hemingway, M.V. Sharakhova, M.F. Unger, F.H. Collins, R. Feyereisen, Evolution of supergene families associated with insecticide resistance, Science 298 (2002) 179–181.
[11] K.F. Rewitz, M.B. O'Connor, L.I. Gilbert, Molecular evolution of the insect Halloween family of cytochrome P450s: phylogeny, gene organization and functional conservation, Insect Biochem Mol Biol 37 (2007) 741–753.
[12] C. Helvig, J.F. Koener, G.C. Unnithan, R. Feyereisen, CYP15A1, the cytochrome P450 that catalyzes epoxidation of methyl farnesoate to Juvenile Hormone III in cockroach corpora allata, Proc Natl Acad Sci USA 101 (2004) 4024–4029.
[13] C. Claudianos, H. Ranson, R.M. Johnson, S. Biswas, M.A. Schuler, M.R. Berenbaum, R. Feyereisen, J.G. Oakeshott, A deficit of detoxification enzymes: pesticide sensitivity and environmental response in the honeybee, Insect Mol Biol 15 (2006) 615–636.
[14] O. Lespinet, Y.I. Wolf, E.V. Koonin, L. Aravind, The role of lineage-specific gene family expansion in the evolution of eukaryotes, Genome Res 12 (2002) 1048–1059.
[15] S.J. Gould, N.L. Gilinsky, R.Z. German, Asymmetry of lineages and the direction of evolutionary time, Science 236 (1987) 1437–1441.
[16] J.S. Taylor, J. Raes, Small-scale gene duplications, in: T.R. Gregory (Ed.), The Evolution of the Genome, Elsevier, Inc., 2005, pp. 289–327.

[17] M. Lynch, J.S. Conery, The evolutionary fate and consequences of duplicate genes, Science 290 (2000) 1151–1155.
[18] M.W. Hahn, M.V. Han, S.G. Han, Gene family evolution across 12 Drosophila genomes, PLoS Genet 3 (2007) e197.
[19] N. Osada, H. Innan, Duplication and gene conversion in the Drosophila melanogaster genome, PLoS Genet 4 (2008) e1000305.
[20] H. Chung, T. Sztal, S. Pasricha, M. Sridhar, P. Batterham, P.J. Daborn, Characterization of Drosophila melanogaster cytochrome P450 genes, Proc Natl Acad Sci USA 106 (2009) 5731–5736.
[21] J. Zhang, Evolution by gene duplication: an update, Trends Ecol Evol 18 (2003) 292–298.
[22] G.P. Karev, Y.I. Wolf, F.S. Berezovskaya, E.V. Koonin, Gene family evolution: an in-depth theoretical and simulation analysis of non-linear birth-death-innovation models, BMC Evol Biol 4 (2004) 32.
[23] T. Verslycke, J.V. Goldstone, J.J. Stegeman, Isolation and phylogeny of novel cytochrome P450 genes from tunicates (Ciona spp.): a CYP3 line in early deuterostomes? Mol Phylogenet Evol 40 (2006) 760–771.
[24] J.V. Goldstone, H.M. Goldstone, A.M. Morrison, A. Tarrant, S.E. Kern, B.R. Woodin, J.J. Stegeman, Cytochrome P450 1 genes in early deuterostomes (tunicates and sea urchins) and vertebrates (chicken and frog): origin and diversification of the CYP1 gene family, Mol Biol Evol 24 (2007) 2619–2631.
[25] T. De Bie, N. Cristianini, J.P. Demuth, M.W. Hahn, CAFE: a computational tool for the study of gene family evolution, Bioinformatics 22 (2006) 1269–1271.
[26] D.R. Nelson, Cytochrome P450 and the individuality of species, Arch Biochem Biophys 369 (1999) 1–10.
[27] Y. Yoshida, M. Noshiro, Y. Aoyama, T. Kawamoto, T. Horiuchi, O. Gotoh, Structural and evolutionary studies on sterol 14-demethylase P450 (CYP51), the most conserved P450 monooxygenase: II. Evolutionary analysis of protein and gene structures, J Biochem (Tokyo) 122 (1997) 1122–1128.
[28] D.S. Lee, P. Nioche, M. Hamberg, C.S. Raman, Structural insights into the evolutionary paths of oxylipin biosynthetic enzymes, Nature 455 (2008) 363–368.
[29] X. Qi, S. Bakht, B. Qin, M. Leggett, A. Hemmings, F. Mellon, J. Eagles, D. Werck-Reichhart, H. Schaller, A. Lesot, R. Melton, A. Osbourn, A different function for a member of an ancient and highly conserved cytochrome P450 family: from essential sterols to plant defense, Proc Natl Acad Sci USA 103 (2006) 18848–18853.
[30] M.A. Huynen, E. van Nimwegen, The frequency distribution of gene family sizes in complete genomes, Mol Biol Evol 15 (1998) 583–589.
[31] N.M. Luscombe, J. Qian, Z. Zhang, T. Johnson, M. Gerstein, The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties, Genome Biol 3 (2002) RESEARCH0040.
[32] W.J. Reed, B.D. Hughes, A model explaining the size distribution of gene and protein families, Math Biosci 189 (2004) 97–102.
[33] J.E. Galagan, E.U. Selker, RIP: the evolutionary cost of genome defense, Trends Genet 20 (2004) 417–423.
[34] M.W. Hahn, T. De Bie, J.E. Stajich, C. Nguyen, N. Cristianini, Estimating the tempo and mode of gene family evolution from comparative genomic data, Genome Res 15 (2005) 1153–1160.
[35] M. Lynch, J.S. Conery, The origins of genome complexity, Science 302 (2003) 1401–1404.
[36] M. Lynch, V. Katju, The altered evolutionary trajectories of gene duplicates, Trends Genet 20 (2004) 544–549.
[37] J.B. Walsh, Sequence-dependent gene conversion: can duplicated genes diverge fast enough to escape conversion? Genetics 117 (1987) 543–557.
[38] E. Matsunaga, M. Umeno, F.J. Gonzalez, The rat P450 IID subfamily: complete sequences of four closely linked genes and evidence that gene conversions maintained sequence homogeneity at the heme-binding region of the cytochrome P450 active site, J Mol Evol 30 (1990) 155–169.
[39] M.H. Heim, U.A. Meyer, Evolution of a highly polymorphic human cytochrome P450 gene cluster: CYP2D6, Genomics 14 (1992) 49–58.
[40] E.F. Chiang, Y.L. Yan, Y. Guiguen, J. Postlethwait, B.C. Chung, Two Cyp19 (P450 aromatase) genes on duplicated zebrafish chromosomes are expressed in ovary or brain, Mol Biol Evol 18 (2001) 542–550.
[41] S.M. Paquette, S. Bak, R. Feyereisen, Intron–exon organization and phylogeny in a large superfamily, the paralogous cytochrome P450 genes of Arabidopsis thaliana, DNA Cell Biol 19 (2000) 307–317.
[42] H. Wang, K.M. Donley, D.S. Keeney, S.M. Hoffman, Organization and evolution of the Cyp2 gene cluster on mouse chromosome 7, and comparison with the syntenic human cluster, Environ Health Perspect 111 (2003) 1835–1842.
[43] E. d'Alencon, H. Sezutsu, F. Legeai, E. Permal, S. Bernard-Samain, S. Gimenez, C. Gagneur, F. Cousserans, M. Shimomura, A. Brun-Barale, T. Flutre, A. Couloux, P. East, K. Gordon, K. Mita, H. Quesneville, P. Fournier, R. Feyereisen, Extensive synteny conservation of holocentric chromosomes in Lepidoptera despite high rates of local genome rearrangements, Proc Natl Acad Sci USA 107 (2010) 7680–7685.
[44] K.F. Rewitz, L.I. Gilbert, Daphnia Halloween genes that encode cytochrome P450s mediating the synthesis of the arthropod molting hormone: evolutionary implications, BMC Evol Biol 8 (2008) 60.
[45] M. Matsuno, V. Compagnon, G.A. Schoch, M. Schmitt, D. Debayle, J.E. Bassard, B. Pollet, A. Hehn, D. Heintz, P. Ullmann, C. Lapierre, F. Bernier, J. Ehlting, D. Werck-Reichhart, Evolution of a novel phenolic pathway for pollen development, Science 325 (2009) 1688–1692.
[46] D. Rozman, M. Stromstedt, M.R. Waterman, The three human cytochrome P450 lanosterol 14 alpha-demethylase (CYP51) genes reside on chromosomes 3, 7, and 13: structure of the two retrotransposed pseudogenes, association with a line-1 element, and evolution of the human CYP51 family, Arch Biochem Biophys 333 (1996) 466–474.


[62] J.P. David, C. Strode, J. Vontas, D. Nikou, A. Vaughan, P.M. Pignatelli, C. Louis, J. Hemingway, H. Ranson, The Anopheles gambiae detoxification chip: a highly specific microarray to study metabolic-based insecticide resistance in malaria vectors, Proc Natl Acad Sci USA 102 (2005) 4080–4084.
[63] Y. Teunissen, W.P. Geraerts, H. van Heerikhuizen, R.J. Planta, J. Joosse, Molecular cloning of a cDNA encoding a member of a novel cytochrome P450 family in the mollusc Lymnaea stagnalis, J Biochem (Tokyo) 112 (1992) 249–252.
[64] R.S. Kamath, A.G. Fraser, Y. Dong, G. Poulin, R. Durbin, M. Gotta, A. Kanapin, N. Le Bot, S. Moreno, M. Sohrmann, D.P. Welchman, P. Zipperlen, J. Ahringer, Systematic functional analysis of the Caenorhabditis elegans genome using RNAi, Nature 421 (2003) 231–237.
[65] R. Menzel, T. Bogaert, R. Achazi, A systematic gene expression screen of Caenorhabditis elegans cytochrome P450 genes reveals CYP35 as strongly xenobiotic inducible, Arch Biochem Biophys 395 (2001) 158–168.
[66] D.W. Nebert, D.R. Nelson, R. Feyereisen, Evolution of the cytochrome P450 genes, Xenobiotica 19 (1989) 1149–1160.
[67] F. Hannemann, A. Bichet, K.M. Ewen, R. Bernhardt, Cytochrome P450 systems – biological variations of electron transport chains, Biochim Biophys Acta 1770 (2007) 330–344.
[68] D.R. Nelson, Cytochrome P450 and the individuality of species, Arch Biochem Biophys 369 (1999) 1–10.
[69] R. Bernhardt, I.C. Gunsalus, Reconstitution of cytochrome P4502B4 (LM2) activity with camphor and linalool monooxygenase electron donors, Biochem Biophys Res Commun 187 (1992) 310–317.
[70] C.M. Jenkins, M.R. Waterman, Flavodoxin and NADPH-flavodoxin reductase from Escherichia coli support bovine cytochrome P450c17 hydroxylase activities, J Biol Chem 269 (1994) 27401–27408.
[71] H.K. Anandatheerthavarada, S. Addya, R.S. Dwivedi, G. Biswas, J. Mullick, N.G. Avadhani, Localization of multiple forms of inducible cytochromes P450 in rat liver mitochondria: immunological characteristics and patterns of xenobiotic substrate metabolism, Arch Biochem Biophys 339 (1997) 136–150.
[72] M.B. Genter, C.D. Clay, T.P. Dalton, H. Dong, D.W. Nebert, H.G. Shertzer, Comparison of mouse hepatic mitochondrial versus microsomal cytochromes P450 following TCDD treatment, Biochem Biophys Res Commun 342 (2006) 1375–1381.
[73] E.P. Neve, M. Ingelman-Sundberg, Intracellular transport and localization of microsomal cytochrome P450, Anal Bioanal Chem 392 (2008) 1075–1084.
[74] M. Seliskar, D. Rozman, Mammalian cytochromes P450 – importance of tissue specificity, Biochim Biophys Acta 1770 (2007) 458–466.
[75] I.A. Pikuleva, C. Cao, M.R. Waterman, An additional electrostatic interaction between adrenodoxin and P450c27 (CYP27A1) results in tighter binding than between adrenodoxin and P450scc (CYP11A1), J Biol Chem 274 (1999) 2045–2052.

[47] T. Sztal, H. Chung, L. Gramzow, P.J. Daborn, P. Batterham, C. Robin, Two independent duplications forming the Cyp307a genes in Drosophila, Insect Biochem Mol Biol 37 (2007) 10441053. [48] J.H. Thomas, Rapid birthdeath evolution specic to xenobiotic cytochrome P450 genes in vertebrates, PLoS Genet 3 (2007) e67. [49] O. Gotoh, Substrate recognition sites in cytochrome P450 family 2 (CYP2) proteins inferred from comparative analyses of amino acid and coding nucleotide sequences, J Biol Chem 267 (1992) 8390. [50] A. Zawaira, A. Matimba, C. Masimirembwa, Prediction of sites under adaptive evolution in cytochrome P450 sequences and their relationship to substrate recognition sites, Pharmacogenet Genomics 18 (2008) 467476. [51] M.R. Berenbaum, C. Favret, M.A. Schuler, On dening key innovations in an adaptive radiationcytochrome P450s and Papilionidae, Am Nat 148 (1996) S139S155. [52] Z. Wen, S. Rupasinghe, G. Niu, M.R. Berenbaum, M.A. Schuler, CYP6B1 and CYP6B3 of the Black Swallowtail (Papilio polyxenes): adaptive evolution through subfunctionalization, Mol Biol Evol 23 (2006) 24342443. [53] B.E. Shakhnovich, E.V. Koonin, Origins and impact of constraints in evolution of gene families, Genome Res 16 (2006) 15291536. [54] N. Takaya, S. Suzuki, S. Kuwazaki, H. Shoun, F. Maruo, M. Yamaguchi, K. Takeo, Cytochrome p450nor, a novel class of mitochondrial cytochrome P450 involved in nitrate respiration in the fungus Fusarium oxysporum, Arch Biochem Biophys 372 (1999) 340346. [55] D. Tomura, K. Obika, A. Fukamizu, H. Shoun, Nitric oxide reductase cytochrome P-450 gene, CYP55, of the fungus Fusarium oxysporum, J Biochem 116 (1994) 8894. [56] F.P. Guengerich, Z.L. Wu, C.J. Bartleson, Function of human cytochrome P450s: characterization of the orphans, Biochem Biophys Res Comm 338 (2005) 465469. [57] B.O. Lund, J. Lund, Novel involvement of a mitochondrial steroid hydroxylase (P450c11) in xenobiotic metabolism, J Biol Chem 270 (1995) 2089520897. [58] V.M. Guzov, G.C. 
Unnithan, A.A. Chernogolov, R. Feyereisen, CYP12A1, a mitochondrial cytochrome P450 from the house y, Arch Biochem Biophys 359 (1998) 231240. [59] M.R. Bogwitz, H. Chung, L. Magoc, S. Rigby, W. Wong, M. O'Keefe, J.A. McKenzie, P. Batterham, P.J. Daborn, Cyp12a4 confers lufenuron resistance in a natural population of Drosophila melanogaster, Proc Natl Acad Sci USA (2005). [60] M. Giraudo, G.C. Unnithan, G. Le Goff, R. Feyereisen, Regulation of cytochrome P450 expression in Drosophila: genomic insights, Pesticide Biochemistry and Physiology, 2010. doi:10:1016/j.pestbp.2009.06.09. [61] P.J. Daborn, C. Lumb, A. Boey, W. Wong, R.H. Ffrench-Constant, P. Batterham, Evaluating the insecticide resistance potential of eight Drosophila melanogaster cytochrome P450 genes by transgenic over-expression, Insect Biochem Mol Biol 37 (2007) 512519.
