Pyruvate kinase, a protein with three domains (PDB 1pkn).
A protein domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and can often be independently stable and folded. Many proteins consist of several structural domains, and one domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 to about 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are independently stable, domains can be "swapped" by genetic engineering between one protein and another to make chimeric proteins.
Background
The concept of the domain was first proposed in 1973 by Wetlaufer after X-ray crystallographic studies of hen lysozyme [1] and papain [2] and limited proteolysis studies of immunoglobulins [3][4]. Wetlaufer defined domains as stable units of protein structure that could fold autonomously. In the past, domains have been described as units of compact structure, of function and evolution, and of folding.
Each definition is valid and will often overlap, i.e. a compact structural domain that is found amongst diverse proteins is likely to fold independently within its structural environment. Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities [8]. In a multidomain protein, each domain may fulfil its own function independently, or in a concerted manner with its neighbours. Domains can either serve as modules for building up large assemblies such as virus particles or muscle fibres, or can provide specific catalytic or binding sites as found in enzymes or regulatory proteins.
An appropriate example is pyruvate kinase, a glycolytic enzyme that plays an important role in regulating the flux from fructose-1,6-bisphosphate to pyruvate. It contains an all-β regulatory domain, an α/β-substrate binding domain and an α/β-nucleotide binding domain, connected by several polypeptide linkers [9] (see figure, right). Each domain in this protein occurs in diverse sets of protein families. The central α/β-barrel substrate binding domain is one of the most common enzyme folds. It is seen in many different enzyme families catalysing completely unrelated reactions [10]. The α/β-barrel is commonly called the TIM barrel, named after triose phosphate isomerase, which was the first such structure to be solved [11]. It is currently classified into 26 homologous families in the CATH domain database [12]. The TIM barrel is formed from a sequence of β-α-β motifs closed by the first and last strands hydrogen bonding together, forming an eight-stranded barrel. There is debate about the evolutionary origin of this domain. One study has suggested that a single ancestral enzyme could have diverged into several families [13], while another suggests that a stable TIM-barrel structure has evolved through convergent evolution [14]. The TIM barrel in pyruvate kinase is 'discontinuous', meaning that more than one segment of the polypeptide is required to form the domain. This is likely to be the result of the insertion of one domain into another during the protein's evolution. It has been shown from known structures that about a quarter of structural domains are discontinuous [15][16]. The inserted β-barrel regulatory domain is 'continuous', made up of a single stretch of polypeptide. Covalent association of two domains represents a functional and structural advantage, since there is an increase in stability when compared with the same structures non-covalently associated [17].
Other advantages are the protection of intermediates within inter-domain enzymatic clefts that may otherwise be unstable in aqueous environments, and a fixed stoichiometric ratio of the enzymatic activity necessary for a sequential set of reactions [18].
with each other when in the hydrophobic environment. This gives rise to regions of the polypeptide that form regular 3D structural patterns called 'secondary structure'. There are two main types of secondary structure:
α-helices; β-sheets.
All-α domains have a domain core built exclusively from α-helices. This class is dominated by small folds, many of which form a simple bundle with helices running up and down.
All-β domains have a core composed of antiparallel β-sheets, usually two sheets packed against each other. Various patterns can be identified in the arrangement of the strands, often giving rise to the identification of recurring motifs, for example the Greek key motif.[23]
α+β domains are a mixture of all-α and all-β motifs. Classification of proteins into this class is difficult because of overlaps with the other three classes, and it is therefore not used in the CATH domain database.[12]
α/β domains are made from a combination of β-α-β motifs that predominantly form a parallel β-sheet surrounded by amphipathic α-helices. The secondary structures are arranged in layers or barrels.
Super-folds
The CATH domain database classifies domains into approximately 800 fold families; ten of these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as folds for which there are at least three structures without significant sequence similarity.[31] The most populated is the α/β-barrel super-fold, as described previously.
The majority of genomic proteins, two-thirds in unicellular organisms and more than 80% in metazoa, are multidomain proteins created as a result of gene duplication events.[32] Many domains in multidomain structures could have once existed as independent proteins. More and more domains in eukaryotic multidomain proteins can be found as independent proteins in prokaryotes.[33] For example, vertebrates have a multi-enzyme polypeptide containing the GAR synthetase, AIR synthetase and GAR transformylase modules (GARs-AIRs-GARt; GAR: glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonucleotide synthetase). In insects, the polypeptide appears as GARs-(AIRs)2-GARt, in yeast GARs-AIRs is encoded separately from GARt, and in bacteria each domain is encoded separately.[34]
Origin
Multidomain proteins are likely to have emerged from a selective pressure during evolution to create new functions. Various proteins have diverged from common ancestors by different combinations and associations of domains. Modular units frequently move about, within and between biological systems through mechanisms of genetic shuffling:
transposition of mobile elements including horizontal transfers (between species);[35] gross rearrangements such as inversions, translocations, deletions and duplications; homologous recombination; slippage of DNA polymerase during replication.
Connectivity
Modules frequently display different connectivity relationships, as illustrated by the kinesins and ABC transporters. The kinesin motor domain can be at either end of a polypeptide chain that includes a coiled-coil region and a cargo domain.[39] ABC transporters are built with up to four domains consisting of two unrelated modules, an ATP-binding cassette and an integral membrane module, arranged in various combinations.
Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes,[43][44] where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and the folded protein has far fewer. A funnel implies that protein folding involves a decrease in energy and a loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates. A folding chain progresses toward lower intra-chain free energies by increasing its compactness. The chain's conformational options become increasingly narrowed, ultimately toward one native structure.
Domain swapping is a mechanism for forming oligomeric assemblies.[51] In domain swapping, a secondary or tertiary element of a monomeric protein is replaced by the same element of another protein. Domain swapping can range from secondary structure elements to whole structural domains. It also represents a model of evolution for functional adaptation by oligomerisation, e.g. oligomeric enzymes that have their active site at subunit interfaces.[52]
catalysis; regulatory activity; transport of metabolites; formation of protein assemblies and cellular locomotion.
In enzymes, the closure of one domain onto another captures a substrate by an induced fit, allowing the reaction to take place in a controlled way. Such motions can be observed when two or more crystallographic 3D structures of a protein are experimentally determined in alternate environments, from the analysis of nuclear magnetic resonance (NMR) derived structures, or from spectra measured by neutron spin echo [55]. A detailed analysis by Gerstein led to the classification of two basic types of domain motion: hinge and shear.[54] Only a relatively small portion of the chain, namely the inter-domain linker and side chains, undergoes significant conformational changes upon domain rearrangement.[56]
linking α-helix. The helix is split into two, almost perpendicular, smaller helices separated by four residues of an extended strand.[58][59]
as tertiary structural clusters of the protein, these include both super-secondary structures and domains. The DOMAK algorithm is used to create the 3Dee domain database.[68] It calculates a 'split value' from the number of each type of contact when the protein is divided arbitrarily into two parts. This split value is large when the two parts of the structure are distinct.
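The split-value idea can be sketched in a few lines. The text above only says that the value is computed from the number of each type of contact and is large when the two parts are distinct; the specific formula used here, (internal contacts of A / contacts between A and B) × (internal contacts of B / contacts between A and B), is an assumption made for illustration, as are the function names, the distance cutoff, and the restriction to a single cut along the chain. It is not the DOMAK implementation.

```python
from itertools import combinations
from math import dist


def contact_pairs(coords, cutoff):
    """Return index pairs (i, j), i < j, whose points lie within `cutoff`."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) <= cutoff
    ]


def split_value(contacts, part_a):
    """Score one division of the residues into part A and the remainder (part B).

    Assumed formula (illustrative): (int_A / ext_AB) * (int_B / ext_AB),
    where int_* counts contacts inside a part and ext_AB counts contacts
    that cross between the two parts.
    """
    part_a = set(part_a)
    int_a = int_b = ext_ab = 0
    for i, j in contacts:
        i_in_a, j_in_a = i in part_a, j in part_a
        if i_in_a and j_in_a:
            int_a += 1
        elif not i_in_a and not j_in_a:
            int_b += 1
        else:
            ext_ab += 1
    if ext_ab == 0:
        # No contacts cross the cut: the two parts are fully separate.
        return float("inf")
    return (int_a / ext_ab) * (int_b / ext_ab)


def best_single_cut(coords, cutoff, min_size=2):
    """Try every single cut along the chain; return (best split value, cut index)."""
    contacts = contact_pairs(coords, cutoff)
    n = len(coords)
    scored = [
        (split_value(contacts, range(cut)), cut)
        for cut in range(min_size, n - min_size + 1)
    ]
    return max(scored)


# Toy "structure": two tight clusters of four points each, 5 units apart.
toy = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0),
       (6, 0, 0), (7, 0, 0), (6, 1, 0), (7, 1, 0)]
score, cut = best_single_cut(toy, cutoff=5.5)
```

In this toy case the highest score falls at the cut between the two clusters, which matches the statement that the split value is large when the two parts are distinct. A single cut can only describe continuous domains; discontinuous domains would need divisions made of more than one segment.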
DETECTIVE
Swindells (1995) developed a method, DETECTIVE, for identification of domains in protein structures based on the idea that domains have a hydrophobic interior. Deficiencies were found to occur when hydrophobic cores from different domains continue through the interface region.
Armadillo repeats. Named after the β-catenin-like Armadillo protein of the fruit fly Drosophila.
Basic leucine zipper domain (bZIP domain). Found in many DNA-binding eukaryotic proteins. One part of the domain contains a region that mediates sequence-specific DNA binding, and the other is the leucine zipper that is required for the dimerization of two DNA-binding regions. The DNA-binding region comprises a number of basic amino acids such as arginine and lysine.
Cadherin repeats. Cadherins function as Ca2+-dependent cell-cell adhesion proteins. Cadherin domains are extracellular regions which mediate cell-to-cell homophilic binding between cadherins on the surface of adjacent cells.
Death effector domain (DED). Allows protein-protein binding by homotypic interactions (DED-DED). Caspase proteases trigger apoptosis via proteolytic cascades. Pro-caspase-8 and pro-caspase-10 bind to specific adaptor molecules via DED domains, and this leads to autoactivation of caspases.
EF hand. A helix-loop-helix structural motif found in each structural domain of the signaling protein calmodulin and in the muscle protein troponin-C.
Immunoglobulin-like domains. Found in proteins of the immunoglobulin superfamily (IgSF).[74] They contain about 70-110 amino acids and are classified into different categories (IgV, IgC1, IgC2 and IgI) according to their size and function. They possess a characteristic fold in which two beta sheets form a sandwich that is stabilized by interactions between conserved cysteines and charged amino acids. They are important for protein-to-protein interactions in processes of cell adhesion, cell activation, and molecular recognition. These domains are commonly found in molecules with roles in the immune system.
Phosphotyrosine-binding domain (PTB). PTB domains usually bind to phosphorylated tyrosine residues. They are often found in signal transduction proteins. PTB-domain binding specificity is determined by residues to the amino-terminal side of the phosphotyrosine. For example, the PTB domains of both SHC and IRS-1 bind to an NPXpY sequence. PTB-containing proteins such as SHC and IRS-1 are important for insulin responses of human cells.
Pleckstrin homology domain (PH). PH domains bind phosphoinositides with high affinity. Specificities for PtdIns(3)P, PtdIns(4)P, PtdIns(3,4)P2, PtdIns(4,5)P2, and PtdIns(3,4,5)P3 have all been observed. Because phosphoinositides are sequestered to various cell membranes (due to their long lipophilic tails), PH domains usually cause recruitment of the protein in question to a membrane, where the protein can exert a certain function in cell signalling, cytoskeletal reorganization or membrane trafficking.
Src homology 2 domain (SH2). SH2 domains are often found in signal transduction proteins. SH2 domains confer binding to phosphorylated tyrosine (pTyr). Named after the phosphotyrosine binding domain of the src viral oncogene, which is itself a tyrosine kinase. See also: SH3 domain.
Zinc finger DNA binding domain (ZnF_GATA). ZnF_GATA domain-containing proteins are typically transcription factors that usually bind to the DNA sequence [AT]GATA[AG] of promoters.
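The ZnF_GATA consensus [AT]GATA[AG] is written in the same bracket notation that regular expressions use, so candidate sites in a DNA string can be located directly. The following is a minimal sketch: the function name and the example sequence are hypothetical, only the strand given is scanned (not its reverse complement), and a sequence match says nothing about whether a site is actually bound in vivo.

```python
import re

# Zero-width lookahead so that overlapping candidate sites are all reported.
GATA_SITE = re.compile(r"(?=([AT]GATA[AG]))")


def find_gata_sites(dna):
    """Return (0-based start, matched site) for each [AT]GATA[AG] occurrence."""
    return [(m.start(), m.group(1)) for m in GATA_SITE.finditer(dna.upper())]


# Hypothetical promoter fragment containing two overlapping candidate sites.
sites = find_gata_sites("ccAGATAGATAAgg")
```

The lookahead matters here because two sites can share bases, as in AGATAGATAA, where the second site begins inside the first.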
The preceding text and figures originate from "Predicting Structural Domains in Proteins" George RA, 2002
CATH Conserved domains Motif domain Protein Protein structure Protein structure prediction Protein family Structural biology Structural Classification of Proteins (SCOP)
The Protein Families (Pfam) database clan browser provides easy access to information about protein structural domains. A clan contains two or more Pfam families that have arisen from a single evolutionary origin.
3Dee; CATH; DALI; SCOP; Pawson Lab - Protein interaction domains; Nash Lab - Protein interaction domains in Signal Transduction; Definition and assignment of structural domains in proteins.
InterPro; Pfam; PROSITE; ProDom; SMART; NCBI Conserved Domain Database; SUPERFAMILY (library of HMMs representing superfamilies and database of superfamily and family annotations for all completely sequenced organisms).