A protein domain is a part of a protein's sequence and structure that can evolve, function, and exist independently of the rest of the protein chain. Each domain forms a compact three-dimensional structure and often can be independently stable and folded. Many proteins consist of several structural domains. One domain may appear in a variety of evolutionarily related proteins. Domains vary in length from about 25 amino acids up to 500 amino acids. The shortest domains, such as zinc fingers, are stabilized by metal ions or disulfide bridges. Domains often form functional units, such as the calcium-binding EF hand domain of calmodulin. Because they are self-stable, domains can be "swapped" by genetic engineering between one protein and another to make chimera proteins.
Background
The concept of the domain was first proposed in 1973 by Wetlaufer after X-ray crystallographic studies of hen lysozyme and papain and by limited proteolysis studies of immunoglobulins. Wetlaufer defined domains as stable units of protein structure that could fold autonomously. In the past domains have been described as units of:
- compact structure
- function and evolution
- folding.
Each definition is valid and the definitions often overlap; for example, a compact structural domain that is found amongst diverse proteins is likely to fold independently within its structural environment. Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In a multidomain protein, each domain may fulfil its own function independently, or in a concerted manner with its neighbours. Domains can either serve as modules for building up large assemblies such as virus particles or muscle fibres, or can provide specific catalytic or binding sites as found in enzymes or regulatory proteins.
An appropriate example is pyruvate kinase, a glycolytic enzyme that plays an important role in regulating the flux from fructose-1,6-bisphosphate to pyruvate. It contains an all-β regulatory domain, an α/β substrate-binding domain and an α/β nucleotide-binding domain, connected by several polypeptide linkers (see figure, right). Each domain in this protein occurs in diverse sets of protein families.
The central α/β-barrel substrate binding domain is one of the most common enzyme folds. It is seen in many different enzyme families catalysing completely unrelated reactions. The α/β-barrel is commonly called the TIM barrel, named after triose phosphate isomerase, which was the first such structure to be solved. It is currently classified into 26 homologous families in the CATH domain database. The TIM barrel is formed from a sequence of β-α-β motifs closed by the first and last strands hydrogen bonding together, forming an eight-stranded barrel. There is debate about the evolutionary origin of this domain. One study has suggested that a single ancestral enzyme could have diverged into several families, while another suggests that a stable TIM-barrel structure has evolved through convergent evolution.
The TIM-barrel in pyruvate kinase is 'discontinuous', meaning that more than
one segment of the polypeptide is required to form the domain. This is likely to be
the result of the insertion of one domain into another during the protein's evolution.
It has been shown from known structures that about a quarter of
structural domains are discontinuous.
The inserted β-barrel regulatory domain is 'continuous', made up of a single stretch
of polypeptide.
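The distinction between continuous and discontinuous domains can be made concrete by recording a domain as a list of residue ranges. The following is a minimal sketch only: the class name, the domain labels and all residue numbers are hypothetical and are not taken from the pyruvate kinase structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Domain:
    """A structural domain described by inclusive (start, end) residue ranges."""
    name: str
    segments: Tuple[Tuple[int, int], ...]

    @property
    def is_continuous(self) -> bool:
        # A continuous domain is built from a single stretch of polypeptide.
        return len(self.segments) == 1

    @property
    def length(self) -> int:
        # Total number of residues summed over all segments.
        return sum(end - start + 1 for start, end in self.segments)


# Hypothetical residue numbers: a barrel interrupted by an inserted domain.
barrel = Domain("alpha/beta barrel", ((1, 100), (201, 350)))
inserted = Domain("inserted beta-barrel", ((101, 200),))
```

In this representation the inserted domain occupies the gap between the two segments of its parent, which is the arrangement expected when one domain has been inserted into another.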
Covalent association of two domains represents a functional and structural
advantage since there is an increase in stability when compared with the same
structures non-covalently associated. Other advantages
are the protection of intermediates within inter-domain enzymatic clefts that may
otherwise be unstable in aqueous environments, and a fixed stoichiometric ratio of
the enzymatic activity necessary for a sequential set of reactions.
Primary structure
The primary structure (string of amino acids) of a protein encodes its uniquely folded 3D conformation. The most important factor governing the folding of a protein into its 3D structure is the distribution of polar and non-polar side chains. Folding is driven by the burial of hydrophobic side chains in the interior of the molecule so as to avoid contact with the aqueous environment. Sequence alignment is an important tool for determining domains.
Secondary structure
Generally proteins have a core of hydrophobic residues surrounded by a shell of hydrophilic residues. Since the peptide bonds themselves are polar, they are neutralised by hydrogen bonding with each other when in the hydrophobic environment. This gives rise to regions of the polypeptide that form regular 3D structural patterns called 'secondary structure'. There are two main types of secondary structure:
- α-helices
- β-sheets
Secondary structure motifs
Some simple combinations of secondary structure elements have been found to frequently occur in protein structure and are referred to as 'super-secondary structure' or motifs. For example, the β-hairpin motif consists of two adjacent antiparallel β-strands joined by a small loop. It is present in most antiparallel β structures both as an isolated ribbon and as part of more complex β-sheets. Another common super-secondary structure is the β-α-β motif, which is frequently used to connect two parallel β-strands. The central α-helix connects the C-terminus of the first strand to the N-terminus of the second strand, packing its side chains against the β-sheet and therefore shielding the hydrophobic residues of the β-strands from the surface.
Tertiary structure
Several motifs pack together to form compact, local, semi-independent units called domains. The overall 3D structure of the polypeptide chain is referred to as the protein's 'tertiary structure'. Domains are the fundamental units of tertiary structure, each domain containing an individual hydrophobic core built from secondary structural units connected by loop regions. The packing of the polypeptide is usually much tighter in the interior than the exterior of the domain, producing a solid-like core and a fluid-like surface. Core residues are often conserved in a protein family, whereas the residues in loops are less conserved, unless they are involved in the protein's function. Protein tertiary structure can be divided into four main classes based on the secondary structural content of the domain.
- All-α domains have a domain core built exclusively from α-helices. This class is dominated by small folds, many of which form a simple bundle with helices running up and down.
- All-β domains have a core comprising antiparallel β-sheets, usually two sheets packed against each other. Various patterns can be identified in the arrangement of the strands, often giving rise to the identification of recurring motifs, for example the Greek key motif.
- α+β domains are a mixture of all-α and all-β motifs. Classification of proteins into this class is difficult because of overlaps with the other three classes, and the class is therefore not used in the CATH domain database.
- α/β domains are made from a combination of β-α-β motifs that predominantly form a parallel β-sheet surrounded by amphipathic α-helices. The secondary structures are arranged in layers or barrels.
Structural alignment is an important tool for determining domains.
Domains have limits on size
Domains have limits on size. The size of individual structural domains varies from 36 residues in E-selectin to 692 residues in lipoxygenase-1, but the majority, 90%, have fewer than 200 residues, with an average of approximately 100 residues. Very short domains, of fewer than 40 residues, are often stabilised by metal ions or disulfide bonds. Larger domains, of more than 300 residues, are likely to consist of multiple hydrophobic cores.
Modules
Nature is a tinkerer and not an inventor: new sequences are adapted from pre-existing sequences rather than invented. Domains are the common material used by nature to generate new sequences; they can be thought of as genetically mobile units, referred to as 'modules'. Often, the C and N termini of domains are close together in space, allowing them to easily be "slotted into" parent structures during the process of evolution. Many domain families are found in all three forms of life, Archaea, Bacteria and Eukarya. Domains that are repeatedly found in diverse proteins are often referred to as modules; examples can be found among extracellular proteins associated with clotting, fibrinolysis, complement, the extracellular matrix, cell surface adhesion molecules and cytokine receptors.
Protein families
Molecular evolution gives rise to families of related proteins with similar sequence and structure. However, sequence similarities can be extremely low between proteins that share the same structure. Protein structures may be similar because proteins have diverged from a common ancestor. Alternatively, some folds may be more favored than others as they represent stable arrangements of secondary structures, and some proteins may converge towards these folds over the course of evolution. There are currently about 45,000 experimentally determined protein 3D structures deposited within the Protein Data Bank (PDB). However, this set contains many identical or very similar structures. All proteins should be classified into structural families to understand their evolutionary relationships. Structural comparisons are best achieved at the domain level. For this reason many algorithms have been developed to automatically assign domains in proteins with known 3D structure; see 'Domain definition from structural co-ordinates'.
Super-folds
The CATH domain database classifies domains into approximately 800 fold families; ten of these folds are highly populated and are referred to as 'super-folds'. Super-folds are defined as folds for which there are at least three structures without significant sequence similarity. The most populated is the α/β-barrel super-fold, as described previously.
Multidomain proteins
The majority of genomic proteins, two-thirds in unicellular organisms and
more than 80% in metazoa, are multidomain proteins created as a result of gene
duplication events. Many domains in multidomain structures
could have once existed as independent proteins. More and more domains in
eukaryotic multidomain proteins can be found as independent proteins in prokaryotes. For example, vertebrates have a multi-enzyme polypeptide containing the GAR synthetase, AIR synthetase and GAR transformylase modules (GARs-AIRs-GARt; GAR: glycinamide ribonucleotide synthetase/transferase; AIR: aminoimidazole ribonucleotide synthetase). In insects, the polypeptide appears as GARs-(AIRs)2-GARt, in yeast GARs-AIRs is encoded separately from GARt, and in bacteria each domain is encoded separately.
Origin
Multidomain proteins are likely to have emerged from a selective pressure during evolution to create new functions. Various proteins have diverged from common ancestors by different combinations and associations of domains. Modular units frequently move about, within and between biological systems, through mechanisms of genetic shuffling:
- transposition of mobile elements including horizontal transfers (between species);
- gross rearrangements such as inversions, translocations, deletions and duplications;
- homologous recombination;
- slippage of DNA polymerase during replication.
Difference in proliferation
It is likely that all these and organisms. For example,
the ABC transporter domain constitutes one of the largest domain families that
appear in all organisms. Many other families that appear in
all organisms show much less proliferation. These include metabolic enzymes and
components of translational apparatus.
Types of organisation
The simplest multidomain organisation seen in proteins is that of a single domain repeated in tandem. The domains may interact with each other or remain isolated, like beads on a string. The giant 30,000 residue muscle protein titin comprises about 120 fibronectin-III-type and Ig-type domains. In the serine proteases, a gene duplication event has led to the formation of a two β-barrel domain enzyme. The repeats have diverged so widely that there is no obvious sequence similarity between them. The active site is located at a cleft between the two β-barrel domains, in which functionally important residues are contributed from each domain. Genetically engineered mutants of the chymotrypsin serine protease were shown to have some proteinase activity even though their active site residues were abolished, and it has therefore been postulated that the duplication event enhanced the enzyme's activity.
Connectivity
Modules frequently display different connectivity relationships, as illustrated by the kinesins and ABC transporters. The kinesin motor domain can be at either end of a polypeptide chain that includes a coiled-coil region and a cargo domain. ABC transporters are built with up to four domains consisting of two unrelated modules, an ATP-binding cassette and an integral membrane module, arranged in various combinations.
Domain insertion
Not only do domains recombine, but there are many examples of a domain
having been inserted into another. Sequence or structural similarities to other
domains demonstrate that homologues of inserted and parent domains can exist
independently. An example is that of the 'fingers' inserted into the 'palm' domain
within the polymerases of the Pol I family.
Difference between structural and evolutionary domain
Since a domain can be inserted into another, there should always be at least one continuous domain in a multidomain protein. This is the main difference between definitions of structural domains and evolutionary/functional domains. An evolutionary domain will be limited to one or two connections between domains, whereas structural domains can have unlimited
connections, within a given criterion of the existence of a common core. Several
structural domains could be assigned to an evolutionary domain.
History
Protein folding - the unsolved problem
Since the seminal work of Anfinsen over forty years ago, the goal of completely understanding the mechanism by which a polypeptide rapidly
folds into its stable native conformation remains elusive. Many experimental folding
studies have contributed much to our understanding, but the principles that govern
protein folding are still based on those discovered in the very first studies of folding.
Anfinsen showed that the native state of a protein is thermodynamically stable, the
conformation being at a global minimum of its free energy.
Folding pathway
Folding is a directed search of conformational space allowing the protein to fold on a biologically feasible time scale. The Levinthal paradox states that if an average-sized protein were to sample all possible conformations before finding the one with the lowest energy, the whole process would take billions of years. Proteins typically fold within 0.1 to 1000 seconds; therefore the protein folding process must be directed in some way through a specific folding pathway. The forces that direct this search are likely to be a combination of local and global influences whose effects are felt at various stages of the reaction.
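The scale of the Levinthal estimate can be illustrated with a back-of-envelope calculation. The sketch below is illustrative only: the chain length of 100 residues, the three conformations per residue and the sampling rate of 10^13 conformations per second are assumptions chosen for the example and are not figures from the text.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def levinthal_years(n_residues, states_per_residue=3, samples_per_second=1e13):
    """Years needed to visit every conformation once (exhaustive search).

    All three parameters are illustrative assumptions. The arithmetic is
    done in log10 space to keep the intermediate values readable.
    """
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(samples_per_second)
    return 10 ** (log10_seconds - math.log10(SECONDS_PER_YEAR))


# Assumed 100-residue chain: roughly 10**27 years under these assumptions.
years = levinthal_years(100)
```

Even with generous assumptions, an exhaustive search exceeds any biologically feasible time scale by many orders of magnitude, which is why the folding process is considered to be directed.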
Advances in experimental and theoretical studies have shown that folding can be viewed in terms of energy landscapes, where folding kinetics is considered as a progressive organisation of an ensemble of partially folded structures through which a protein passes on its way to the folded structure. This has been described in terms of a folding funnel, in which an unfolded protein has a large number of conformational states available and there are fewer states available to the folded protein. A funnel implies that for protein folding there is a decrease in energy and loss of entropy with increasing tertiary structure formation. The local roughness of the funnel reflects kinetic traps, corresponding to the accumulation of misfolded intermediates. A folding chain progresses toward lower intra-chain free energies by increasing its compactness. The chain's conformational options become increasingly narrowed, ultimately toward one native structure.
Advantage of domains in protein folding
The organisation of large proteins by structural domains represents an advantage
for protein folding, with each domain being able to individually fold, accelerating the
folding process and reducing a potentially large combination of residue interactions. Furthermore, given the observed random distribution of hydrophobic residues in proteins, domain formation appears to be the optimal solution for a large protein to bury its hydrophobic residues while keeping the hydrophilic residues at the surface.
However, the role of inter-domain interactions in protein folding and in the energetics
of stabilisation of the native structure probably differs for each protein. In T4
lysozyme, the influence of one domain on the other is so strong that the entire
molecule is resistant to proteolytic cleavage. In this case, folding is a sequential
process where the C-terminal domain is required to fold independently in an early
step, and the other domain requires the presence of the folded C-terminal domain
for folding and stabilisation.
It has been found that the folding of an isolated domain can take place at the same rate as, or sometimes faster than, that of the integrated domain, suggesting that unfavourable interactions with the rest of the protein can occur during folding. Several arguments suggest that the slowest step in the folding of large proteins is the pairing of the folded domains. This is either because the domains are not folded entirely correctly or because the small adjustments required for their interaction are energetically unfavourable, such as the removal of water from the domain interface.
About quaternary structures
Many proteins have a quaternary structure, which consists of several polypeptide
chains that associate into an oligomeric molecule. Each polypeptide chain in such a
protein is called a subunit. Hemoglobin, for example, consists of two α and two β
subunits. Each of the four chains has an all-α globin fold with a heme pocket.
Domain swapping
Domain swapping is a mechanism for forming oligomeric assemblies. In domain swapping, a secondary or tertiary element of a monomeric
protein is replaced by the same element of another protein. Domain swapping
can range from secondary structure elements to whole structural domains. It also
represents a model of evolution for functional adaptation by oligomerisation, e.g.
oligomeric enzymes that have their active site at subunit interfaces.
Domains and protein flexibility
The presence of multiple domains in proteins gives rise to a great deal of flexibility and mobility, leading to protein domain dynamics. Domain motions can be inferred by comparing structures of a protein in different environments, or directly observed using spectra measured by neutron spin echo spectroscopy. One of the largest observed domain motions is the 'swivelling' mechanism in pyruvate phosphate dikinase. The phosphohistidine domain swivels between two states in order to bring a phosphate group from the active site of the nucleotide binding domain to that of the phosphoenolpyruvate/pyruvate domain. The phosphate group is moved over a distance of 45 Å, involving a domain motion of about 100 degrees around a single residue. Domain motions
are important for:
- catalysis;
- regulatory activity;
- transport of metabolites;
- formation of protein assemblies and
- cellular locomotion.
In enzymes, the closure of one domain onto another captures a substrate by an
induced fit, allowing the reaction to take place in a controlled way. A detailed analysis by Gerstein led to the classification of two basic types of domain motion: hinge and shear.
Only a relatively small portion of the chain, namely the inter-domain linker and
side chains, undergoes significant conformational changes upon domain rearrangement.
Hinges by secondary structures
A study by Hayward found that the termini of α-helices and
β-sheets form hinges in a large number of cases. Many hinges were found to involve
two secondary structure elements acting like hinges of a door, allowing an opening
and closing motion to occur. This can arise when two neighbouring strands within
a β-sheet situated in one domain diverge as they join the other domain. The
two resulting termini then form the bending regions between the two domains.
α-helices that preserve their hydrogen bonding network when bent are found to behave
as mechanical hinges, storing 'elastic energy' that drives the closure of domains for
rapid capture of a substrate.
Helical to extended conformation
The interconversion of helical and extended conformations at the site of a domain
boundary is not uncommon. In calmodulin, torsion angles change for five residues
in the middle of a domain-linking α-helix. The helix is split into two, almost
perpendicular, smaller helices separated by four residues of an extended strand.
Shear motions
Shear motions involve a small sliding movement of domain interfaces, controlled
by the amino acid side chains within the interface. Proteins displaying shear motions
often have a layered architecture: stacking of secondary structures. The interdomain
linker has merely the role of keeping the domains in close proximity.
Domain definition from structural co-ordinates
The importance of domains as structural building blocks and elements of evolution
has brought about many automated methods for their identification and classification
in proteins of known structure. Automatic procedures for reliable domain
assignment are essential for the generation of the domain databases, especially as the
number of protein structures is increasing. Although the boundaries of a domain
can be determined by visual inspection, construction of an automated method is not
straightforward. Problems occur when faced with domains that are discontinuous
or highly associated. The fact that there is no
standard definition of what a domain really is has meant that domain assignments
have varied enormously, with each researcher using a unique set of criteria.
A structural domain is a compact, globular sub-structure with more interactions
within it than with the rest of the protein.
Therefore, a structural domain can be determined by two visual characteristics:
its compactness and its extent of isolation. Measures of local compactness in proteins have been used in many of the early methods
of domain assignment and in several of the more recent methods.
Considering proteins as small segments
One of the first algorithms used a Cα-Cα distance map together with a hierarchical clustering routine that considered proteins as several small segments, 10 residues in length. The initial segments were clustered one after another based
on inter-segment distances; segments with the shortest distances were clustered and
considered as single segments thereafter. The stepwise clustering finally included
the full protein. Go also exploited the fact that inter-domain distances are
normally larger than intra-domain distances; all possible Cα-Cα distances were
represented as diagonal plots in which there were distinct patterns for helices,
extended strands and combinations of secondary structures.
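The segment-clustering idea described above can be sketched in a few lines of Python. This is a minimal illustration rather than the original implementation: the function name `cluster_segments`, the use of the mean Cα-Cα distance between two clusters as the inter-segment distance, and the representation of the chain as a list of (x, y, z) tuples are all assumptions made for the example.

```python
import math
from itertools import combinations


def cluster_segments(coords, seg_len=10):
    """Agglomeratively merge fixed-length chain segments by C-alpha distance.

    coords  : list of (x, y, z) C-alpha coordinates, one per residue
    seg_len : length of the initial segments (10 residues in the text)

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a tuple of residue indices.
    """
    # Initial clusters: consecutive segments of seg_len residues.
    clusters = [tuple(range(i, min(i + seg_len, len(coords))))
                for i in range(0, len(coords), seg_len)]
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest mean inter-cluster distance
        # (the choice of the mean is an assumption of this sketch).
        for a, b in combinations(clusters, 2):
            total = sum(math.dist(coords[i], coords[j]) for i in a for j in b)
            d = total / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        history.append((a, b, d))
        # The merged pair is treated as a single segment from now on.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return history
```

In such a merge history, a late step that joins two large clusters at a much greater distance than the earlier steps is a candidate domain boundary, since inter-domain distances are normally larger than intra-domain ones.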
Sowdhamini and Blundell’s method
The method by Sowdhamini and Blundell clusters secondary structures in
a protein based on their Cα-Cα distances and identifies domains from the pattern in
their dendrograms. As the procedure does not consider the protein as a continuous
chain of amino acids there are no problems in treating discontinuous domains.
Specific nodes in these dendrograms are identified as tertiary structural clusters
of the protein; these include both super-secondary structures and domains.
DOMAK algorithm
The DOMAK algorithm is used to create the 3Dee domain database. It calculates a 'split value' from the number of each type of
contact when the protein is divided arbitrarily into two parts. This split value is
large when the two parts of the structure are distinct.
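A split value of this kind can be illustrated with a short Python sketch. The formula used here, (int_A / ext_AB) × (int_B / ext_AB), where int_A and int_B count contacts inside each part and ext_AB counts contacts between them, is an assumption about how the contact counts are combined. The names `split_value` and `best_cut`, the Cα-only contact definition, the distance cutoff, and the restriction to a single cut along the chain are likewise simplifications for illustration.

```python
import math


def split_value(coords, cut, cutoff=5.0):
    """Split value for dividing the chain into residues [0, cut) and [cut, n).

    A contact is any residue pair whose C-alpha atoms lie within `cutoff`
    (a hypothetical threshold chosen for this sketch).
    """
    int_a = int_b = ext_ab = 0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) > cutoff:
                continue
            if j < cut:        # both residues in part A
                int_a += 1
            elif i >= cut:     # both residues in part B
                int_b += 1
            else:              # one residue in each part
                ext_ab += 1
    if ext_ab == 0:
        return math.inf        # the two parts do not touch at all
    return (int_a / ext_ab) * (int_b / ext_ab)


def best_cut(coords, cutoff=5.0, min_size=2):
    """Return the single cut position with the largest split value."""
    cuts = range(min_size, len(coords) - min_size + 1)
    return max(cuts, key=lambda c: split_value(coords, c, cutoff))
```

The value grows when each part has many internal contacts and few contacts with the other part, which matches the statement that it is large when the two parts of the structure are distinct.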
Method of Wodak and Janin
The method of Wodak and Janin was based on the calculated interface
areas between two chain segments repeatedly cleaved at various residue positions.
Interface areas were calculated by comparing surface areas of the cleaved segments
with that of the native structure. Potential domain boundaries can be identified at
sites where the interface area is at a minimum.
Other methods have used measures of solvent accessibility to calculate compactness.
PUU algorithm
The PUU algorithm incorporates a harmonic model
used to approximate inter-domain dynamics. The underlying physical concept is
that many rigid interactions will occur within each domain and loose interactions
will occur between domains. This algorithm is used to define domains in the
FSSP domain database.
DETECTIVE
Swindells (1995) developed a method, DETECTIVE, for identification of
domains in protein structures based on the idea that domains have a hydrophobic
interior. Deficiencies were found to occur when hydrophobic cores from different
domains continue through the interface region.
RigidFinder
RigidFinder is a novel method for identifying rigid blocks in proteins (domains and loops) from two different conformations. Rigid blocks are defined as blocks in which all inter-residue distances are conserved across the conformations.
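The definition of a rigid block can be made concrete with a small Python sketch. This is not the published RigidFinder algorithm. It is a simple greedy grouping that assumes Cα coordinates for the same residues in both conformations, and the names `rigid_blocks`, `tol` and `min_size` are hypothetical.

```python
import math


def rigid_blocks(conf_a, conf_b, tol=0.5, min_size=3):
    """Greedily group residues whose pairwise distances are conserved.

    conf_a, conf_b : lists of (x, y, z) coordinates for the same residues
                     in two conformations
    tol            : largest allowed change in an inter-residue distance
    min_size       : smallest block that is reported
    """
    def conserved(i, j):
        d_a = math.dist(conf_a[i], conf_a[j])
        d_b = math.dist(conf_b[i], conf_b[j])
        return abs(d_a - d_b) <= tol

    blocks = []
    for i in range(len(conf_a)):
        for block in blocks:
            # A residue joins a block only if its distance to every member
            # of that block is conserved between the two conformations.
            if all(conserved(i, j) for j in block):
                block.append(i)
                break
        else:
            blocks.append([i])
    return [b for b in blocks if len(b) >= min_size]
```

Because the grouping is greedy, the result can depend on residue order; it is meant only to show what "all inter-residue distances conserved" means in practice.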
Example domains
- Armadillo repeats. Named after the β-catenin-like Armadillo protein of the fruit fly Drosophila.
- Basic leucine zipper domain (bZIP domain) is found in many DNA-binding eukaryotic proteins. One part of the domain contains a region that mediates sequence-specific DNA binding, and the leucine zipper is required for the dimerization of two DNA-binding regions. The DNA-binding region comprises a number of basic amino acids such as arginine and lysine.
- Cadherin repeats. Cadherins function as Ca2+-dependent cell-cell adhesion proteins. Cadherin domains are extracellular regions which mediate cell-to-cell homophilic binding between cadherins on the surface of adjacent cells.
- Death effector domain (DED) allows protein-protein binding by homotypic interactions (DED-DED). Caspase proteases trigger apoptosis via proteolytic cascades. Pro-caspase-8 and pro-caspase-10 bind to specific adaptor molecules via DED domains, and this leads to autoactivation of the caspases.
- EF hand, a helix-loop-helix structural motif found in each structural domain of the signaling protein calmodulin and in the muscle protein troponin-C.
- Immunoglobulin-like domains are found in proteins of the immunoglobulin superfamily (IgSF). They contain about 70-110 amino acids and are classified into different categories (IgV, IgC1, IgC2 and IgI) according to their size and function. They possess a characteristic fold in which two beta sheets form a “sandwich” that is stabilized by interactions between conserved cysteines and charged amino acids. They are important for protein-to-protein interactions in processes of cell adhesion, cell activation, and molecular recognition. These domains are commonly found in molecules with roles in the immune system.
- Phosphotyrosine-binding domain (PTB). PTB domains usually bind to phosphorylated tyrosine residues. They are often found in signal transduction proteins. PTB-domain binding specificity is determined by residues to the amino-terminal side of the phosphotyrosine. Examples: the PTB domains of both SHC and IRS-1 bind to an NPXpY sequence. PTB-containing proteins such as SHC and IRS-1 are important for insulin responses of human cells.
- Pleckstrin homology domain (PH). PH domains bind phosphoinositides with high affinity. Specificity for PtdIns(3)P, PtdIns(4)P, PtdIns(3,4)P2, PtdIns(4,5)P2, and PtdIns(3,4,5)P3 has been observed. Because phosphoinositides are sequestered to various cell membranes (due to their long lipophilic tails), PH domains usually recruit the protein in question to a membrane, where it can exert a function in cell signalling, cytoskeletal reorganization or membrane trafficking.
- Src homology 2 domain (SH2). SH2 domains are often found in signal transduction proteins. SH2 domains confer binding to phosphorylated tyrosine (pTyr). Named after the phosphotyrosine-binding domain of the src viral oncogene, which is itself a tyrosine kinase. See also: SH3 domain.
- Zinc finger DNA-binding domain (ZnF_GATA). ZnF_GATA domain-containing proteins are typically transcription factors that usually bind to the DNA sequence [AT]GATA[AG] of promoters.
The preceding text and figures originate from "Predicting Structural Domains in Proteins" George RA, 2002
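The consensus [AT]GATA[AG] given in the zinc finger entry above happens to be valid regular-expression syntax, so candidate binding sites in a DNA sequence can be located with a short Python sketch. The function name `find_gata_sites` is hypothetical, only the forward strand is scanned, and a sequence match is no evidence that a site is actually bound in a cell.

```python
import re

# A lookahead is used so that overlapping candidate sites are all reported.
GATA_SITE = re.compile(r"(?=([AT]GATA[AG]))")


def find_gata_sites(dna):
    """Return (position, matched 6-mer) for each [AT]GATA[AG] occurrence."""
    return [(m.start(), m.group(1)) for m in GATA_SITE.finditer(dna.upper())]


print(find_gata_sites("ccAGATAAggTGATAGtt"))
# prints [(2, 'AGATAA'), (10, 'TGATAG')]
```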
See also
- Amino acid
- Binding domain
- CATH
- Conserved domains
- Motif domain
- Eukaryotic Linear Motif
- Protein
- Protein structure
- Protein structure prediction
- Protein family
- Structural biology
- Structural Classification of Proteins (SCOP)
External links
- The Protein Families (Pfam) database clan browser provides easy access to information about protein structural domains. A clan contains two or more Pfam families that have arisen from a single evolutionary origin.
Key papers
- Bastian, H. C. (1872). The beginnings of life: being some account of the nature, modes of origin and transformation of lower organisms. Macmillan and Co., England.
- Branden, C.-I. and Tooze, J. (1991). Introduction to protein structure. Garland, New York.
- George, R. A. (2002). "Predicting Structural Domains in Proteins". Thesis, University College London.