Enzyme Function Initiative
The Enzyme Function Initiative (EFI) is a large-scale collaborative project aiming to develop and disseminate a robust strategy to determine enzyme function through an integrated sequence-structure based approach. The project was funded in May 2010 by the National Institute of General Medical Sciences as a Glue Grant, which supports research on complex biological problems that cannot be solved by a single research group. The EFI was largely spurred by the need to develop methods to identify the functions of the enormous number of proteins discovered through genomic sequencing projects.

Motivation

The dramatic increase in genomic sequencing projects has caused the number of protein sequences deposited into public databases to grow exponentially. To cope with the influx of sequences, databases use computational predictions to auto-annotate the functions of individual proteins. While these computational methods are extremely high-throughput and generally provide accurate broad classifications, their exclusive use has led to a significant level of misannotation of enzyme function in protein databases. Thus, although the information now available represents an unprecedented opportunity to understand cellular metabolism across a wide variety of organisms, including the ability to identify molecules and/or reactions that may benefit human quality of life, this potential has not been fully realized. The biological community's ability to characterize newly discovered proteins has been outstripped by the rate of genome sequencing, and the task of assigning function is now considered the rate-limiting step in understanding biological systems in detail.

Integrated Strategy for Functional Assignment

The EFI is developing an integrated sequence-structure based strategy for functional assignment by predicting the substrate specificities of unknown members of mechanistically diverse enzyme superfamilies. The approach leverages conserved features within a given superfamily, such as known chemistry, the identity of active site functional groups, and the composition of specificity-determining residues, motifs, or structures, to predict function, but relies on multidisciplinary expertise to streamline, refine, and test the predictions. The integrated sequence-structure strategy under development will be generally applicable to deciphering the ligand specificities of any functionally unknown protein.

Organization

By NIGMS program mandate, Glue Grant consortia must contain Core Resources and Bridging Projects. The EFI consists of six Scientific Cores which provide bioinformatic, structural, computational, and data management expertise to facilitate functional predictions for enzymes of unknown function targeted by the EFI. These predictions are then tested by five Bridging Projects representing the amidohydrolase, enolase, GST, HAD, and isoprenoid synthase enzyme superfamilies.

Scientific Cores

The Superfamily/Genome Core contributes bioinformatic analysis by collecting and curating complete sequence data sets, generating sequence similarity networks, and classifying superfamily members into subgroups and families for subsequent annotation transfer and evaluation as targets for functional characterization.
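
As a rough illustration of what a sequence similarity network is, the sketch below links sequences whose pairwise similarity score meets a threshold and then groups them into connected clusters that could serve as candidate subgroups. It is not the EFI's pipeline: the sequence names, the scores (imagined here as something like the negative log of a BLAST E-value), and the threshold are all hypothetical values chosen for the example.

```python
from collections import defaultdict


def build_similarity_network(pairwise_scores, threshold):
    """Return an adjacency map linking sequences whose score meets the threshold."""
    network = defaultdict(set)
    for (a, b), score in pairwise_scores.items():
        # Touch both keys so every sequence appears as a node, even if isolated.
        network[a]
        network[b]
        if score >= threshold:
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def connected_clusters(network):
    """Group nodes into connected components (candidate subgroups)."""
    seen = set()
    clusters = []
    for start in sorted(network):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(network[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical pairwise similarity scores; higher means more similar.
scores = {
    ("seqA", "seqB"): 80.0,
    ("seqB", "seqC"): 65.0,
    ("seqA", "seqC"): 20.0,
    ("seqC", "seqD"): 10.0,
    ("seqD", "seqE"): 90.0,
}
network = build_similarity_network(scores, threshold=50.0)
clusters = connected_clusters(network)
print(clusters)
```

Raising the threshold in this sketch splits clusters into smaller groups, and lowering it merges them, which is why the choice of cutoff matters when clusters are used as a basis for annotation transfer.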

The Protein Core develops cloning, expression, and protein purification strategies for the enzymes targeted for study.

The Structure Core fulfills the structural biology component for the EFI by providing high-resolution structures of targeted enzymes.

The Computation Core performs in silico docking to generate rank-ordered lists of predicted substrates for targeted enzymes, using experimentally determined and/or homology-modeled protein structures.
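
The idea of a rank-ordered list of predicted substrates can be sketched as a simple sort over docking scores. This is only an illustration under stated assumptions: the ligand names and scores are hypothetical, the function `rank_candidates` is invented for this example, and the sketch assumes the common convention that a lower (more negative) score indicates a more favorable predicted interaction. It says nothing about how the scores themselves are computed.

```python
def rank_candidates(docking_scores, top_n=None):
    """Sort candidate substrates from most to least favorable score.

    Assumes lower scores are better; ties are broken by ligand name so the
    ordering is deterministic.
    """
    ranked = sorted(docking_scores.items(), key=lambda item: (item[1], item[0]))
    return ranked[:top_n] if top_n is not None else ranked


# Hypothetical docking scores for four candidate ligands.
hypothetical_scores = {
    "ligand_1": -7.2,
    "ligand_2": -9.5,
    "ligand_3": -4.1,
    "ligand_4": -9.5,
}
ranked = rank_candidates(hypothetical_scores, top_n=3)
for name, score in ranked:
    print(f"{name}\t{score}")
```

The candidates at the top of such a list would be the first to test experimentally.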

The Microbiology Core examines in vivo functions using genetic techniques and metabolomics to complement in vitro functions determined by the Bridging Projects.

The Data and Dissemination Core maintains two complementary public databases, one for bioinformatic data (Structure-Function Linkage Database) and one for experimental data (EFI-DB).

Bridging Projects

The amidohydrolase superfamily contains evolutionarily related enzymes with a distorted (β/α)8 barrel fold which primarily catalyze metal-assisted deamination, decarboxylation, isomerization, hydration, or retroaldol cleavage reactions.

The enolase superfamily contains evolutionarily related enzymes with a (β/α)7β‑barrel (TIM‑barrel) fold which primarily catalyze metal-assisted epimerization/racemization or β-elimination of carboxylate substrates.

The GST superfamily contains evolutionarily related enzymes with a modified thioredoxin fold and an additional all α-helical domain which primarily catalyze nucleophilic attack of reduced glutathione (GSH) on electrophilic substrates.

The HAD superfamily contains evolutionarily related enzymes with a Rossmannoid α/β fold with an inserted “cap” region which primarily catalyze metal-assisted nucleophilic catalysis, most frequently resulting in phosphoryl group transfer.

The isoprenoid synthase (I) superfamily contains evolutionarily related enzymes with a mostly all α-helical fold which primarily catalyze trans-prenyl transfer reactions to form elongated or cyclized isoprene products.

Participating Investigators

Fourteen investigators with expertise in various disciplines make up the EFI.
Name | Institution | Role
Gerlt, John A. | University of Illinois, Urbana-Champaign | EFI Director, Director of the Enolase Bridging Project
Allen, Karen N. | Boston University | Co-Director of the HAD Bridging Project
Almo, Steven C. | Albert Einstein College of Medicine | Director of the Protein Core, Director of the Structure Core
Armstrong, Richard N. | Vanderbilt University School of Medicine | Director of the GST Bridging Project
Babbitt, Patricia C. | University of California, San Francisco | Director of the Superfamily/Genome Core, Co-Director of the Data and Dissemination Core
Cronan, John E. | University of Illinois, Urbana-Champaign | Co-Director of the Microbiology Core
Dunaway‑Mariano, Debra | University of New Mexico | Co-Director of the HAD Bridging Project
Jacobson, Matthew P. | University of California, San Francisco | Co-Director of the Computation Core
Minor, Wladek | University of Virginia | Co-Director of the Data and Dissemination Core
Poulter, C. Dale | University of Utah | Director of the Isoprenoid Synthase Bridging Project
Raushel, Frank M. | Texas A&M University | Director of the Amidohydrolase Bridging Project
Sali, Andrej | University of California, San Francisco | Co-Director of the Computation Core
Shoichet, Brian K. | University of California, San Francisco | Co-Director of the Computation Core
Sweedler, Jonathan V. | University of Illinois, Urbana-Champaign | Co-Director of the Microbiology Core

Deliverables

The EFI's primary deliverable is the development and dissemination of an integrated sequence/structure strategy for functional assignment. As the strategy is developed, data and clones generated by the EFI are made freely available via several online resources.

Funding

The EFI was established in May 2010 with $33.9 million in funding over a 5-year period (grant number GM093342). Pending project success and assessment of the Glue Grant funding mechanism, the grant may be renewed for an additional 5 years in 2014.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 