Phylogenetic profiling
Phylogenetic profiling is a bioinformatics technique in which the joint presence or joint absence of two traits across a set of species is used to infer a meaningful biological connection, such as the involvement of two different proteins in the same biological pathway. Along with examination of conserved synteny, conserved operon structure, or "Rosetta Stone" domain fusions, comparing phylogenetic profiles is designated a "post-homology" technique, in that the computation essential to the method begins after it has been determined which proteins are homologous to which. A number of these techniques were developed by David Eisenberg and colleagues; phylogenetic profile comparison was introduced in 1999 by Pellegrini et al.

Method

Over 2000 organisms from the Bacteria, Archaea, and Eukaryotes are now represented by complete DNA genome sequences. Typically, each gene in a genome encodes a protein that can be assigned to a particular protein family on the basis of homology. For a given protein family, its presence or absence in each genome is represented (in the original formulation) by 1 (present) or 0 (absent). Consequently, the phylogenetic distribution of the protein family can be represented by a long binary number with one digit for each genome; such binary representations are readily compared with each other to reveal correlated phylogenetic distributions. The large number of complete genomes makes these profiles rich in information. The advantage of using only complete genomes is that the 0 values, representing the absence of a trait, tend to be reliable. Presence or absence of a protein is usually determined by sequence similarity.
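The binary representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the genome labels, the family names, and the helper functions `build_profile` and `hamming` are all hypothetical, and the presence/absence sets are made up rather than derived from real homology searches.

```python
from typing import Dict, List, Set

def build_profile(genomes_with_family: Set[str], genomes: List[str]) -> List[int]:
    """Return one digit per genome: 1 if the family is present, 0 if absent."""
    return [1 if g in genomes_with_family else 0 for g in genomes]

def hamming(a: List[int], b: List[int]) -> int:
    """Count the genomes in which two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical genome labels; the order fixes the digit positions.
genomes = ["G1", "G2", "G3", "G4", "G5", "G6"]

# Hypothetical presence calls (in practice these come from sequence similarity).
presence: Dict[str, Set[str]] = {
    "famA": {"G1", "G3", "G4"},
    "famB": {"G1", "G3", "G4"},
    "famC": {"G2", "G5"},
}

profiles = {name: build_profile(found, genomes) for name, found in presence.items()}

for name, profile in profiles.items():
    print(name, "".join(str(bit) for bit in profile))

print("famA vs famB:", hamming(profiles["famA"], profiles["famB"]))
print("famA vs famC:", hamming(profiles["famA"], profiles["famC"]))
```

Here famA and famB have identical profiles (distance 0), which is the pattern the method treats as a hint of a functional link, while famA and famC disagree in five of the six genomes.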

Theory

Closely related species should be expected to have very similar sets of genes. However, changes accumulate between more distantly related species by processes that include horizontal gene transfer and gene loss. Individual proteins have specific molecular functions, such as carrying out a single enzymatic reaction or serving as one subunit of a larger protein complex. A biological process such as photosynthesis, methanogenesis, or histidine biosynthesis may require the concerted action of many proteins. If some protein critical to such a process were lost, other proteins dedicated to that process would become useless; natural selection makes it unlikely that they will be retained over evolutionary time. Therefore, if two different protein families tend always to be either both present or both absent, a likely hypothesis is that the two proteins cooperate in some biological process.
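One way to make "tend always to be either both present or both absent" quantitative is to ask how likely the observed overlap would be if the two families were distributed independently. The sketch below uses a hypergeometric tail probability for this; that choice is an assumption made for illustration, not necessarily the scoring used in the original 1999 formulation, and the function name `cooccurrence_pvalue` and the example profiles are hypothetical.

```python
from math import comb
from typing import List

def cooccurrence_pvalue(a: List[int], b: List[int]) -> float:
    """Probability of at least the observed number of shared genomes,
    assuming profile b's 1s were placed at random among the genomes
    (hypergeometric upper tail)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    n = len(a)                      # total genomes
    n_a, n_b = sum(a), sum(b)       # genomes containing each family
    k = sum(1 for x, y in zip(a, b) if x and y)  # genomes containing both
    total = comb(n, n_b)
    tail = sum(
        comb(n_a, i) * comb(n - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total

# Hypothetical profiles over eight genomes.
fam_x = [1, 0, 1, 1, 0, 0, 1, 0]
fam_y = [1, 0, 1, 1, 0, 0, 1, 0]   # identical distribution to fam_x
fam_z = [0, 1, 0, 0, 1, 1, 0, 1]   # never co-occurs with fam_x

print(cooccurrence_pvalue(fam_x, fam_y))
print(cooccurrence_pvalue(fam_x, fam_z))
```

With only eight genomes, even a perfect match gives a modest probability (1 in 70), which illustrates why a large number of complete genomes makes the profiles more informative.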

Advances and Challenges

Phylogenetic profiling has led to numerous discoveries in biology, including previously unknown enzymes in metabolic pathways, transcription factors that bind to conserved regulatory sites, and explanations for the roles of certain mutations in human disease. Improving the method is an active area of research because it faces several limitations. First, co-occurrence of two protein families often reflects recent common ancestry of two species rather than a conserved functional relationship; disambiguating these two sources of correlation may require improved statistical methods. Second, proteins grouped as homologs may differ in function, or proteins conserved in function may fail to register as homologs; improved methods for tailoring the size of each protein family to reflect functional conservation should lead to improved results.
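The first limitation can be illustrated with a simple heuristic: if several closely related genomes each contribute a digit, they effectively count the same evolutionary event more than once. The sketch below collapses genomes into clades before profiles are compared. It is only one crude way to reduce this redundancy, offered as an assumption rather than as the improved statistical methods the text refers to; the function `collapse_by_clade`, the genome labels, and the clade assignments are hypothetical.

```python
from typing import Dict

def collapse_by_clade(profile: Dict[str, int], clade_of: Dict[str, str]) -> Dict[str, int]:
    """Merge genomes from the same clade into one digit:
    1 if any member genome has the family, otherwise 0."""
    collapsed: Dict[str, int] = {}
    for genome, present in profile.items():
        clade = clade_of[genome]
        collapsed[clade] = max(collapsed.get(clade, 0), present)
    return collapsed

# Hypothetical clade assignments: two strains share one clade.
clade_of = {
    "strain_1a": "clade_1",
    "strain_1b": "clade_1",
    "strain_2": "clade_2",
    "strain_3": "clade_3",
}

# Hypothetical per-genome profile for one protein family.
profile = {"strain_1a": 1, "strain_1b": 1, "strain_2": 0, "strain_3": 0}

print(collapse_by_clade(profile, clade_of))
```

After collapsing, the two related strains contribute a single digit, so agreement between two families in that clade is counted once instead of twice.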
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 