Xrate
XRATE is a program for prototyping phylogenetic hidden Markov models and stochastic context-free grammars.
It is used to discover patterns of evolutionary conservation in sequence alignments.
The program can be used to estimate parameters for such models from "training" alignment data,
or to apply the parameterized model so as to annotate new alignments.
The program allows specification of a variety of models of DNA sequence evolution, which may be arbitrarily organized using formal grammars.
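
To make "a model of DNA sequence evolution" concrete, the sketch below evaluates the substitution probabilities of the Jukes-Cantor model, one of the simplest such models, using its standard closed form. It is an illustration only and is not XRATE code or XRATE's configuration syntax; the function name jukes_cantor_matrix and the parameter values are assumptions made for this example.

```python
import math

NUCLEOTIDES = "ACGT"


def jukes_cantor_matrix(rate, t):
    """Return substitution probabilities P[a][b] after time t (Jukes-Cantor).

    `rate` is the rate of each specific substitution a -> b (a != b),
    so the total rate of leaving any nucleotide is 3 * rate.
    """
    decay = math.exp(-4.0 * rate * t)
    same = 0.25 + 0.75 * decay   # probability the nucleotide is unchanged
    diff = 0.25 - 0.25 * decay   # probability of each specific change
    return {
        a: {b: (same if a == b else diff) for b in NUCLEOTIDES}
        for a in NUCLEOTIDES
    }


# Hypothetical rates: a slowly evolving and a rapidly evolving site class.
slow = jukes_cantor_matrix(rate=0.05, t=1.0)
fast = jukes_cantor_matrix(rate=0.5, t=1.0)
print("P(A stays A), slow:", round(slow["A"]["A"], 3))
print("P(A stays A), fast:", round(fast["A"]["A"], 3))
```

A higher rate makes it less likely that a nucleotide is still unchanged after the same amount of time, which is the kind of difference a model can use to tell conserved regions from fast-evolving ones.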

As an example of how XRATE is used, consider a protein-coding gene consisting of exons interspersed with introns.
The exons contain triplets of nucleotides (codons) that are translated by ribosomes according to the genetic code, and consequently are under selection pressure
(since any mutation may affect the translated amino acid sequence).
In contrast, the introns are under fewer selective constraints and tend to evolve faster.
These varying pressures show up clearly in multiple alignments.
The sequential layout of introns and exons can be described using grammar theory (from linguistics),
and each of their distinct evolutionary signatures modeled as a continuous-time Markov process.
XRATE allows the user to specify such models in a configuration file and estimate their parameters (evolutionary rates, length distributions of exons and introns, etc.)
directly from alignment data, using the expectation-maximization algorithm.
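
The exon/intron example can be sketched as a toy two-state hidden Markov model. The code below is not XRATE's algorithm or file format. It is a simplified illustration under several assumptions: only two aligned, gap-free sequences instead of a multiple alignment with a tree, a Jukes-Cantor substitution process in each state, and hand-picked values for RATES and STAY, which a tool such as XRATE would instead estimate from training data. Each column is labelled "E" (exon-like, slow) or "I" (intron-like, fast) with the Viterbi algorithm.

```python
import math

# Hypothetical parameters, fixed by hand for this sketch.
RATES = {"E": 0.05, "I": 0.5}   # substitution rate in each hidden state
STAY = 0.9                      # probability of remaining in the same state


def jc_prob(x, y, rate, t=1.0):
    """Jukes-Cantor probability that nucleotide x is replaced by y after time t."""
    decay = math.exp(-4.0 * rate * t)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay


def log_emission(state, x, y):
    """Log probability of the aligned pair (x, y), with uniform base frequencies."""
    return math.log(0.25 * jc_prob(x, y, RATES[state]))


def log_transition(prev, cur):
    return math.log(STAY if prev == cur else 1.0 - STAY)


def viterbi(seq_a, seq_b):
    """Return the most probable state label for every alignment column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(seq_a, seq_b))
    if not columns:
        return ""
    states = list(RATES)
    # Start in either state with equal probability.
    score = {s: math.log(1.0 / len(states)) + log_emission(s, *columns[0])
             for s in states}
    backpointers = []
    for x, y in columns[1:]:
        new_score, pointer = {}, {}
        for cur in states:
            best_prev = max(states,
                            key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (score[best_prev]
                              + log_transition(best_prev, cur)
                              + log_emission(cur, x, y))
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace the best path back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return "".join(reversed(path))


conserved = "ACGTACGTACGT"
print(viterbi(conserved + "ACGTACGTACGT", conserved + "CATGCATGCATG"))
```

In this example the first twelve columns are identical and the last twelve all differ, so the conserved half is labelled "E" and the diverged half "I". The two states with their transition probabilities play the role of the grammar, and the per-state substitution rates play the role of the continuous-time Markov processes.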

XRATE can be downloaded as part of the DART software package. It accepts input files in Stockholm format.
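
For readers unfamiliar with Stockholm format, the following is a minimal reading sketch. It assumes a single alignment per file and keeps only the sequence lines and the per-file "#=GF" annotations; other markup lines are skipped. The function parse_stockholm and the example alignment are hypothetical and are not part of XRATE or DART.

```python
def parse_stockholm(text):
    """Parse one Stockholm alignment into (sequences, file_annotations)."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM 1.0' header")
    sequences = {}     # sequence name -> aligned row
    annotations = {}   # #=GF tag -> list of values
    for line in lines[1:]:
        line = line.strip()
        if not line:
            continue
        if line == "//":               # end-of-alignment marker
            break
        if line.startswith("#=GF"):
            parts = line.split(None, 2)
            if len(parts) == 3:
                annotations.setdefault(parts[1], []).append(parts[2])
            continue
        if line.startswith("#"):       # #=GS, #=GC, #=GR and comments: skipped here
            continue
        fields = line.split(None, 1)
        if len(fields) != 2:
            raise ValueError("malformed sequence line: " + line)
        name, chunk = fields
        # Interleaved files repeat names; append each block to the same row.
        sequences[name] = sequences.get(name, "") + chunk.replace(" ", "")
    return sequences, annotations


# Made-up example data; the "#=GF NH" line carries a tree in Newick notation.
EXAMPLE = """# STOCKHOLM 1.0
#=GF NH (human:0.1,mouse:0.3);
human ACGTACGT
mouse ACGAACCT
//
"""

seqs, notes = parse_stockholm(EXAMPLE)
print(seqs)
print(notes)
```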

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 