Velvet (algorithm)
Velvet is a set of algorithms manipulating de Bruijn graphs for genomic and de novo transcriptomic sequence assembly. It was designed for short read sequencing technologies, such as Solexa or 454 Sequencing, and was developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute. The tool takes in short read sequences, removes errors, then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs. It has also been implemented inside the commercial package Geneious Server.

The de Bruijn graph

For each k-mer observed in the set of reads (and for its reverse complement), a hash table records the ID of the first read encountered containing that k-mer and the position of its occurrence within that read.
A second database is created with the opposite information: for each short read, it records which of its original k-mers are overlapped by subsequent reads.
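
The k-mer hash table described above can be illustrated with a minimal Python sketch. It is not Velvet's actual implementation: the function names, the 1-based read IDs, and the use of a negative read ID to mark a k-mer seen only as a reverse complement are all assumptions made for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def build_kmer_index(reads, k):
    """Map each k-mer to (read_id, position) of its first occurrence.

    The reverse complement is registered together with each k-mer. Marking
    it with a negative read ID is an illustrative convention only; the
    stored position is that of the forward k-mer within the read.
    """
    index = {}
    for read_id, read in enumerate(reads, start=1):
        for pos in range(len(read) - k + 1):
            kmer = read[pos:pos + k]
            if kmer in index:
                continue  # keep only the first read that contained it
            index[kmer] = (read_id, pos)
            # setdefault keeps a palindromic k-mer on the forward strand.
            index.setdefault(reverse_complement(kmer), (-read_id, pos))
    return index


reads = ["GATTACA", "TTACAGG"]
kmer_index = build_kmer_index(reads, k=4)
print(kmer_index["TTAC"])  # first seen in read 1
print(kmer_index["ACAG"])  # first seen in read 2
```

Here the k-mer TTAC occurs in both reads, but only its first occurrence (read 1, position 2) is kept, which is what lets later reads be recognised as overlapping earlier ones.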

Simplification

Whenever a node A has only one outgoing arc that points to another node B that has only one ingoing arc, the two nodes are merged.
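
The merging rule can be sketched as follows. This is an illustration under simplifying assumptions, not Velvet's code: the graph is a plain dictionary of successor sets, node labels are lists of names rather than sequences, and in-degrees are recomputed after every merge for clarity rather than speed.

```python
def merge_linear_chains(successors, labels):
    """Repeatedly merge A -> B when A has one outgoing and B one incoming arc.

    successors: dict mapping node -> set of successor nodes (modified in place)
    labels:     dict mapping node -> list of merged node names (modified in place)
    """
    merged = True
    while merged:
        merged = False
        # Recompute in-degrees after every merge; simple but not efficient.
        indegree = {node: 0 for node in successors}
        for targets in successors.values():
            for target in targets:
                indegree[target] += 1
        for a in list(successors):
            if len(successors[a]) != 1:
                continue
            (b,) = successors[a]
            if b == a or indegree[b] != 1:
                continue
            # Absorb B into A: A inherits B's outgoing arcs and label.
            successors[a] = successors.pop(b)
            labels[a] = labels[a] + labels.pop(b)
            merged = True
            break
    return successors, labels


chain_graph = {"A": {"B"}, "B": {"C"}, "C": {"D", "E"}, "D": set(), "E": set()}
chain_labels = {node: [node] for node in chain_graph}
merge_linear_chains(chain_graph, chain_labels)
print(chain_labels["A"])
```

In this example the chain A, B, C collapses into a single node, while D and E stay separate because C branches into both of them.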

Error removal

Errors can be due either to the sequencing process or to polymorphisms.
  • Removing the “tips”: a tip is a chain of nodes that is disconnected on one end.
  • Removing bubbles with the Tour Bus algorithm
  • Removing erroneous connections
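
The first of these steps, tip removal, can be sketched as below. This is a simplified illustration rather than Velvet's procedure: it treats a tip as a short dead-end chain attached to a branching node, the `max_tip_nodes` cutoff is a hypothetical parameter, and any coverage-based criterion is omitted. Bubble removal and erroneous-connection removal are not shown.

```python
def remove_tips(successors, max_tip_nodes):
    """Remove short dead-end chains hanging off a branching node.

    successors:    dict mapping node -> set of successor nodes (modified in place)
    max_tip_nodes: hypothetical cutoff; longer dead ends are kept
    """
    predecessors = {node: set() for node in successors}
    for node, targets in successors.items():
        for target in targets:
            predecessors[target].add(node)

    dead_ends = [node for node in successors if not successors[node]]
    for end in dead_ends:
        chain = [end]
        node = end
        anchor = None
        # Walk backwards while the path is unbranched.
        while len(predecessors[node]) == 1:
            (prev,) = predecessors[node]
            if len(successors[prev]) > 1:
                anchor = prev  # branching node where the chain attaches
                break
            chain.append(prev)
            node = prev
        if anchor is None or len(chain) > max_tip_nodes:
            continue  # isolated chain, or too long to be treated as a tip
        successors[anchor].discard(chain[-1])
        for removed in chain:
            del successors[removed]
            del predecessors[removed]
    return successors


tip_graph = {
    "A": {"B"}, "B": {"C", "T1"}, "C": {"D"}, "D": {"E"}, "E": set(),
    "T1": {"T2"}, "T2": set(),
}
remove_tips(tip_graph, max_tip_nodes=2)
print(sorted(tip_graph))
```

The two-node chain T1, T2 is clipped because it is within the cutoff, while the three-node path C, D, E is kept. Without a coverage criterion, two short competing dead ends could not be told apart, which is why this sketch should not be read as a complete error-removal step.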
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 