SIMAP
Similarity Matrix of Proteins, or SIMAP, is a database of protein similarities created using distributed computing, which is freely accessible for scientific purposes. SIMAP uses the FASTA algorithm to precalculate protein similarity, while another application uses hidden Markov models to search for protein domains.
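
The idea of precalculating similarities can be illustrated with a minimal Python sketch. It is not SIMAP's pipeline and does not implement FASTA: the score used here is a simple overlap of shared two-letter words, chosen only as a stand-in, and the function names and example sequences are hypothetical. The point it shows is that every pair is scored once in advance, so that later queries become lookups.

```python
from itertools import combinations


def kmers(sequence: str, k: int = 2) -> set[str]:
    """Return the set of overlapping k-letter words in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def toy_similarity(a: str, b: str, k: int = 2) -> float:
    """Jaccard overlap of k-mer sets. A stand-in score, NOT a FASTA score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def precompute_matrix(proteins: dict[str, str], k: int = 2) -> dict[tuple[str, str], float]:
    """Score every unordered pair once and keep the result for later lookup."""
    matrix = {}
    for id_a, id_b in combinations(sorted(proteins), 2):
        matrix[(id_a, id_b)] = toy_similarity(proteins[id_a], proteins[id_b], k)
    return matrix


def lookup(matrix: dict[tuple[str, str], float], id_a: str, id_b: str) -> float:
    """Read a precomputed score; pairs are stored with the smaller ID first."""
    if id_a == id_b:
        return 1.0
    key = (id_a, id_b) if id_a < id_b else (id_b, id_a)
    return matrix[key]


# Hypothetical example sequences, invented for illustration only.
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "MKTAYIAKQL",
    "P3": "GGSGGSGGSG",
}
matrix = precompute_matrix(proteins)
print(lookup(matrix, "P2", "P1"))
```

The number of pairs grows quadratically with the number of sequences, which is one reason an all-against-all calculation of this kind lends itself to being split into independent pieces of work.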

SIMAP is a joint project of the Technical University of Munich, the Helmholtz Zentrum München, and the University of Vienna.

The project usually gets new work units at the beginning of each month. More recently (as of 2010), the inclusion of environmental sequences in the database has required longer periods of activity, for example several months of continuous work. Typically, these updates occur twice a year.

In the fourth quarter of 2010, the project relocated to the University of Vienna due to the failing electrical infrastructure at the Technical University of Munich. As part of the move, a project-specific URL was created, which required existing volunteers and users to detach from the project and reattach to it.

Computing platform

SIMAP uses the Berkeley Open Infrastructure for Network Computing (BOINC) distributed computing platform.

Application performance notes:
  • Work unit CPU times can vary widely, ranging between 15 minutes and 3 hours.
  • Work units are around 600 kB to 1.35 MB each, averaging around 1.20 MB.
  • SIMAP provides client software optimized for SSE-enabled processors and x86-64 processors. For older processors, non-SSE applications are provided but require manual installation steps. Operating systems supported by SIMAP are Linux, Windows, Mac OS and other UNIX platforms.
  • Because the project has at times finished precalculating all publicly known protein sequences and metagenomes, the work available consists of newly published protein sequences and metagenomes that need to be precomputed for SIMAP.

See also

  • Grid computing
  • Protein
  • Rosetta@home
  • Predictor@home
  • Folding@home
  • BOINC

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 