Bioinformatic Harvester
The Bioinformatic Harvester is a bioinformatic meta search engine at the Karlsruhe Institute of Technology (KIT) for gene and protein-associated information. Harvester currently works for human, mouse, rat, zebrafish, Drosophila and Arabidopsis thaliana based information. Harvester cross-links more than 50 popular bioinformatic resources and allows cross searches. Harvester serves tens of thousands of pages every day to scientists and physicians.

How Harvester works

Harvester collects information from protein and gene databases along with information from so-called "prediction servers". Prediction servers provide, for example, online sequence analysis for a single protein. Harvester's search index is based on the International Protein Index (IPI) and UniProt protein information collections. The collection consists of:
  • ~72,000 human, ~57,000 mouse, ~41,000 rat, ~51,000 zebrafish and ~35,000 Arabidopsis protein pages, which cross-link ~50 major bioinformatic resources.



Text based information

...from the following databases:
  • UniProt, the world's largest protein database
  • SOURCE, a convenient gene information overview
  • Simple Modular Architecture Research Tool (SMART)
  • SOSUI, predicts transmembrane domains
  • PSORT, predicts protein localisation
  • HomoloGene, compares proteins from different species
  • gfp-cdna, protein localisation with fluorescence microscopy
  • International Protein Index (IPI)

Databases rich in graphical elements

...are not collected, but crosslinked via iframes. Iframes are windows within an HTML page that embed another web page. The iframe windows allow up-to-date viewing of the "iframed", linked databases. Several such iframes are combined on a Harvester protein page. This method allows convenient comparison of information from several databases.
  • NCBI-BLAST, an algorithm for comparing biological sequences from the NCBI.
  • Ensembl, automatic gene annotation by the EMBL-EBI and Sanger Institute.
  • FlyBase, a database of the model organism Drosophila melanogaster.
  • GoPubMed, a knowledge-based search engine for biomedical texts.
  • iHOP, information hyperlinked over proteins via gene/protein synonyms.
  • Mendelian Inheritance in Man, a project that catalogues all the known diseases with a genetic component.
  • RZPD, German Resource Center for Genome Research in Berlin/Heidelberg.
  • STRING, Search Tool for the Retrieval of Interacting Genes/Proteins, developed by EMBL, SIB and UZH.
  • Zebrafish Information Network.
  • LOCATE subcellular localization database (mouse).
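
The iframe technique described above can be illustrated with a minimal sketch. It is not Harvester's actual implementation: the resource names, the URL templates and the function `build_protein_page` are hypothetical, chosen only to show how one page can embed live views of several external databases for the same protein accession.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL templates (assumption); real resources use their own URL schemes.
RESOURCE_TEMPLATES = {
    "resource-a": "https://resource-a.example/entry?id={acc}",
    "resource-b": "https://resource-b.example/protein/{acc}",
}


def build_protein_page(accession, templates=RESOURCE_TEMPLATES):
    """Return an HTML page that embeds one iframe per linked resource."""
    frames = []
    for name, template in templates.items():
        # Percent-encode the accession before placing it in the URL.
        url = template.format(acc=quote(accession, safe=""))
        frames.append(
            f"<h2>{escape(name)}</h2>\n"
            f'<iframe src="{escape(url, quote=True)}" width="100%" height="400"></iframe>'
        )
    body = "\n".join(frames)
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><title>{escape(accession)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>"
    )


page = build_protein_page("Q9NPD3")
```

Because each iframe loads its content from the linked database when the page is viewed, the embedded view stays current without the host page storing a copy of the data.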

"linkouts"

  • Genome browser, working draft assemblies for genomes at UCSC
  • Google Scholar
  • Mitocheck
  • PolyMeta, meta search engine for Google, Yahoo, MSN, Ask, Exalead, AllTheWeb, GigaBlast

What one can find

Harvester allows a combination of different search terms and single words.

Search Examples:
  • Gene-name: "golga3"
  • Gene-alias: "ADAP-S ADAS ADHAPS ADPS" (one gene name is sufficient)
  • Gene-Ontologies: "Enzyme linked receptor protein signaling pathway"
  • UniGene cluster: "Hs.449360"

  • Go-annotation: "intra-Golgi transport"
  • Molecular function: "protein kinase binding"
  • Protein: "Q9NPD3"
  • Protein domain: "SH2 sar"
  • Protein Localisation: "endoplasmic reticulum"

  • Chromosome: "2q31"
  • Disease relevant: use the word "diseaselink"
  • Combinations: "golgi diseaselink" (finds all golgi proteins associated with a disease)
  • mRNA: "AL136897"

  • Word: "Cancer"
  • Comment: "highly expressed in heart"
  • Author: "Merkel, Schmidt"
  • Publication or project: "cDNA sequencing project"
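
How a combined query such as "golgi diseaselink" narrows the results can be sketched with a small inverted index. This is an illustration under stated assumptions, not Harvester's search code: the records and their identifiers are made up, and the rule that every term must match (a logical AND) is an assumption consistent with the "golgi diseaselink" example above.

```python
# Toy records (assumption): identifiers and annotation text are illustrative only.
RECORDS = {
    "ACC1": "golgi membrane protein diseaselink",
    "ACC2": "golgi transport protein",
    "ACC3": "endoplasmic reticulum kinase diseaselink",
}


def build_index(records):
    """Map each lower-cased word to the set of record ids containing it."""
    index = {}
    for acc, text in records.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(acc)
    return index


def search(index, query):
    """Return the ids of records that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


INDEX = build_index(RECORDS)
result = sorted(search(INDEX, "golgi diseaselink"))
print(result)
```

In this toy data set only the record tagged with both "golgi" and "diseaselink" is returned, while a search for "golgi" alone returns both Golgi records.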

See also

  • Biological databases
  • Entrez
  • European Bioinformatics Institute
  • Human Protein Reference Database
  • Metadata
  • Sequence profiling tool

External links

  • http://harvester.kit.edu Bioinformatic Harvester V at KIT Karlsruhe Institute of Technology
  • Harvester42 at KIT - integrating 50 general search engines
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 