Human Genome Project

The Human Genome Project (HGP) was an international scientific research project whose primary goals were to determine the sequence of the chemical base pairs that make up DNA and to identify and map the approximately 20,000-25,000 genes of the human genome from both a physical and a functional standpoint.

The project began in 1990, initially headed by James D. Watson at the U.S. National Institutes of Health. A working draft of the genome was released in 2000 and a complete one in 2003, with further analysis still being published. A parallel project was conducted outside of government by the Celera Corporation. Most of the government-sponsored sequencing was performed in universities and research centers in the United States, Canada, New Zealand and Britain. The mapping of human genes is an important step in the development of medicines and other aspects of health care.

While the objective of the Human Genome Project was to understand the genetic makeup of the human species, the project also focused on several nonhuman organisms, such as E. coli, the fruit fly, and the laboratory mouse. It remains one of the largest single investigational projects in modern science.

The HGP originally aimed to map the nucleotides contained in a haploid reference human genome (more than three billion of them). Several groups have announced efforts to extend this to diploid human genomes, including the International HapMap Project, Applied Biosystems, Perlegen, Illumina, the J. Craig Venter Institute (JCVI), the Personal Genome Project, and Roche-454.

The "genome" of any given individual (except for identical twins and cloned organisms) is unique; mapping "the human genome" involves sequencing multiple variations of each gene. The project did not study the entire DNA found in human cells; some heterochromatic areas (about 8% of the total) remain unsequenced.

Project


Background

Initiation of the project was the culmination of several years of work supported by the United States Department of Energy, in particular workshops in 1984 and 1986 and a subsequent report of the US Department of Energy. This 1987 report stated boldly, "The ultimate goal of this initiative is to understand the human genome" and "knowledge of the human genome is as necessary to the continuing progress of medicine and other health sciences as knowledge of human anatomy has been for the present state of medicine." Candidate technologies were already being considered for the proposed undertaking at least as early as 1985.

James D. Watson was head of the National Center for Human Genome Research at the National Institutes of Health (NIH) in the United States starting in 1988. Largely because of his disagreement with his superior, Bernadine Healy, over the issue of patenting genes, Watson was forced to resign in 1992. He was replaced by Francis Collins in April 1993, and the name of the Center was changed to the National Human Genome Research Institute (NHGRI) in 1997.

The $3-billion project was formally founded in 1990 by the United States Department of Energy and the U.S. National Institutes of Health, and was expected to take 15 years. In addition to the United States, the international consortium comprised geneticists in China, France, Germany, Japan, and the United Kingdom.

Thanks to widespread international cooperation and advances in the field of genomics (especially in sequence analysis), as well as major advances in computing technology, a 'rough draft' of the genome was finished in 2000 (announced jointly by then U.S. President Bill Clinton and British Prime Minister Tony Blair on June 26, 2000). Ongoing sequencing led to the announcement of the essentially complete genome in April 2003, two years earlier than planned. In May 2006, another milestone on the way to completing the project was passed when the sequence of the last chromosome, chromosome 1, was published in the journal Nature.

State of completion


There are multiple definitions of the "complete sequence of the human genome". According to some of these definitions, the genome has already been completely sequenced; according to others, it has yet to be. There have been multiple popular press articles reporting that the genome was "complete." The genome has been completely sequenced under the definition employed by the International Human Genome Project. A [...] of the Human Genome Project shows that most of the human genome was complete by the end of 2003. However, a number of regions of the human genome can be considered unfinished:

  • First, the central regions of each chromosome, known as centromeres, are highly repetitive DNA sequences that are difficult to sequence using current technology. The centromeres are millions (possibly tens of millions) of base pairs long, and for the most part they are entirely unsequenced.
  • Second, the ends of the chromosomes, called telomeres, are also highly repetitive, and for most of the 46 chromosome ends these too are incomplete. It is not known precisely how much sequence remains before the telomeres of each chromosome are reached, but, as with the centromeres, current technological constraints are prohibitive.
  • Third, there are several loci in each individual's genome that contain members of multigene families that are difficult to disentangle with shotgun sequencing methods; these multigene families often encode proteins important for immune functions.
  • Other than these regions, there remain a few dozen gaps scattered around the genome, some of them rather large, but there is hope that all these will be closed in the next couple of years.


In summary, the best estimates of total genome size indicate that about 92% of the genome has been completed, and it is likely that the centromeres and telomeres will remain unsequenced until new technology is developed that facilitates their sequencing. Most of the remaining DNA is highly repetitive and unlikely to contain genes, but this cannot be truly known until it is entirely sequenced. Understanding of the functions of all the genes and their regulation is far from complete. The roles of junk DNA, the evolution of the genome, the differences between individuals, and many other questions are still the subject of intense interest in laboratories all over the world.

Goals


The sequence of the human DNA is stored in databases available to anyone on the Internet. The U.S. National Center for Biotechnology Information (and sister organizations in Europe and Japan) houses the gene sequence in a database known as GenBank, along with sequences of known and hypothetical genes and proteins. Other organizations, such as the University of California, Santa Cruz, and Ensembl, present additional data and annotation, along with powerful tools for visualizing and searching it. Computer programs have been developed to analyze the data, because the data themselves are difficult to interpret without such programs.
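As a small illustration of the kind of program meant here, the sketch below parses text in FASTA format (a plain-text format in which GenBank, Ensembl and the UCSC Genome Browser commonly offer sequence downloads) and computes the GC content of each record. The function names and the two-record input are hypothetical and invented for illustration; a real analysis would read a downloaded file and would more likely rely on an established library such as Biopython.

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record id is the first whitespace-separated word after '>'.
    Sequence lines are upper-cased and concatenated.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            words = line[1:].split()
            current_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line.upper())
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def gc_content(seq: str) -> float:
    """Fraction of G/C among the unambiguous bases A, C, G, T ('N' gaps are ignored)."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)


if __name__ == "__main__":
    # Hypothetical toy records, not real human sequence.
    example = ">seq1 toy record\nACGTGGCC\nATNN\n>seq2 another toy record\nAATT\n"
    for rec_id, seq in parse_fasta(example).items():
        print(rec_id, len(seq), round(gc_content(seq), 3))
    # prints "seq1 12 0.6" then "seq2 4 0.0"
```

Skipping 'N' when computing GC content matters because unsequenced gaps in assemblies are commonly written as runs of N.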

The process of identifying the boundaries between genes and other features in raw DNA sequence is called genome annotation and is the domain of bioinformatics. While expert biologists make the best annotators, their work proceeds slowly, and computer programs are increasingly used to meet the high-throughput demands of genome sequencing projects. The best current technologies for annotation make use of statistical models that take advantage of parallels between DNA sequences and human language, using concepts from computer science such as formal grammars.
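One widely used family of such statistical models is the hidden Markov model (HMM), which can be viewed as a probabilistic regular grammar. The sketch below is a toy only: it assumes a two-state model with invented probabilities (a GC-rich "coding" state versus an AT-rich "noncoding" state) and uses the Viterbi algorithm to assign the most probable label to each base. Real gene finders use many more states (exons, introns, splice sites, start and stop codons) with parameters trained on annotated genomes, and GC richness alone is only a crude signal.

```python
import math
from typing import Dict, List

# Toy two-state model. Every number below is an illustrative assumption,
# not a parameter from any real gene finder.
STATES = ("coding", "noncoding")
START = {"coding": 0.5, "noncoding": 0.5}
TRANS = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT = {
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},     # GC-rich
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich
}


def viterbi(seq: str) -> List[str]:
    """Return the most probable hidden state for each base of an A/C/G/T string.

    Works in log space so long sequences do not underflow.
    """
    seq = seq.upper()
    if not seq:
        return []
    # score[s]: best log-probability of any state path ending in s at the current base
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers: List[Dict[str, str]] = []
    for base in seq[1:]:
        new_score: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for s in STATES:
            # Best predecessor state for landing in s at this base.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointer[s] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    labels = viterbi("GCGCGCGCAATATATATA")
    # 'C' = coding-like, 'n' = noncoding-like
    print("".join("C" if s == "coding" else "n" for s in labels))  # CCCCCCCCnnnnnnnnnn
```

The transition probabilities act like a grammar's rewrite rules: a low probability of switching states discourages the model from flipping labels on every base, so it prefers long, contiguous regions.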

Another, often overlooked, goal of the HGP is the study of its ethical, legal, and social implications. It is important to research these issues and find the most appropriate solutions before they grow into major dilemmas with political consequences.

All humans have unique gene sequences, so the data published by the HGP do not represent the exact sequence of every individual's genome. They are the combined genome of a small number of anonymous donors. The HGP genome is a scaffold for future work in identifying differences among individuals. Most of the current effort in identifying such differences involves single nucleotide polymorphisms (SNPs) and the HapMap.
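To make "using the reference as a scaffold" concrete, the minimal sketch below lists the positions where an individual's sequence differs from a reference by a single base, which is the raw material for SNP discovery. It assumes the two sequences are already aligned, have equal length and contain no insertions or deletions; the function name and sequences are hypothetical. Real variant calling works from aligned sequencing reads and must model sequencing error, and a single difference in one person is only a candidate variant, not yet an established SNP in the population.

```python
from typing import List, Tuple


def single_base_differences(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based, as is conventional for genome coordinates.
    Assumes the sequences are pre-aligned, the same length, and indel-free.
    Positions where either base is 'N' (unknown) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper()), start=1):
        if "N" in (ref_base, alt_base):
            continue  # no call at this position
        if ref_base != alt_base:
            diffs.append((pos, ref_base, alt_base))
    return diffs


if __name__ == "__main__":
    ref = "ACGTACGTAC"  # hypothetical reference segment
    ind = "ACGTTCGTAN"  # hypothetical individual; 'N' marks an unread base
    print(single_base_differences(ref, ind))  # [(5, 'A', 'T')]
```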

Progress

Almost all the goals that the Human Genome Project set for itself were achieved earlier than predicted: the project finished two years ahead of its projected completion date. It set a reasonable, attainable goal of sequencing 95% of the gene-containing DNA and surpassed it; the finished sequence covered about 99% of the gene-containing regions of the genome at an accuracy of 99.99%. The project also continues to make progress on goals it has already achieved.

How it was accomplished


The work was funded by the US government through the National Institutes of Health and by the UK charity the Wellcome Trust, which funded the Sanger Institute (then the Sanger Centre) in Great Britain, as well as by numerous other groups from around the world. The genome was broken into smaller pieces, approximately 150,000 base pairs in length. These pieces were then spliced into a type of vector known as "bacterial artificial chromosomes", or BACs, which are genetically engineered DNA constructs based on a bacterial fertility plasmid. The vectors containing the DNA fragments can be inserted into bacteria, where they are copied by the bacterial DNA replication machinery. Each of these pieces was then sequenced separately as a small "shotgun" project and then assembled. The larger 150,000-base-pair pieces were then combined to build up whole chromosomes. This is known as the "hierarchical shotgun" approach, because the genome is first broken into relatively large chunks, which are then mapped to chromosomes before being selected for sequencing.
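The "small shotgun project" step, reassembling one clone from many short overlapping reads, can be illustrated with greedy overlap merging. This is a deliberately simplified stand-in, not the assembly software the HGP centers actually used: the reads are invented and error-free, and the sketch ignores sequencing errors, reverse-complement reads and repeats, which are exactly what make real assembly difficult. The function names are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest exact overlap.

    Returns the resulting contigs; more than one contig means gaps remain.
    """
    unique = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads contained inside another read: they add no new sequence.
    contigs = [r for r in unique if not any(r != other and r in other for other in unique)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs (gaps)
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Invented, error-free reads sampled from the hypothetical sequence ATGGCGTGCAATCCGATTAC.
    reads = ["ATGGCGTG", "GCGTGCAATC", "CAATCCGATT", "CCGATTAC"]
    print(greedy_assemble(reads))  # ['ATGGCGTGCAATCCGATTAC']
```

Working clone by clone keeps each assembly problem small (on the order of 150,000 base pairs), so repeats elsewhere in the genome cannot be confused with sequence inside the clone; that is the practical appeal of the hierarchical approach over assembling the whole genome at once.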

Public versus private approaches


In 1998, a similar, privately funded effort was launched by the American researcher Craig Venter and his firm Celera Genomics. Venter had been a scientist at the NIH during the early 1990s when the project was initiated. The $300 million Celera effort was intended to proceed at a faster pace and at a fraction of the cost of the roughly $3 billion publicly funded project.

Celera used a riskier technique called whole genome shotgun sequencing, which had been used to sequence bacterial genomes of up to six million base pairs in length, but never anything nearly as large as the three-billion-base-pair human genome.

Celera initially announced that it would seek patent protection on "only 200-300" genes, but later amended this to seeking "intellectual property protection" on "fully-characterized important structures" amounting to 100-300 targets. The firm eventually filed [...]. Celera also promised to publish its findings in accordance with the terms of the 1996 "Bermuda Statement" by releasing new data annually (the HGP released its new data daily), although, unlike the publicly funded project, it would not permit free redistribution or commercial use of the data.

In March 2000, President Clinton announced that the genome sequence could not be patented and should be made freely available to all researchers. The statement sent Celera's stock plummeting and dragged down the biotechnology-heavy Nasdaq. The biotechnology sector lost about $50 billion in market capitalization in two days.

Although the working draft was announced in June 2000, it was not until February 2001 that Celera and the HGP scientists published details of their drafts. Special issues of Nature (which published the publicly funded project's scientific paper) and Science (which published Celera's paper) described the methods used to produce the draft sequence and offered analysis of the sequence. These drafts covered about 83% of the genome (90% of the euchromatic regions, with 150,000 gaps and the order and orientation of many segments not yet established). In February 2001, at the time of the joint publications, press releases announced that the project had been completed by both groups. Improved drafts were announced in 2003 and 2005, bringing the sequence to its current coverage of about 92%.

The competition proved to be very good for the project, spurring the public groups to modify their strategy in order to accelerate progress. The rivals initially agreed to pool their data, but the agreement fell apart when Celera refused to deposit its data in the unrestricted public database GenBank. Celera had incorporated the public data into its genome, but forbade the public effort from using Celera data.

The HGP is the best known of many international genome projects aimed at sequencing the DNA of a specific organism. While the human DNA sequence offers the most tangible benefits, important developments in biology and medicine are predicted as a result of the sequencing of model organisms, including mice, fruit flies, zebrafish, yeast, nematodes, plants (such as Arabidopsis thaliana), and many microbial organisms and parasites.

In 2004, researchers from the International Human Genome Sequencing Consortium (IHGSC) of the HGP announced a new estimate of 20,000 to 25,000 genes in the human genome. Previously, 30,000 to 40,000 had been predicted, while estimates at the start of the project reached as high as 2,000,000. The number continues to fluctuate, and agreeing on a precise value for the number of genes in the human genome is expected to take many years.

History


In 1976, the genome of the RNA virus bacteriophage MS2 became the first complete genome to be determined, by Walter Fiers and his team at the University of Ghent (Ghent, Belgium). The idea for the shotgun technique came from the use of an algorithm that combined sequence information from many small fragments of DNA to reconstruct a genome. This technique was pioneered by Frederick Sanger to sequence the genome of phage Phi-X174, a virus (bacteriophage) that primarily infects bacteria and that in 1977 became the first fully sequenced DNA genome. The technique was called shotgun sequencing because the genome was broken into millions of pieces, as if it had been blasted with a shotgun. To scale up the method, both sequencing and genome assembly had to be automated, as they were in the 1980s.
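
The core idea of combining overlapping fragments can be illustrated with a toy greedy assembler, which repeatedly merges the two reads that share the longest suffix/prefix overlap. This is a simplified sketch, not the algorithm Sanger, the HGP or Celera actually used: it assumes error-free reads from a single strand and a genome without repeats, and the function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that matches a prefix of `b`.

    Only overlaps of at least `min_len` bases are reported; otherwise 0.
    """
    start = 0
    while True:
        # Find the next place where b's first min_len bases occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check whether the whole remaining suffix of a is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns the resulting contigs (a single string if everything joins up).
    """
    # Remove duplicate reads and reads wholly contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can collapse or misorder repeated regions; practical assemblers rely on additional information, such as paired reads, to resolve them.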

These techniques were shown to be applicable to sequencing the first free-living bacterial genome, the 1.8-million-base-pair genome of Haemophilus influenzae, in 1995, and the first animal genome (~100 Mbp). This work relied on automated sequencers and on longer individual reads of approximately 500 base pairs at that time. Paired reads separated by a fixed distance of around 2,000 base pairs were critical elements, enabling the development of the first genome assembly programs for reconstructing large regions of genomes (known as "contigs").
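
One way a fixed spacing between paired reads helps: if one read of a pair lands near the end of contig A and its mate lands near the start of contig B, and the fragment length is known (about 2,000 base pairs here), the unsequenced gap between the two contigs can be estimated. The sketch below is illustrative only. The function `estimate_gap` and its inputs are hypothetical, and it assumes both contigs are already ordered and oriented the same way, reads are error-free and uniquely placed, and every fragment is exactly the nominal length; real assemblers model a distribution of fragment sizes instead.

```python
from statistics import mean


def estimate_gap(len_a: int, pairs: list[tuple[int, int]], insert_size: int = 2000) -> float:
    """Estimate the number of unsequenced bases between contig A and contig B.

    Each pair is (start_in_a, end_in_b):
      start_in_a: 0-based position in contig A where the fragment begins
                  (the first base of the read that lands in A)
      end_in_b:   position in contig B just past the last base of the mate
                  read, i.e. how many bases of the fragment fall inside B
    Assumes contig A lies immediately before contig B, both in the same
    orientation, and that every fragment is exactly `insert_size` long.
    """
    estimates = []
    for start_in_a, end_in_b in pairs:
        bases_in_a = len_a - start_in_a  # part of the fragment inside contig A
        bases_in_b = end_in_b            # part of the fragment inside contig B
        estimates.append(insert_size - bases_in_a - bases_in_b)
    # Average the per-pair estimates; a negative result suggests the contigs overlap.
    return mean(estimates)


if __name__ == "__main__":
    # Two hypothetical read pairs spanning the junction of a 5,000 bp contig A and contig B.
    print(estimate_gap(5000, [(4200, 900), (4500, 1150)]))  # estimated gap of 325 bp
```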

Three years later, in 1998, the announcement by the newly formed Celera Genomics that it would scale up the shotgun sequencing method to the human genome was greeted with skepticism in some circles. The shotgun technique breaks the DNA into fragments of various sizes, ranging from 2,000 to 300,000 base pairs in length, forming what is called a DNA "library". Using an automated DNA sequencer, the DNA is read in 800-base-pair lengths from both ends of each fragment. Using a complex genome assembly algorithm and a supercomputer, the pieces are combined, and the genome can be reconstructed from millions of short, 800-base-pair reads. The success of both the public and the privately funded efforts hinged upon a new, more highly automated capillary DNA sequencing machine, the Applied Biosystems 3700, which ran the DNA sequences through an extremely fine capillary tube rather than a flat gel. Even more critical was the development of a new, larger-scale genome assembly program that could handle the 30-50 million sequences required to sequence the entire human genome with this method. At the time, no such program existed. One of the first major projects at Celera Genomics was the development of this assembler, which was written in parallel with the construction of a large, highly automated genome sequencing factory. Development of the assembler was led by Gene Myers. The first version of this assembler was demonstrated in 2000, when the Celera team joined forces with Professor Gerald Rubin to sequence the fruit fly Drosophila melanogaster using the whole-genome shotgun method. At 130 million base pairs, that genome was at least 10 times larger than any genome previously shotgun assembled. One year later, the Celera team published their assembly of the three-billion-base-pair human genome.
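
The figures in this paragraph imply how deeply the genome had to be oversampled. A common back-of-the-envelope measure is fold coverage, c = N * L / G, where N is the number of reads, L the read length and G the genome size. These symbols, and the estimate of uncovered bases under the idealized Lander-Waterman assumption that reads start uniformly at random (expected uncovered fraction about e^(-c)), are added here for illustration and are not figures reported by the HGP or Celera. Plugging in 30-50 million reads of 800 base pairs and a three-billion-base-pair genome gives roughly 8- to 13-fold coverage.

```python
import math


def fold_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(coverage: float) -> float:
    """Idealized (Lander-Waterman / Poisson) fraction of bases hit by no read: e^(-c)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    GENOME_SIZE = 3_000_000_000  # ~3 billion bp human genome
    READ_LENGTH = 800            # bp per read
    for n_reads in (30_000_000, 50_000_000):
        c = fold_coverage(n_reads, READ_LENGTH, GENOME_SIZE)
        print(f"{n_reads:,} reads -> {c:.1f}x coverage, "
              f"~{fraction_uncovered(c):.1e} of bases expected uncovered")
```

Real read placement is not uniform, and repeats and cloning biases leave more gaps than this idealized model predicts, so the numbers are best read as a lower bound on the difficulty.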

Methods

The IHGSC used paired-end sequencing plus whole-genome shotgun mapping of large (~100 kbp) plasmid clones, shotgun sequencing of smaller plasmid subclones, and a variety of other mapping data to orient and check the assembly of each human chromosome.

The Celera group emphasized the importance of the “whole-genome shotgun” sequencing method, relying on sequence information to orient and locate their fragments within the chromosome. However, they used the publicly available data from the HGP to assist in the assembly and orientation process, raising concerns that the Celera sequence was not independently derived.

Genome donors


In the IHGSC's international public-sector Human Genome Project (HGP), researchers collected blood (female) or sperm (male) samples from a large number of donors. Only a few of the many collected samples were processed as DNA resources, so donor identities were protected: neither donors nor scientists could know whose DNA was sequenced. DNA clones from many different libraries were used in the overall project, with most of those libraries created by Dr. Pieter J. de Jong. It has been informally reported, and is well known in the genomics community, that much of the DNA for the public HGP came from a single anonymous male donor from Buffalo, New York (code name RP11).

HGP scientists used white blood cells from the blood of two male and two female donors (randomly selected from 20 of each), with each donor yielding a separate DNA library. One of these libraries (RP11) was used considerably more than the others because of quality considerations. One minor technical issue is that male samples contain just over half as much DNA from the sex chromosomes (one X chromosome and one Y chromosome) as female samples (which contain two X chromosomes). The other 22 chromosome pairs (the autosomes) are the same for both sexes.

Although the main sequencing phase of the HGP has been completed, studies of DNA variation continue in the International HapMap Project, whose goal is to identify patterns of single-nucleotide polymorphism (SNP) groups (called haplotypes, or “haps”). The DNA samples for the HapMap came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo; Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH) resource, which consisted of residents of the United States with ancestry from Northern and Western Europe.

In the Celera Genomics private-sector project, DNA from five different individuals was used for sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged (in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of which were selected for use.

On September 4, 2007, a team led by Craig Venter published Venter's own complete DNA sequence, unveiling the six-billion-nucleotide genome of a single individual for the first time.

Benefits


Interpretation of genome data is still in its initial stages. It is anticipated that detailed knowledge of the human genome will provide new avenues for advances in medicine and biotechnology. Clear practical results of the project emerged even before the work was finished. For example, a number of companies, such as Myriad Genetics, started offering easy ways to administer genetic tests that can show predisposition to a variety of illnesses, including breast cancer, disorders of hemostasis, cystic fibrosis, liver diseases and many others. Research into the etiologies of cancers, Alzheimer's disease and other conditions of clinical interest is also considered likely to benefit from genome information, which may in the long term lead to significant advances in their management.

There are also many tangible benefits for biological scientists. For example, a researcher investigating a certain form of cancer may have narrowed their search to a particular gene. By visiting the human genome database on the World Wide Web, this researcher can examine what other scientists have written about this gene, including (potentially) the three-dimensional structure of its product, its functions, its evolutionary relationships to other human genes or to genes in mice, yeast or fruit flies, possible detrimental mutations, interactions with other genes, body tissues in which the gene is activated, diseases associated with it, and other data types.

Further, a deeper understanding of disease processes at the level of molecular biology may suggest new therapeutic procedures. Given the established importance of DNA in molecular biology and its central role in determining the fundamental operation of cellular processes, it is likely that expanded knowledge in this area will facilitate medical advances in numerous areas of clinical interest that might not otherwise have been possible.

The analysis of similarities between DNA sequences from different organisms is also opening new avenues in the study of evolution. In many cases, evolutionary questions can now be framed in terms of molecular biology; indeed, many major evolutionary milestones (the emergence of the ribosome and organelles, the development of embryos with body plans, the vertebrate immune system) can be related to the molecular level. Many questions about the similarities and differences between humans and our closest relatives (the primates, and indeed the other mammals) are expected to be illuminated by the data from this project.

The Human Genome Diversity Project (HGDP), a spin-off research effort aimed at mapping the DNA that varies between human ethnic groups, was rumored to have been halted but did continue and has yielded new conclusions. In the future, the HGDP could provide new data for disease surveillance, human development and anthropology. It could shed light on, and suggest new strategies for managing, the vulnerability of ethnic groups to certain diseases (see race in biomedicine). It could also show how human populations have adapted to these vulnerabilities.

Ethical, legal and social issues

The project's goals included not only identifying all of the approximately 24,000 genes in the human genome but also addressing the ethical, legal, and social issues (ELSI) that might arise from the availability of genetic information. Five percent of the annual budget was allocated to addressing the ELSI arising from the project.

Debra Harry, Executive Director of the U.S. group Indigenous Peoples Council on Biocolonialism (IPCB), says that despite a decade of ELSI funding, the burden of genetics education has fallen on the tribes themselves, who must work to understand the motives of the Human Genome Project and its potential impacts on their lives. Meanwhile, she says, the government has been busily funding projects studying indigenous groups without any meaningful consultation with those groups. (See Biopiracy.)

The main criticism of ELSI is its failure to address the conditions raised by population-based research, especially with regard to unique processes for group decision-making and cultural worldviews. Genetic variation research such as the HGP is group population research, but most ethical guidelines, according to Harry, focus on individual rights instead of group rights. She says the research represents a clash of cultures: indigenous peoples' lives revolve around collectivity and group decision-making, whereas Western culture promotes individuality. Harry suggests that one of the challenges of ethical research is to include respect for collective review and decision-making while also upholding the Western model of individual rights.
