Genome

In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA.

Origin of term


The term was coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany. The Greek verb γίνομαι means "I become, I am born, to come into being". The Oxford English Dictionary suggests the name to be a blend of the words gene and chromosome. A few related -ome words already existed, such as biome and rhizome, forming a vocabulary into which genome fits systematically.

Overview


Some organisms have multiple copies of each chromosome: they may be diploid, triploid, tetraploid and so on. In classical genetics, in a sexually reproducing organism (typically a eukaryote) the gamete has half the number of chromosomes of the somatic cell, and the genome is the full set of chromosomes in a gamete. In haploid organisms, including bacteria and archaea, in organelles such as mitochondria and chloroplasts, and in viruses, which similarly contain genes, the single chain or set of circular and/or linear chains of DNA (or RNA for some viruses) likewise constitutes the genome. The term genome can be applied specifically to the information stored on a complete set of nuclear DNA (i.e., the "nuclear genome") but can also be applied to that stored within organelles that contain their own DNA, as with the "mitochondrial genome" or the "chloroplast genome". Additionally, the genome can comprise nonchromosomal genetic elements such as viruses, plasmids, and transposable elements.

When people say that the genome of a sexually reproducing species has been "sequenced", typically they are referring to a determination of the sequences of one set of autosomes and one of each type of sex chromosome, which together represent both of the possible sexes. Even in species that exist in only one sex, what is described as "a genome sequence" may be a composite read from the chromosomes of various individuals. In general use, the phrase "genetic makeup" is sometimes used to mean the genome of a particular individual or organism. The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics, which generally studies the properties of single genes or groups of genes.

Both the number of base pairs and the number of genes vary widely from one species to another, and there is only a rough correlation between the two (an observation known as the C-value paradox). At present, the highest known number of genes is around 60,000, for the protozoan causing trichomoniasis (see List of sequenced eukaryotic genomes), almost three times as many as in the human genome.

An analogy to the human genome stored on DNA is that of instructions stored in a book:
  • The book (genome) would contain 23 chapters (chromosomes);
  • Each chapter contains 48 to 250 million letters (A, C, G, T) without spaces;
  • Hence, the book contains over 3.2 billion letters in total;
  • The book fits into a cell nucleus the size of a pinpoint;
  • At least one copy of the book (all 23 chapters) is contained in most cells of our body. The only exception in humans is found in mature red blood cells which become enucleated during development and therefore lack a genome.
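
To put the size of the "book" in more familiar terms, the following is a minimal sketch of how much storage its letters would need. It assumes the simplest possible encoding, in which each of the four letters takes two bits; the names CODE, pack and packed_size_bytes are illustrative only and do not come from any standard tool.

```python
# Assumption: a four-letter alphabet needs log2(4) = 2 bits per letter.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack(letters):
    """Pack a string of A/C/G/T into one integer, 2 bits per letter."""
    value = 0
    for letter in letters:
        value = (value << 2) | CODE[letter]
    return value


def packed_size_bytes(n_letters, bits_per_letter=2):
    """Bytes needed to store n_letters, rounded up to a whole byte."""
    return (n_letters * bits_per_letter + 7) // 8


# The figure of 3.2 billion letters is taken from the analogy above.
book_bytes = packed_size_bytes(3_200_000_000)
print(book_bytes)  # 800000000 bytes, i.e. about 800 MB
```

Under this assumption one copy of the book would occupy roughly 800 megabytes, before any compression and ignoring any annotation.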

Types


Most biological entities that are more complex than a virus sometimes or always carry additional genetic material besides that which resides in their chromosomes. In some contexts, such as sequencing the genome of a pathogenic microbe, "genome" is meant to include information stored on this auxiliary material, which is carried in plasmids. In such circumstances, "genome" describes all of the genes and information on non-coding DNA that have the potential to be present.

In eukaryotes such as plants, protozoa and animals, however, "genome" typically connotes only the information on chromosomal DNA. So although these organisms contain chloroplasts and/or mitochondria that have their own DNA, the genetic information contained in the DNA of these organelles is not considered part of the genome. Instead, mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome". The DNA found within the chloroplast may be referred to as the "plastome".

Genomes and genetic variation


A genome does not capture the genetic diversity or the genetic polymorphism of a species. For example, the human genome sequence in principle could be determined from just half the information on the DNA of one cell from one individual. Learning what variations in genetic information underlie particular traits or diseases requires comparisons across individuals. This point explains the common usage of "genome" (which parallels a common usage of "gene") to refer not to the information in any particular DNA sequence, but to a whole family of sequences that share a biological context.

Although this concept may seem counterintuitive, it is the same concept that says there is no particular shape that is the shape of a cheetah. Cheetahs vary, and so do the sequences of their genomes. Yet both the individual animals and their sequences share commonalities, so one can learn something about cheetahs and "cheetah-ness" from a single example of either.

Sequencing and mapping


The Human Genome Project was organized to map and to sequence the human genome. Other genome projects include mouse, rice, the plant Arabidopsis thaliana, the puffer fish, and bacteria like E. coli. In 1976, Walter Fiers at the University of Ghent (Belgium) was the first to establish the complete nucleotide sequence of a viral RNA genome (bacteriophage MS2). The first DNA genome project to be completed was that of phage Φ-X174, with only 5,386 base pairs, which was sequenced by Fred Sanger in 1977. The first bacterial genome to be completed was that of Haemophilus influenzae, sequenced by a team at The Institute for Genomic Research in 1995. A few months later, the first eukaryotic genome was completed, with the 16 chromosomes of budding yeast Saccharomyces cerevisiae being released as a result of a European-led effort begun in the mid-1980s.

The development of new technologies has made it dramatically easier and cheaper to do sequencing, and the number of complete genome sequences is growing rapidly. Among many genome databases, the one maintained by the US National Institutes of Health is inclusive.

These new technologies open up the prospect of personal genome sequencing as an important diagnostic tool. A major step toward that goal was the completion in 2007 of the full genome sequence of DNA pioneer James D. Watson.

Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome. A fundamental step in the Human Genome Project was the release of a detailed genomic map by Jean Weissenbach and his team at the Genoscope in Paris.
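
The difference between a sequence and a map can be illustrated with a toy data structure. The sketch below is purely hypothetical: the sequence is made up, the marker names and positions are invented, and nearest_landmark is an illustrative helper rather than part of any real genomics library. It shows a sequence as a string listing every base, and a map as a short sorted list of landmarks used to navigate that string.

```python
from bisect import bisect_left

# A sequence lists every base in order (made-up toy example).
sequence = "ATGGCGTACCTTGACGTTAGC"

# A map records only landmarks: (position, name), sorted by position.
# Positions and names are hypothetical.
landmarks = [(2, "marker_A"), (9, "marker_B"), (17, "marker_C")]


def nearest_landmark(position, landmarks):
    """Return the (position, name) landmark closest to the given position."""
    positions = [p for p, _ in landmarks]
    i = bisect_left(positions, position)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = landmarks[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda lm: abs(lm[0] - position))


print(nearest_landmark(10, landmarks))  # (9, 'marker_B')
```

The map holds far less information than the sequence, but it is enough to say roughly where in the genome a given position lies.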

Comparison of different genome sizes



Organism type Organism Genome size (base pair
Base pair
In molecular biology and genetics, the linking between two nitrogenous bases on opposite complementary DNA or certain types of RNA strands that are connected via hydrogen bonds is called a base pair...

s)
Genome size (in human-readable format) mass - in pg Note
Virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

Bacteriophage MS2
Bacteriophage MS2
The bacteriophage MS2 is an icosahedral, positive-sense single-stranded RNA virus that infects the bacterium Escherichia coli.-History:...

3,569 3.5kb 0.000002 First sequenced RNA-genome
Virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

SV40
SV40
SV40 is an abbreviation for Simian vacuolating virus 40 or Simian virus 40, a polyomavirus that is found in both monkeys and humans...

5,224 5.2kb
Virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

Phage Φ-X174
Phi-X174 phage
The phi X 174 bacteriophage was the first DNA-based genome to be sequenced. This work was completed by Fred Sanger and his team in 1977. In 1962, Walter Fiers had already demonstrated the physical, covalently closed circularity of phi X 174 DNA.In 2003, it was reported that the whole genome of...

5,386 5.4kb First sequenced DNA-genome
Virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

HIV
HIV
Human immunodeficiency virus is a lentivirus that causes acquired immunodeficiency syndrome , a condition in humans in which progressive failure of the immune system allows life-threatening opportunistic infections and cancers to thrive...

9,749 9.7kb
Virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

Phage λ
Lambda phage
Enterobacteria phage λ is a temperate bacteriophage that infects Escherichia coli.Lambda phage is a virus particle consisting of a head, containing double-stranded linear DNA as its genetic material, and a tail that can have tail fibers. The phage particle recognizes and binds to its host, E...

48,502 48kb
Virus
Virus
A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea...

Mimivirus
Mimivirus
Mimivirus is a viral genus containing a single identified species named Acanthamoeba polyphaga mimivirus , or is a group of phylogenetically related large viruses . In colloquial speech, APMV is more commonly referred to as just “mimivirus”...

1,181,404 1.2Mb Largest known viral genome
Bacterium Haemophilus influenzae
Haemophilus influenzae
Haemophilus influenzae, formerly called Pfeiffer's bacillus or Bacillus influenzae, Gram-negative, rod-shaped bacterium first described in 1892 by Richard Pfeiffer during an influenza pandemic. A member of the Pasteurellaceae family, it is generally aerobic, but can grow as a facultative anaerobe. H...

1,830,000 1.8Mb First genome of a living organism sequenced, July 1995
Bacterium Carsonella ruddii 159,662 160kb Smallest non-viral genome.
Bacterium Buchnera aphidicola 600,000 600kb
Bacterium Wigglesworthia glossinidia 700,000 700Kb
Bacterium Escherichia coli
Escherichia coli
Escherichia coli is a Gram-negative, rod-shaped bacterium that is commonly found in the lower intestine of warm-blooded organisms . Most E. coli strains are harmless, but some serotypes can cause serious food poisoning in humans, and are occasionally responsible for product recalls...

4,600,000 4.6Mb
Bacterium Solibacter usitatus (strain Ellin 6076) 9,970,000 10Mb Largest known Bacterial genome
Amoeboid
Amoeboid
Amoeboids are single-celled life-forms characterized by an irregular shape."Amoeboid" and "amœba" are often used interchangeably even by biologists, and especially refer to a creature moving by using pseudopodia. Most references to "amoebas" or "amoebae" are to amoeboids in general rather than to...

Polychaos dubium ("Amoeba" dubia) 670,000,000,000 670Gb 737 Largest known genome. (Disputed )
Plant
Plant
Plants are living organisms belonging to the kingdom Plantae. Precise definitions of the kingdom vary, but as the term is used here, plants include familiar organisms such as trees, flowers, herbs, bushes, grasses, vines, ferns, mosses, and green algae. The group is also called green plants or...

Arabidopsis thaliana
Arabidopsis thaliana
Arabidopsis thaliana is a small flowering plant native to Europe, Asia, and northwestern Africa. A spring annual with a relatively short life cycle, arabidopsis is popular as a model organism in plant biology and genetics...

157,000,000 157Mb First plant genome sequenced, December 2000.
Plant
Plant
Plants are living organisms belonging to the kingdom Plantae. Precise definitions of the kingdom vary, but as the term is used here, plants include familiar organisms such as trees, flowers, herbs, bushes, grasses, vines, ferns, mosses, and green algae. The group is also called green plants or...

Genlisea margaretae
Genlisea margaretae
Genlisea margaretae is a carnivorous species in the genus Genlisea native to areas of Madagascar, Tanzania, and Zambia. It has pale bundles of root-like organs up to about 20 cm long under ground that attract, trap, and digest protozoans. These organs are subterranean leaves, which lack chlorophyll...

63,400,000 63Mb Smallest recorded flowering plant
Flowering plant
The flowering plants , also known as Angiospermae or Magnoliophyta, are the most diverse group of land plants. Angiosperms are seed-producing plants like the gymnosperms and can be distinguished from the gymnosperms by a series of synapomorphies...

 genome, 2006.
Plant
Plant
Plants are living organisms belonging to the kingdom Plantae. Precise definitions of the kingdom vary, but as the term is used here, plants include familiar organisms such as trees, flowers, herbs, bushes, grasses, vines, ferns, mosses, and green algae. The group is also called green plants or...

Fritillaria assyrica 130,000,000,000 130Gb
Plant
Plant
Plants are living organisms belonging to the kingdom Plantae. Precise definitions of the kingdom vary, but as the term is used here, plants include familiar organisms such as trees, flowers, herbs, bushes, grasses, vines, ferns, mosses, and green algae. The group is also called green plants or...

Populus trichocarpa
Poplar
Populus is a genus of 25–35 species of deciduous flowering plants in the family Salicaceae, native to most of the Northern Hemisphere. English names variously applied to different species include poplar , aspen, and cottonwood....

480,000,000 480Mb First tree genome sequenced, September 2006
Plant
Plant
Plants are living organisms belonging to the kingdom Plantae. Precise definitions of the kingdom vary, but as the term is used here, plants include familiar organisms such as trees, flowers, herbs, bushes, grasses, vines, ferns, mosses, and green algae. The group is also called green plants or...

Paris japonica
Paris japonica
is a species of the genus Paris in the family Melanthiaceae, which has the largest genome of any plant yet assayed, about 150 billion base pairs long. An octoploid and suspected allopolyploid hybrid of four species, it has 40 chromosomes. It is native to sub-alpine regions of...

(Japanese-native, pale-petal)
150,000,000,000 150Gb 152.23 Largest plant genome known
Moss
Moss
Mosses are small, soft plants that are typically 1–10 cm tall, though some species are much larger. They commonly grow close together in clumps or mats in damp or shady locations. They do not have flowers or seeds, and their simple leaves cover the thin wiry stems...

Physcomitrella patens
Physcomitrella patens
Physcomitrella patens is a moss used as a model organism for studies on plant evolution, development and physiology.-Model organism:...

480,000,000 480Mb First genome of a bryophyte
Bryophyte
Bryophyte is a traditional name used to refer to all embryophytes that do not have true vascular tissue and are therefore called 'non-vascular plants'. Some bryophytes do have specialized tissues for the transport of water; however since these do not contain lignin, they are not considered to be...

 sequenced, January 2008.
Yeast
Yeast
Yeasts are eukaryotic micro-organisms classified in the kingdom Fungi, with 1,500 species currently described estimated to be only 1% of all fungal species. Most reproduce asexually by mitosis, and many do so by an asymmetric division process called budding...

Saccharomyces cerevisiae
Saccharomyces cerevisiae
Saccharomyces cerevisiae is a species of yeast. It is perhaps the most useful yeast, having been instrumental to baking and brewing since ancient times. It is believed that it was originally isolated from the skin of grapes...

12,100,000 12.1Mb First eukaryotic genome sequenced, 1996
Fungus
Fungus
A fungus is a member of a large group of eukaryotic organisms that includes microorganisms such as yeasts and molds , as well as the more familiar mushrooms. These organisms are classified as a kingdom, Fungi, which is separate from plants, animals, and bacteria...

Aspergillus nidulans
Aspergillus nidulans
Aspergillus nidulans is one of many species of filamentous fungi in the phylum Ascomycota...

30,000,000 30Mb
Nematode Caenorhabditis elegans
Caenorhabditis elegans
Caenorhabditis elegans is a free-living, transparent nematode , about 1 mm in length, which lives in temperate soil environments. Research into the molecular and developmental biology of C. elegans was begun in 1974 by Sydney Brenner and it has since been used extensively as a model...

100,300,000 100Mb First multicellular animal genome sequenced, December 1998
Nematode Pratylenchus coffeae
Pratylenchus coffeae
Pratylenchus coffeae is a plant pathogenic nematode.- External links :*...

20,000,000 20Mb Smallest animal genome known
Insect
Insect
Insects are a class of living creatures within the arthropods that have a chitinous exoskeleton, a three-part body , three pairs of jointed legs, compound eyes, and two antennae...

Drosophila melanogaster
Drosophila melanogaster
Drosophila melanogaster is a species of Diptera, or the order of flies, in the family Drosophilidae. The species is known generally as the common fruit fly or vinegar fly. Starting from Charles W...

(fruit fly)
130,000,000 130Mb
Insect
Insect
Insects are a class of living creatures within the arthropods that have a chitinous exoskeleton, a three-part body , three pairs of jointed legs, compound eyes, and two antennae...

Bombyx mori
Bombyx mori
The silkworm is the larva or caterpillar of the domesticated silkmoth, Bombyx mori . It is an economically important insect, being a primary producer of silk...

(silk moth)
530,000,000 530Mb
Insect
Apis mellifera (honey bee) 236,000,000 236Mb
Insect
Solenopsis invicta (fire ant) 480,000,000 480Mb
Fish
Tetraodon nigroviridis (type of puffer fish) 385,000,000 390Mb Smallest vertebrate genome known
Mammal
Homo sapiens 3,200,000,000 3.2Gb 3
Fish
Protopterus aethiopicus (marbled lungfish) 130,000,000,000 130Gb 143 Largest vertebrate genome known


Note: If the 46 chromosomes of a single (diploid) human cell were connected end-to-end and straightened, the DNA would have a length of ~2 m and a width of ~2.4 nanometers.
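
The ~2 m figure in the note can be checked with a rough calculation. The sketch below assumes the haploid size of 3.2 Gb listed for Homo sapiens in the table, two genome copies per diploid cell, and a rise of about 0.34 nm per base pair for B-form DNA. The rise value and all names in the code are assumptions introduced for illustration; they do not come from the article.

```python
# Rough estimate of the end-to-end length of the DNA in one diploid human cell.
# All constants are illustrative assumptions, not values asserted by the article
# beyond the 3.2 Gb haploid genome size given in the table.

HAPLOID_BASE_PAIRS = 3_200_000_000   # Homo sapiens entry in the table
COPIES_PER_DIPLOID_CELL = 2          # one genome copy from each parent
RISE_PER_BASE_PAIR_NM = 0.34         # assumed rise per base pair for B-form DNA


def dna_length_m(base_pairs, rise_nm=RISE_PER_BASE_PAIR_NM):
    """Return the stretched-out length in metres for a given number of base pairs."""
    return base_pairs * rise_nm * 1e-9  # convert nanometres to metres


diploid_bp = HAPLOID_BASE_PAIRS * COPIES_PER_DIPLOID_CELL
length_m = dna_length_m(diploid_bp)
print(f"{length_m:.2f} m")
```

Under these assumptions the result is a little over 2 m, consistent with the approximate value quoted in the note.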

Because genomes and the organisms that carry them are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum that still allows the organism to survive. Experimental work is under way on minimal genomes for single-celled organisms as well as for multicellular organisms (see Developmental biology). The work is carried out both in vivo and in silico.

Genome evolution


Genomes are more than the sum of an organism's genes and have traits that may be measured and studied without reference to the details of any particular genes and their products. Researchers compare traits such as chromosome number (karyotype), genome size, gene order, codon usage bias, and GC-content to determine what mechanisms could have produced the great variety of genomes that exist today (for recent overviews, see Brown 2002; Saccone and Pesole 2003; Benfey and Protopapas 2004; Gibson and Muse 2004; Reese 2004; Gregory 2005).
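
Two of these traits are simple enough to compute directly. The following is a minimal sketch, not a standard tool: it takes GC-content to be the percentage of bases that are guanine or cytosine, and it converts genome size between mass and length using the commonly cited figure of 978 megabases per picogram. The function names and the example sequence are hypothetical.

```python
# Minimal sketch of two whole-genome measurements.
# The conversion factor is the commonly cited approximation; treat it as an assumption.

MEGABASES_PER_PICOGRAM = 978


def gc_content(sequence):
    """Return the percentage of bases in `sequence` that are G or C.

    Every character counts toward the length, so ambiguous bases such as N
    lower the reported percentage.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def picograms_to_megabases(mass_pg):
    """Convert a genome mass in picograms to a size in megabases."""
    return mass_pg * MEGABASES_PER_PICOGRAM


def megabases_to_picograms(size_mb):
    """Convert a genome size in megabases to a mass in picograms."""
    return size_mb / MEGABASES_PER_PICOGRAM


example = "ATGCGCGTTA"  # made-up sequence, for illustration only
print(gc_content(example))
print(round(megabases_to_picograms(3200), 2))  # a 3.2 Gb genome expressed in pg
```

Because these measures ignore what any individual gene does, they can be compared across very distantly related organisms.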

Duplications play a major role in shaping the genome. They range from the extension of short tandem repeats, to the duplication of a cluster of genes, and all the way to duplications of entire chromosomes or even entire genomes (polyploidy). Such duplications are probably fundamental to the creation of genetic novelty.

Horizontal gene transfer is invoked to explain the often extreme similarity between small portions of the genomes of two organisms that are otherwise very distantly related. Horizontal gene transfer seems to be common among many microbes. Also, eukaryotic cells seem to have experienced a transfer of some genetic material from their chloroplast and mitochondrial genomes to their nuclear chromosomes.

External links