Molecular Inversion Probe
Molecular Inversion Probe (MIP) belongs to the class of Capture by Circularization molecular techniques for performing genomic partitioning, a process through which one captures and enriches specific regions of the genome. Probes used in this technique are single-stranded DNA molecules and, similar to other genomic partitioning techniques, contain sequences that are complementary to the target in the genome; these probes hybridize to and capture the genomic target. MIP differs from other genomic partitioning strategies in that MIP probes share the common design of two genomic target-complementary segments separated by a linker region. With this design, when the probe hybridizes to the target, it undergoes an inversion in configuration (as suggested by the name of the technique) and circularizes. Specifically, the two target-complementary regions at the 5’ and 3’ ends of the probe become adjacent to one another while the internal linker region forms a free-hanging loop. The technology has been used extensively in the HapMap project for large-scale SNP genotyping as well as for studying gene copy alterations and characteristics of specific genomic loci to identify biomarkers for different diseases such as cancer. Key strengths of the MIP technology include its high specificity to the target and its scalability for high-throughput, multiplexed analyses where tens of thousands of genomic loci are assayed simultaneously.

Technique Procedure

Molecular Inversion Probe Structure

Each probe carries, at its 5’ and 3’ ends, sequences that are complementary to the genomic target. The internal region contains two universal PCR primer sites that are common to all MIPs as well as a probe-release site, which is usually a restriction site. If the identification of the captured genomic target is performed using array-based hybridization approaches, the internal region may optionally contain a probe-specific tag sequence that uniquely identifies the given probe as well as a tag-release site, which, similar to the probe-release site, is also a restriction site.
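The probe layout described above can be sketched as a small data structure. This is only an illustrative model: the field names, the order of elements inside the internal region, and every sequence string below are assumptions made for the example and are not taken from any published probe design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MipProbe:
    """Illustrative model of one probe; all field names are hypothetical."""
    arm_5p: str               # target-complementary segment at the 5' end
    primer_site_1: str        # universal PCR primer site (shared by all probes)
    probe_release_site: str   # usually a restriction site
    primer_site_2: str        # second universal PCR primer site
    arm_3p: str               # target-complementary segment at the 3' end
    tag_release_site: str = ""  # optional, only for array-based readout
    tag: str = ""               # optional probe-specific tag

    def internal_region(self) -> str:
        # The ordering of the internal elements is an assumption of this sketch.
        return (self.primer_site_1 + self.probe_release_site
                + self.primer_site_2 + self.tag_release_site + self.tag)

    def sequence(self) -> str:
        # Full single-stranded probe, written 5' to 3'.
        return self.arm_5p + self.internal_region() + self.arm_3p


# Placeholder sequences, chosen only to make the example runnable.
probe = MipProbe(
    arm_5p="CTTCAGGTC",
    primer_site_1="GTAAAACGACGG",
    probe_release_site="GAATTC",
    primer_site_2="CAGGAAACAGCT",
    arm_3p="GTAATCGATG",
    tag_release_site="GGATCC",
    tag="ACGTTGCAACGTTGCA",
)
```

Keeping the tag and tag-release site optional mirrors the text: they are only needed when the captured target is identified on an array.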

Protocol

  • Anneal probe to genomic target DNA

Probes are added to the genomic DNA sample. After a denaturation step followed by an annealing step, the target-complementary ends of each probe hybridize to the target DNA, bringing the probe into a circular configuration. The probes are designed, however, such that a gap delimited by the hybridized ends of the probe remains over the target region. The size of the gap ranges from a single nucleotide for SNP genotyping to several hundred nucleotides for loci capture (e.g. exome capture).
  • Gap filling

The gap is filled by DNA polymerase using free nucleotides, and the ends of the probe are then joined by ligase, resulting in a fully circularized probe.
  • Remove non-reacted probes

Since gap filling is not performed for non-reacted probes, they remain linear. Exonuclease treatment removes these non-reacted probes as well as any remaining linear DNA in the reaction.
  • Probe release

In some versions of the protocol, the probe-release site (commonly a restriction site) is cleaved by restriction enzymes such that the probe becomes linearized. In this linearized probe, the universal PCR primer sequences are located at the 5’ and 3’ ends and the captured genomic target becomes part of the internal segment of the probe. Other protocols leave the probe as a circularized molecule.
  • Captured target enrichment

If the probe is linearized, traditional PCR amplification is performed to enrich the captured target using the universal primers of the probe. Otherwise, rolling circle amplification is performed for the circular probe.
  • Captured target identification

The captured target can be identified either via array-based hybridization approaches or by sequencing of the target. If an array-based approach is used, the probe may optionally contain a probe-specific tag that uniquely identifies the probe as well as the genomic region targeted by it. The tags from each probe are released by cleaving the tag-release site with restriction enzymes. These tags are then hybridized to complementary sequences placed on the array. The captured target can also be identified by sequencing the probe, which now also contains the target. Traditional Sanger sequencing or cheaper, higher-throughput technologies such as SOLiD, Illumina or Roche 454 can be used for this purpose.
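The annealing, gap-filling and ligation steps of the protocol above can be sketched as a string-level simulation. This is a simplified model under stated assumptions: the target is a single strand written 5’ to 3’, hybridization is exact matching, the 3’ arm of the probe is the end that the polymerase extends, and the function names and example sequences are hypothetical.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def capture(target: str, arm_5p: str, linker: str, arm_3p: str):
    """Return the circularized probe (written linearly from its 5' arm),
    or None if the probe does not anneal and therefore stays linear."""
    # Sites on the target strand that each arm pairs with (antiparallel).
    site_for_5p_arm = revcomp(arm_5p)   # lies 5' of the gap on the target
    site_for_3p_arm = revcomp(arm_3p)   # lies 3' of the gap on the target

    start = target.find(site_for_5p_arm)
    if start == -1:
        return None
    gap_start = start + len(site_for_5p_arm)
    gap_end = target.find(site_for_3p_arm, gap_start)
    if gap_end == -1:
        return None

    # Gap filling: the polymerase extends the probe's 3' end across the gap,
    # synthesizing the reverse complement of the uncovered target bases.
    fill = revcomp(target[gap_start:gap_end])
    # Ligation joins the end of the fill to the probe's 5' end, closing the circle.
    return arm_5p + linker + arm_3p + fill


# Hypothetical target: upstream site, a 5-nucleotide gap, downstream site.
upstream, gap, downstream = "GACCTGAAG", "CTAGG", "CATCGATTAC"
target = "TT" + upstream + gap + downstream + "GGA"

circle = capture(target, revcomp(upstream), "ACGTACGTACGT", revcomp(downstream))
off_target = capture(target, "AAAAAAAAA", "ACGTACGTACGT", "CCCCCCCCCC")
```

In this sketch a probe that returns None corresponds to a non-reacted linear probe, which the exonuclease step would remove; the captured target is the stretch appended after the 3’ arm.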
Multiplex analysis

Although each probe examines one specific genomic locus, multiple probes can be combined into a single tube for a multiplexed assay that simultaneously examines multiple loci. Currently, multiplexed MIP analysis can examine more than 55,000 loci in a single assay.

Technique Development History

Padlock Probe

The design of the molecular inversion probes (MIP) originated from padlock probes, a molecular biology technique first reported by Nilsson et al. in 1994. Similar to MIP, padlock probes are single-stranded DNA molecules with two 20-nucleotide-long segments complementary to the target connected by a 40-nucleotide-long linker sequence. When the target-complementary regions are hybridized to the DNA target, the padlock probes also become circularized. However, unlike MIP, padlock probes are designed such that the target-complementary regions span the entire target region upon hybridization, leaving no gaps. Thus, padlock probes are only useful for detecting DNA molecules with known sequences.

Nilsson et al. demonstrated the use of padlock probes to detect numerous DNA targets, including a synthetic oligonucleotide and a circular genomic clone. Padlock probes have high specificity towards their target and can distinguish target molecules that closely resemble one another. Nilsson et al. also demonstrated the use of padlock probes to differentiate between the normal and a mutant cystic fibrosis transmembrane conductance regulator (CFTR) gene, where the mutant had a 3 bp deletion corresponding to one of the ends of the probe. Since ligation requires the ends of the probe to be immediately adjacent to one another when hybridized to the target, the 3 bp deletion in the mutant prevented successful ligation. Padlock probes were also successfully used for in situ hybridization to detect alphoid repeats specific to chromosome 12 in a sample of metaphase chromosomes. Here, traditional, linear oligonucleotide probes failed to yield results. Thus, padlock probes possess sufficient specificity to detect single-copy elements in the genome.

Molecular Inversion Probe

In order to perform SNP genotyping, Hardenbol et al. modified padlock probes such that when the probe is hybridized to the genomic target, there is a gap at the SNP position. Gap filling using a nucleotide that is complementary to the nucleotide at the SNP location determines the identity of the polymorphism. This design brings numerous benefits over the more traditional padlock probe technique. First, using multiple padlock probes, each specific to one plausible allele of a SNP, requires careful balancing of the concentrations of these allele-specific probes to ensure that SNP counts at a given locus are properly normalized; a single MIP per locus avoids this. In addition, with the MIP design, a bad probe affects all genotypes at a given locus equally. For instance, since a MIP probe can assay multiple genotypes at a particular genomic locus, if the probe for a given locus does not work (e.g. fails to properly hybridize to the genomic target), none of the genotypes at this locus will be detected. In contrast, for padlock probes, one needs to design a distinct padlock probe to detect each plausible genotype at a given locus (e.g. one padlock probe is needed for detecting "A" at a given SNP locus and another padlock probe is needed for detecting "T" at the locus). Thus, a bad padlock probe will only affect the detection of the specific genotype that the probe is designed to detect, whereas a bad MIP probe will affect all genotypes at the locus. Using MIP, one avoids potential incorrect SNP calling: if the probe designed to assay a given locus does not work, no data is generated for this locus and no SNP calling is performed.

In their procedure, Hardenbol et al. assayed more than 1000 SNP loci simultaneously in a single tube, where the tube contained more than 1000 probes with distinct designs. The pool of probes was aliquoted into four tubes for four different reactions. In each reaction, a distinct nucleotide (A, T, C or G) was used for gap filling. Only when the nucleotide at the SNP locus was complementary to the applied nucleotide would the gap be closed by ligation and the probe be circularized. Identification of the captured SNPs was performed on genotyping arrays where each spot on the array contained sequences complementary to the locus-specific tags in the probes. Since the cost of DNA arrays is a major contributor to the cost of this technique, the performance of four-chip, one-color detection was compared to two-chip, two-color detection. The results were found to be similar in terms of SNP call rate and signal-to-noise ratio.

In a subsequent report, this group successfully increased the level of multiplexing to simultaneously assay more than 10,000 SNP loci, using 12,000 distinct probes. The study examined SNPs in 30 trio samples (each trio consisted of a mother, a father and their child). Knowing the genotypes of the parents, the accuracy of the SNP genotypes predicted in the child was determined by examining whether the predicted genotypes were concordant with the expected Mendelian inheritance patterns. The trio concordance rate was found to be greater than 99.6%. In addition, a set of MIP-specific performance metrics was developed. This work set the framework for high-throughput SNP genotyping in the HapMap project.
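The trio check described above can be sketched as follows. This is a minimal illustration assuming that concordance is the fraction of loci at which the child's genotype can be formed from one allele of each parent; the source does not give the exact formula used in the study, and the function names and genotype encoding (two-letter strings such as "AG") are hypothetical.

```python
from itertools import product


def mendelian_consistent(mother: str, father: str, child: str) -> bool:
    """True if the child's genotype can arise from one allele of each parent.

    Genotypes are unordered pairs of alleles written as two letters, e.g. "AG".
    """
    possible = {"".join(sorted(m + f)) for m, f in product(mother, father)}
    return "".join(sorted(child)) in possible


def trio_concordance(trio_genotypes) -> float:
    """Fraction of (mother, father, child) genotype triples that are consistent."""
    results = [mendelian_consistent(m, f, c) for m, f, c in trio_genotypes]
    return sum(results) / len(results)


# Made-up genotype calls at four loci for one trio.
calls = [
    ("AA", "AG", "AG"),  # consistent
    ("AA", "AG", "AA"),  # consistent
    ("CT", "CT", "TT"),  # consistent
    ("AA", "AA", "AG"),  # inconsistent: neither parent carries G
]
rate = trio_concordance(calls)
```

A check of this kind only flags calls that are impossible under Mendelian inheritance, so it gives a lower bound on the genotyping error rather than a direct measurement of it.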

Connector Inversion Probe

To capture genomic regions longer than a single nucleotide, Akhras et al. modified the design of MIP by extending the gap delimited by the hybridized probe ends and named the design Connector Inversion Probe (CIP). The gap corresponds to the genomic region of interest to be captured (e.g. exons). The gap-filling reaction is achieved with DNA polymerase, using all four nucleotides. Identification of the captured regions can then be done by sequencing them using locus-specific primers that map to one of the target-complementary ends of the probes.

Akhras et al. also developed the multiplexing multiplex padlocks (MMP) barcode system in order to lower the costs of reagents. A single assay might involve DNA samples from multiple individuals and examine multiple genomic loci in each individual. Each possible combination of individual and genomic locus is uniquely identified by a DNA barcode, represented as DNA tags inserted into the linker region of the probes. Thus, sequences from the captured regions include the barcode, allowing the unambiguous determination of the individual and the genomic locus to which the captured region belongs.
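The barcode idea can be illustrated with a small lookup table. This is a hedged sketch, not the MMP implementation: the barcode sequences, the fixed barcode length, the assumption that the barcode sits at the start of each read, and the function names are all hypothetical.

```python
def build_barcode_table(individuals, loci, barcodes):
    """Assign one unique barcode to every (individual, locus) combination."""
    pairs = [(ind, locus) for ind in individuals for locus in loci]
    if len(set(barcodes)) != len(barcodes):
        raise ValueError("barcodes must be unique")
    if len(barcodes) < len(pairs):
        raise ValueError("not enough barcodes for all combinations")
    return dict(zip(barcodes, pairs))


def demultiplex(read: str, table: dict, barcode_len: int):
    """Return (individual, locus) for a read, or None if the barcode is unknown.

    Assumes, for illustration only, that the barcode is the read's prefix.
    """
    return table.get(read[:barcode_len])


# Two individuals and two loci need four distinct barcodes.
table = build_barcode_table(
    individuals=["sample_1", "sample_2"],
    loci=["locus_A", "locus_B"],
    barcodes=["AACC", "AAGG", "CCAA", "GGAA"],
)
origin = demultiplex("CCAATTGACCTGAAG", table, barcode_len=4)
```

Because each barcode encodes both the sample and the locus, reads from a pooled reaction can be sorted after sequencing without running a separate reaction per individual.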

This group has also developed software for designing locus-specific CIPs (CIP creator 1.0.1).

Application

Molecular Inversion Probe (MIP) is one of the techniques widely used to capture a small region of the genome for further examination. With the invention of next-generation sequencing technologies, the cost of sequencing whole genomes has decreased dramatically. However, the cost is still too high for these sequencing machines to be used in practice in every laboratory. Instead, different genome partitioning techniques can be used to isolate smaller but highly specific regions of the genome for further analysis. MIP, for instance, can be used to capture targets for SNP genotyping, copy number variation or allelic imbalance studies, to name a few.

SNP Genotyping

In SNP genotyping, the probes are separated into four reactions and a different type of nucleotide is added to each reaction. If the SNP at the target region is complementary to the added nucleotide, the ligation is successful and the probe becomes fully circularized. Since each probe hybridizes to exactly one SNP target in the genome, successfully circularized probes provide the nucleotide identities of the SNPs. The tag sequences from the four nucleotide-specific reactions are then hybridized to either four genotyping arrays or two dual-colour arrays (one channel for each reaction). Analyzing which spots on the array are bound by the tags allows the determination of the SNP identities at the genomic loci represented by those tags.
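The four-reaction readout can be sketched as a simple genotype caller for one locus. The fixed intensity threshold, the function name and the signal values are assumptions for illustration; real array analysis uses more elaborate normalization and clustering, which this sketch does not attempt to reproduce.

```python
# Base pairing used to translate the added nucleotide back to the target base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def call_genotype(signals: dict, threshold: float):
    """Call the target-strand genotype at one SNP locus.

    `signals` maps the nucleotide added in each of the four reactions to the
    tag intensity observed for this locus. A reaction counts as positive when
    its intensity reaches `threshold` (a hypothetical cut-off).
    Returns e.g. "AA" or "AG", or None when no call can be made.
    """
    positive = [nt for nt in "ACGT" if signals.get(nt, 0.0) >= threshold]
    if not positive or len(positive) > 2:
        return None  # failed probe or ambiguous signal: no call is made
    # The probe circularizes when the added nucleotide pairs with the SNP base.
    bases = sorted(COMPLEMENT[nt] for nt in positive)
    return bases[0] * 2 if len(bases) == 1 else "".join(bases)


heterozygous = call_genotype({"A": 5, "C": 900, "G": 8, "T": 870}, threshold=100)
homozygous = call_genotype({"A": 4, "C": 6, "G": 3, "T": 950}, threshold=100)
no_call = call_genotype({"A": 4, "C": 6, "G": 3, "T": 7}, threshold=100)
```

Returning None when no reaction is positive reflects the behaviour described earlier for MIP: a failed probe yields no data for its locus rather than an incorrect call.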

The SNPs targeted by MIP can then be used in areas of research such as quantitative trait loci (QTL) analysis or genome-wide association studies (GWAS), where the SNPs are either used in indirect linkage disequilibrium studies or directly screened for causative mutations.

Copy Number Variation Detection

The molecular inversion probe technique can also be used for copy number variation (CNV) detection. This dual role of MIP in SNP genotyping and CNV analysis is similar to that of high-density SNP genotyping arrays, which have recently been used for CNV detection and analysis as well. These techniques extract the allele-specific signal intensities from genotyping data and use them to generate CNV results. They have higher precision and resolution than traditional techniques such as G-banded karyotype analysis, fluorescence in situ hybridization (FISH) or array comparative genomic hybridization (aCGH).
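As a rough illustration of how allele-specific signal intensities could be turned into a copy number estimate, the sketch below sums the two allele intensities at a locus, normalizes the total against a reference sample, and converts the log2 ratio into an estimated copy number. The function name, the assumption that the reference carries two copies, and the example intensities are hypothetical; this is not the procedure of any specific MIP analysis pipeline.

```python
import math


def estimate_copy_number(sample_a, sample_b, ref_a, ref_b, ref_copies=2):
    """Estimate copy number at one locus from allele-specific intensities.

    sample_a, sample_b: intensities of the two alleles in the test sample.
    ref_a, ref_b: intensities of the same alleles in a reference sample,
    which is assumed (hypothetically) to carry `ref_copies` copies.
    Returns (log2_ratio, estimated_copies).
    """
    sample_total = sample_a + sample_b
    ref_total = ref_a + ref_b
    if sample_total <= 0 or ref_total <= 0:
        raise ValueError("total intensities must be positive")
    # A log2 ratio of 0 means the same total signal as the reference.
    log2_ratio = math.log2(sample_total / ref_total)
    estimated_copies = ref_copies * 2 ** log2_ratio
    return log2_ratio, estimated_copies
```

With made-up intensities, `estimate_copy_number(1500, 1500, 1000, 1000)` gives an estimate close to three copies, and `estimate_copy_number(500, 500, 1000, 1000)` gives an estimate close to one copy. Real data would also need per-probe normalization and smoothing across neighboring loci, which this sketch omits.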

Current Research

MIP has been used extensively in many areas of research; some examples from the recent literature are outlined below:
  • The molecular inversion probe technique has been used to study childhood brain tumors, the most common solid pediatric cancer and the leading cause of pediatric cancer mortality. Despite their high prevalence, little is known about the genetic events that contribute to the development and progression of pediatric gliomas. MIP has identified novel regions of copy number events in this cancer using minimal DNA. Identifying these events can in turn improve understanding of the underlying mechanism of this disease.

  • In one study, 45 pediatric leukemia samples were analyzed for gene copy aberrations using molecular inversion probe technology. The MIP analysis identified 69 regions of recurring copy number changes, of which 41 had not been identified with other DNA microarray platforms. Copy number gains and losses were validated in 98% of the clinical karyotypes and 100% of the fluorescence in situ hybridization studies available.

  • In another study, MIP was used to identify the association between polymorphisms and haplotypes in the caspase-3, caspase-7, and caspase-8 genes and the risk for endometrial cancer.

  • A recent study has demonstrated the success of MIP for copy number variation and genotyping studies in formalin-fixed paraffin-embedded samples. These banked samples, which usually come with extensive follow-up information, underperform or suffer high failure rates compared to fresh frozen samples because of DNA degradation and cross-linking during fixation and processing. The study nevertheless used MIP to obtain high-quality copy number and genotyping data from formalin-fixed paraffin-embedded samples.

  • The molecular inversion probe technique has also been used in the field of pharmacogenomics. Genotyping of genes important in drug metabolism, excretion and transport using MIP has paved the way to understanding patient-to-patient variability in drug response.

Probe Design Optimization Strategies

To optimize the degree of multiplexing and the lengths of the captured regions, a number of factors should be considered when designing probes:
  • The sequences of the probe that are complementary to the DNA target must be specific and map only to unique regions of the genome with reasonable sequence complexity. Genomic regions containing repeats should be treated with caution.

  • For all probes used in a single assay, the annealing temperatures of the two target-complementary ends should be similar, so that both ends hybridize to their targets at the same temperature.

  • The GC content of the genomic targets should be similar and the variability in target length should be restricted, so that all gaps can be filled within similar elongation times.

  • The genomic targets cannot be too long (current successful applications worked with target lengths of 100 to 200 bp); otherwise steric effects may interfere with hybridization of the probes to their targets.

  • The tags from each probe used for microarray-based identification of the captured region should have similar melting temperatures and maximally orthogonal base complexities. These properties ensure, respectively, that all tags can be hybridized to the array under similar conditions and that cross-hybridization is minimized.
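Some of the design criteria above can be screened computationally before probes are ordered. The sketch below checks GC content, compares rough melting temperatures of the two target-complementary arms, and checks the target length against the 100 to 200 bp range mentioned above. The melting temperature uses the Wallace rule, a coarse approximation for short oligonucleotides that is swapped in here for illustration and is not taken from the MIP literature; the function names and the 4 °C tolerance are likewise assumptions.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule:
    2 * (A + T) + 4 * (G + C). Only a coarse estimate for short oligos."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def arms_compatible(arm1, arm2, max_tm_diff=4):
    """True if both target-complementary arms have similar estimated Tm.
    max_tm_diff is a hypothetical tolerance, not a published value."""
    return abs(wallace_tm(arm1) - wallace_tm(arm2)) <= max_tm_diff


def target_length_ok(gap_length, min_len=100, max_len=200):
    """True if the gap to be filled lies within the 100-200 bp range
    reported for successful applications."""
    return min_len <= gap_length <= max_len
```

For example, the made-up arms `"ACGTACGTACGTACGTACGT"` and `"ATATATATGCGCGCGCATAT"` have Wallace estimates of 60 and 56, so `arms_compatible` accepts the pair at the default tolerance. A production design tool would use a nearest-neighbor thermodynamic model and would also test the arms for uniqueness in the genome.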

MIP Protocol Optimization Strategies

A number of experimental conditions can be modified for optimization, including:
  • Hybridization and gap-fill time
  • Probe, ligase and DNA polymerase concentrations
  • Enrichment of the captured target, either by rolling circle amplification or by linearizing the probes and performing multi-template PCR with the universal primers common to all probes
  • Captured target identification via either array-based hybridization approaches or direct sequencing of the target


These factors are critical: in one study, proper optimization increased target capture efficiency from 18 to 91 percent.

Performance Metrics

Turner et al. 2009 summarized two metrics that are commonly reported in MIP-based genomic capture experiments that identify the target by sequencing.
  • Capture uniformity (analogous to recall): the fraction of genomic targets that are captured with confidence. It is assessed from the relative abundance of sequence reads mapped to each genomic target.

  • Capture specificity (analogous to precision): the fraction of sequence reads that actually map to the genomic targets of interest.


These two metrics are directly affected by the quality of the batch of probes.
To improve the results for low-quality probes, sequencing can be performed at greater depth. The amount of sequencing needed scales nearly exponentially with decreases in uniformity and specificity.
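The two capture metrics can be computed from per-target read counts once reads have been mapped. The following is a minimal sketch under stated assumptions: the function names, the dictionary layout, and the threshold of 10 reads used to decide that a target was "captured with confidence" are all hypothetical and are not values given by Turner et al.

```python
def capture_specificity(on_target_reads, total_reads):
    """Fraction of all sequence reads that map to the intended targets."""
    return on_target_reads / total_reads


def capture_uniformity(reads_per_target, min_reads=10):
    """Fraction of targets covered by at least `min_reads` reads.
    min_reads is a hypothetical confidence threshold."""
    captured = sum(1 for n in reads_per_target.values() if n >= min_reads)
    return captured / len(reads_per_target)


def relative_abundance(reads_per_target):
    """Share of on-target reads that falls on each target."""
    total = sum(reads_per_target.values())
    return {target: n / total for target, n in reads_per_target.items()}
```

As a usage example with invented counts, take `reads = {"t1": 120, "t2": 60, "t3": 3, "t4": 17}` out of 250 total reads. The 200 on-target reads give a specificity of 0.8, three of the four targets pass the threshold for a uniformity of 0.75, and `relative_abundance(reads)["t1"]` is 0.6, showing that one target takes most of the reads.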

Hardenbol et al. 2005 proposed a set of metrics that concern SNP genotyping using MIPs.
  • Signal/noise ratio: Ratio of true genotype counts over background counts.
  • Probe conversion rate: Number of genomic SNP loci for which probes can be designed and successfully assayed. In other words, this metric concerns the fraction of probes that produce genotyping results.
  • Call rate: For a given SNP locus, the number of DNA samples whose genotypes at this locus can be measured. In other words, the amount of supporting evidence for the genotype(s) assigned to the given SNP locus.
  • Completeness: For the set of SNPs assayed, the total fraction of genotypes that are successfully obtained.
  • Accuracy: For the set of SNPs assayed, the fraction of detected genotypes that are correct. This is commonly measured by the repeatability of the results.


An inherent trade-off exists between probe conversion rate and accuracy. Removing probes that yield incorrect genotypes increases the accuracy but decreases the probe conversion rate. In contrast, using a lenient probe acceptance threshold increases the probe conversion rate but decreases the accuracy.
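Several of the genotyping metrics above reduce to simple counts over a table of calls. The sketch below assumes a hypothetical layout in which `calls[snp][sample]` holds a genotype string, or `None` when no call was made, and it expresses the call rate as a fraction of samples rather than a raw count. Accuracy is computed here against a reference call set, which is only one way to measure it; the function names and identifiers are illustrative.

```python
def call_rate(calls_for_snp):
    """Fraction of samples with a genotype call at one SNP locus."""
    called = [g for g in calls_for_snp.values() if g is not None]
    return len(called) / len(calls_for_snp)


def completeness(calls):
    """Fraction of all (SNP, sample) pairs that received a genotype."""
    total = sum(len(per_sample) for per_sample in calls.values())
    called = sum(
        1
        for per_sample in calls.values()
        for g in per_sample.values()
        if g is not None
    )
    return called / total


def accuracy(calls, reference):
    """Fraction of made calls that agree with a reference call set.
    Pairs missing from either set are skipped."""
    compared = 0
    correct = 0
    for snp, per_sample in calls.items():
        for sample, genotype in per_sample.items():
            expected = reference.get(snp, {}).get(sample)
            if genotype is None or expected is None:
                continue
            compared += 1
            if genotype == expected:
                correct += 1
    return correct / compared if compared else float("nan")
```

Dropping a poorly performing probe from `calls` before calling `accuracy` raises the accuracy while lowering the number of converted probes, which is the trade-off described above.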

Other Genomic Partitioning Techniques

To reduce the costs from sequencing whole genomes, many methods that enrich specific genomic regions of interest have been proposed.
The techniques are summarized below, with the multiplex level (the number of genomic loci that can be assayed in a single run) given in parentheses:
  • Multiplex PCR (multiplex level 10^2 to 10^3): Target enrichment by PCR amplification of the genomic targets using multiple target-specific primer sets.
  • Capture by Circularization (multiplex level 10^4 to 10^5): Target capture using probes containing sequences complementary to the target. Hybridization of the probes to their targets results in circularized products. Target enrichment via rolling circle amplification or PCR using universal primers.
  • Solution-based Capture (multiplex level 10^4 to 10^5): Genomic DNA shotgun fragments in solution captured by biotinylated probes with sequences complementary to the desired targets.
  • Array-based Capture (multiplex level 10^5 to 10^6): Genomic DNA shotgun fragments captured on a microarray containing spots with sequences complementary to the desired targets.

Other Capture by Circularization Methods

Gene selector method: An initial multiplex PCR step is performed to enrich the targets of interest. The PCR products are circularized upon hybridization to target-specific probes with sequences complementary to the two primers used in the PCR step.

Capture by selective circularization method: The genomic DNA is digested into fragments with restriction enzymes. Using selector probes with flanking regions that are complementary to the target of interest, the digested DNA fragments are circularized upon hybridization to the selector probes.

Performance Comparisons between Genomic Partitioning Techniques

Each method demonstrates trade-offs between uniformity, capture specificity, cost, scalability and availability.
  • In terms of capture specificity, Capture by Circularization methods demonstrate the best results. This is because all methods in this class require two ends of the same DNA molecule (e.g. the two ends of a MIP probe) to bind simultaneously to a single cognate partner molecule (e.g. the genomic target region) in the proper configuration for successful ligation.

  • In contrast, Capture by Circularization methods demonstrate less uniformity compared to other methods. This is because the probe design for each distinct genomic target is unique and thus the performance between individual probes may vary.

  • Regarding scalability, the high specificity of Capture by Circularization and Solution-based Capture methods makes them the most appropriate for studies that involve a large number of genomic targets and many samples. Array-based Capture techniques are appropriate for studying many genomic targets but with fewer samples, due to the limited resolution and specificity of microarrays. Multiplex PCR methods are most appropriate for small-scale studies due to their ease of use and the availability of reagents.

  • The costs associated with each technique are difficult to compare given the vast choice of designs and experimental conditions. However, for every technique, attaining a high multiplexing level, where many loci are assayed simultaneously, amortizes the costs.

Advantages of MIP

  • Unlike some other genotyping techniques, MIP eliminates the need to PCR-amplify the DNA sample before the assay. This is beneficial when examining a large number of target sequences simultaneously, where cross-talk between primer pairs is likely
  • High specificity, which is achieved because:
    i) Unlike other highly multiplexed genotyping techniques, MIP uses enzymatic steps (DNA polymerization and ligation) in solution to capture specific loci, followed by an amplification step. This combination of enzymatic steps confers a high degree of specificity on the MIP assay
    ii) Exonuclease treatment removes non-reacted, linear probes
    iii) The tag sequences are selected to increase specificity at hybridization and thus prevent cross-talk at the detection step
    iv) The target-complementary sequences at both ends of the probe are physically constrained to interact locally
  • Built-in quality control of the signal-to-noise ratio: the MIP technique examines the possibility of all four bases at each SNP position. A homozygous SNP is expected to give a single signal and a heterozygous SNP two signals. The signal-to-noise ratio can thus be monitored using the background alleles, and a call with a suspicious signal can be discarded from the downstream analysis
  • High levels of multiplexing (on the order of 10^4 to 10^5 probes in one assay) can be achieved
  • A low amount of sample DNA (e.g. 0.2 ng/SNP) is needed since the MIP probes can be applied directly to genomic DNA instead of shotgun libraries
  • High concordance: trio concordance rate is found to be > 99.6%
  • Reproducibility: genotyping the same individual several times showed that the genotyped SNPs were concordant (99.9%)
  • High dynamic range: in CNV detection studies, up to 60 copies of amplified regions can be detected in the genome
  • Since MIP requires only 40 base pairs of intact genomic DNA, its use in degraded samples, such as formalin-fixed paraffin-embedded samples, may offer distinct advantages
  • Simple infrastructure (only common bench-top reagents and tools are required) and simple design make this technique broadly applicable in many laboratories
  • The choice of platform for identifying the captured target is very flexible, so cost-efficiency may be improved. For instance, the captured targets can be directly sequenced, bypassing the need for sequencing library construction.
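The built-in quality control described in the list above can be sketched as a small genotype caller that reads the four base signals at one SNP position. It decides between one strong signal (homozygous) and two strong signals (heterozygous), then uses the remaining background channels as the noise estimate. The function name and both thresholds are hypothetical choices for illustration and are not taken from a published MIP protocol.

```python
def call_genotype(signals, min_snr=5.0, het_ratio=0.5):
    """Call a genotype from four base signals at one SNP position.

    signals: dict mapping each base (A, C, G, T) to its intensity.
    min_snr: hypothetical minimum signal-to-noise ratio for a call.
    het_ratio: hypothetical fraction of the top signal that the second
    signal must reach to be treated as a second allele.
    Returns (genotype or None, signal_to_noise).
    """
    ranked = sorted(signals.items(), key=lambda kv: kv[1], reverse=True)
    (base1, sig1), (base2, sig2) = ranked[0], ranked[1]
    if sig2 >= het_ratio * sig1:
        # Two strong signals: heterozygous; the other two are background.
        alleles = sorted([base1, base2])
        signal = sig2  # use the weaker called allele, to be conservative
        background = [v for _, v in ranked[2:]]
    else:
        # One strong signal: homozygous; the other three are background.
        alleles = [base1, base1]
        signal = sig1
        background = [v for _, v in ranked[1:]]
    noise = max(max(background), 1e-9)  # guard against division by zero
    snr = signal / noise
    if snr < min_snr:
        return None, snr  # suspicious signal: discard the call
    return "".join(alleles), snr
```

With invented intensities, `{"A": 1000, "C": 20, "G": 30, "T": 25}` is called as `"AA"` and `{"A": 900, "C": 15, "G": 800, "T": 20}` as `"AG"`, while `{"A": 500, "C": 200, "G": 40, "T": 30}` returns `None` because the strongest background channel is too close to the signal.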

Limitations of MIP

  • Sensitivity and uniformity are relatively low compared to other genomic capture techniques, since not all targets can be captured under the same experimental conditions in high-throughput runs that involve many probes. However, a recent study that used probes with longer linker regions reported improved uniformity.
  • The sizes of the targets that can be captured are limited because
    i) A large gap region imposes steric constraints on the intramolecular circularization of the probe, and
    ii) A large gap requires longer probes to be synthesized, increasing the costs.
  • The degree of multiplexing is constrained by the multiplexing capability of the method chosen for target identification. If array-based detection methods are used, the number of targets that can be assayed is limited by the available spots on the array.
  • Since a distinct probe is needed to capture each region, it is costly to assay many regions. However, the costs are amortized at high multiplexing levels. For instance, at a multiplexing level of 1000, the cost becomes $0.01 per probe for each assay.
  • MIP reaction conditions may require optimization, which is particularly important for assaying heterozygous sites.

See also

  • International HapMap Project
  • Exome
  • Exon trapping
  • Polymerase chain reaction
  • Rolling circle replication
  • DNA microarray
  • DNA sequencing


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 