Chemometrics
Chemometrics is the science of extracting information from chemical systems by data-driven means. It is a highly interfacial discipline, using methods frequently employed in core data-analytic disciplines such as multivariate statistics, applied mathematics, and computer science, in order to address problems in chemistry, biochemistry, medicine, biology and chemical engineering. In this way, it mirrors several other interfacial ‘-metrics’ such as psychometrics and econometrics.

Introduction

Chemometrics is applied to solve both descriptive and predictive problems in the experimental natural sciences, especially in chemistry. In descriptive applications, properties of chemical systems are modeled with the intent of learning the underlying relationships and structure of the system (i.e., model understanding and identification). In predictive applications, properties of chemical systems are modeled with the intent of predicting new properties or behavior of interest. In both cases, the datasets can be small but are often very large and highly complex, involving hundreds to thousands of variables, and hundreds to thousands of cases or observations.

Chemometric techniques are particularly heavily used in analytical chemistry and metabolomics, and the development of improved chemometric methods of analysis also continues to advance the state of the art in analytical instrumentation and methodology. It is an application-driven discipline, and thus while the standard chemometric methodologies are very widely used industrially, academic groups are dedicated to the continued development of chemometric theory, methods and applications.

Origins

Although one could argue that even the earliest analytical experiments in chemistry involved a form of chemometrics, the field is generally recognized to have emerged in the 1970s as computers became increasingly exploited for scientific investigation. The term ‘chemometrics’ was coined by Svante Wold in a grant application in 1971, and the International Chemometrics Society was formed shortly thereafter by Svante Wold and Bruce Kowalski, two pioneers in the field. Wold was a professor of organic chemistry at Umeå University, Sweden, and Kowalski was a professor of analytical chemistry at the University of Washington, Seattle.

Many early applications involved multivariate classification; numerous quantitative predictive applications followed, and by the late 1970s and early 1980s a wide variety of data- and computer-driven chemical analyses were occurring.

Multivariate analysis was a critical facet even in the earliest applications of chemometrics. The data resulting from infrared and UV/visible spectroscopy often number in the thousands of measurements per sample. Mass spectrometry, nuclear magnetic resonance, atomic emission/absorption and chromatography experiments are also all by nature highly multivariate. The structure of these data was found to be conducive to using techniques such as principal components analysis (PCA) and partial least squares (PLS). This is primarily because, while the datasets may be highly multivariate, there is strong and often linear low-rank structure present. PCA and PLS have proven very effective over time at empirically modeling the more chemically interesting low-rank structure, exploiting the interrelationships or ‘latent variables’ in the data, and providing alternative compact coordinate systems for further numerical analysis such as regression, clustering, and pattern recognition. Partial least squares in particular was heavily used in chemometric applications for many years before it began to find regular use in other fields.
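
The low-rank structure described above can be illustrated with a small sketch. The data below are synthetic and purely illustrative (one hypothetical absorbing species measured at five channels, plus a small deterministic perturbation), and the first principal component is found by simple power iteration in plain Python rather than by any particular chemometrics package. The function and variable names are assumptions made for this example.

```python
import math

def mean_center(X):
    """Subtract the column (variable) means from every row (sample)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def first_component(X, iterations=100):
    """Power iteration for the first PCA loading of a mean-centered matrix X."""
    n_vars = len(X[0])
    loading = [1.0 / math.sqrt(n_vars)] * n_vars  # arbitrary unit start vector
    for _ in range(iterations):
        # scores t = X p, then update p proportional to X^T t
        scores = [sum(x * p for x, p in zip(row, loading)) for row in X]
        update = [sum(row[j] * t for row, t in zip(X, scores)) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in update))
        loading = [v / norm for v in update]
    scores = [sum(x * p for x, p in zip(row, loading)) for row in X]
    total = sum(x * x for row in X for x in row)
    explained = sum(t * t for t in scores) / total  # fraction of total variance
    return loading, scores, explained

# Synthetic data (assumption): spectrum = concentration * pure spectrum + small perturbation.
pure_spectrum = [0.1, 0.5, 1.0, 0.5, 0.1]
concentrations = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
X = [[c * s + 0.001 * ((7 * i + 3 * j) % 5 - 2) for j, s in enumerate(pure_spectrum)]
     for i, c in enumerate(concentrations)]

loading, scores, explained = first_component(mean_center(X))
print(f"fraction of variance explained by the first component: {explained:.4f}")
```

In this toy case a single latent variable captures nearly all of the variance across five measured channels, and the loading has the shape of the pure spectrum, which is the sense in which a compact coordinate system replaces the original variables.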

Through the 1980s three dedicated journals appeared in the field: Journal of Chemometrics (from John Wiley & Sons), Chemometrics and Intelligent Laboratory Systems (from Elsevier, http://www.elsevier.com/wps/find/journaldescription.cws_home/502682/description#description), and Journal of Chemical Information and Modeling (from the American Chemical Society).
These journals continue to cover both fundamental and methodological research in chemometrics. At present, most routine applications of existing chemometric methods are published in application-oriented journals (e.g., Applied Spectroscopy, Analytical Chemistry, Anal. Chim. Acta, Talanta).
Several important books/monographs on chemometrics were also first published in the 1980s, including the first edition of Malinowski’s “Factor Analysis in Chemistry”, Sharaf, Illman and Kowalski’s “Chemometrics”, Massart et al.’s “Chemometrics: a textbook”, and “Multivariate Calibration” by Martens and Naes.

Some large chemometric application areas have gone on to represent new domains, such as molecular modeling and QSAR, cheminformatics, the ‘-omics’ fields of genomics, proteomics, metabonomics and metabolomics, process modeling and process analytical technology.

An account of the early history of chemometrics was published as a series of interviews by Geladi and Esbensen.

Multivariate calibration

Many chemical problems and applications of chemometrics involve calibration. The objective is to develop models which can be used to predict properties of interest based on measured properties of the chemical system, such as pressure, flow, temperature, infrared, Raman, NMR spectra and mass spectra. Examples include the development of multivariate models relating 1) multi-wavelength spectral response to analyte concentration, 2) molecular descriptors to biological activity, 3) multivariate process conditions/states to final product attributes. The process requires a calibration or training data set, which includes reference values for the properties of interest for prediction, and the measured attributes believed to correspond to these properties. For case 1), for example, one can assemble data from a number of samples, including concentrations for an analyte of interest for each sample (the reference) and the corresponding infrared spectrum of that sample. Multivariate calibration techniques such as partial least squares regression or principal component regression (and near countless other methods) are then used to construct a mathematical model that relates the multivariate response (spectrum) to the concentration of the analyte of interest, and such a model can be used to efficiently predict the concentrations of new samples.

Techniques in multivariate calibration are often broadly categorized as classical or inverse methods. The principal difference between these approaches is that in classical calibration the models are solved such that they are optimal in describing the measured analytical responses (e.g., spectra) and can therefore be considered optimal descriptors, whereas in inverse methods the models are solved to be optimal in predicting the properties of interest (e.g., concentrations) and can therefore be considered optimal predictors. Inverse methods usually require less physical knowledge of the chemical system and, at least in theory, provide superior predictions in the mean-squared error sense; hence inverse approaches tend to be more frequently applied in contemporary multivariate calibration.
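
The distinction between classical and inverse calibration can be sketched on a tiny, noise-free synthetic system. Everything below is an assumption made for illustration: a linear mixing model A = C K with two components and two wavelengths, invented numerical values, and small helper functions written in plain Python. It is not a reproduction of any specific published method. Note that the classical fit needs the concentrations of all components in the calibration set, whereas the inverse fit needs only the analyte of interest.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def lstsq(X, Y):
    """Least-squares solution of X W = Y via the normal equations."""
    Xt = transpose(X)
    return solve(matmul(Xt, X), matmul(Xt, Y))

# Synthetic system (assumption): rows of K_true are components, columns are wavelengths.
K_true = [[1.0, 0.2], [0.3, 0.8]]
C_cal = [[0.1, 0.5], [0.3, 0.2], [0.5, 0.4], [0.7, 0.1], [0.9, 0.6]]
A_cal = matmul(C_cal, K_true)          # calibration spectra, A = C K
c_new = [[0.35, 0.25]]                 # "unknown" sample, used only to simulate a_new
a_new = matmul(c_new, K_true)

# Classical: describe the spectra from all concentrations, then invert to predict.
K_hat = lstsq(C_cal, A_cal)
cls_pred = lstsq(transpose(K_hat), transpose(a_new))[0][0]

# Inverse: regress the analyte concentration directly on the measured spectra.
b = lstsq(A_cal, [[row[0]] for row in C_cal])
ils_pred = matmul(a_new, b)[0][0]

print(f"classical: {cls_pred:.4f}  inverse: {ils_pred:.4f}  true: {c_new[0][0]}")
```

With noise-free data both routes recover the same concentration; the practical differences discussed above appear only once noise, unmodeled components, or many collinear wavelengths are involved.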

The main advantage of multivariate calibration techniques is that fast, cheap, or non-destructive analytical measurements (such as optical spectroscopy) can be used to estimate sample properties which would otherwise require time-consuming, expensive or destructive testing (such as HPLC). Equally important is that multivariate calibration allows for accurate quantitative analysis in the presence of heavy interference by other analytes. The selectivity of the analytical method is provided as much by the mathematical calibration as by the analytical measurement modalities. For example, near-infrared spectra, which are extremely broad and non-selective compared to other analytical techniques (such as infrared or Raman spectra), can often be used successfully in conjunction with carefully developed multivariate calibration methods to predict concentrations of analytes in very complex matrices.

Classification, pattern recognition, clustering

Supervised multivariate classification techniques are closely related to multivariate calibration techniques in that a calibration or training set is used to develop a mathematical model capable of classifying future samples. The techniques employed in chemometrics are similar to those used in other fields – multivariate discriminant analysis, logistic regression, neural networks, regression/classification trees. The use of rank reduction techniques in conjunction with these conventional classification methods is routine in chemometrics, for example discriminant analysis on principal components or partial least squares scores.

Unsupervised classification (also termed cluster analysis) is also commonly used to discover patterns in complex data sets, and again many of the core techniques used in chemometrics are common to other fields such as machine learning and statistical learning.

Multivariate curve resolution

In chemometric parlance, multivariate curve resolution seeks to deconstruct data sets with limited or absent reference information and system knowledge. Some of the earliest work on these techniques was done by Lawton and Sylvestre in the early 1970s. These approaches are also called self-modeling mixture analysis, blind source/signal separation, and spectral unmixing. For example, from a data set comprising fluorescence spectra from a series of samples each containing multiple fluorophores, multivariate curve resolution methods can be used to extract the fluorescence spectra of the individual fluorophores, along with their relative concentrations in each of the samples, essentially unmixing the total fluorescence spectrum into the contributions from the individual components. The problem is usually ill-determined due to rotational ambiguity (many possible solutions can equivalently represent the measured data), so the application of additional constraints is common, such as non-negativity, unimodality, or known interrelationships between the individual components (e.g., kinetic or mass-balance constraints).
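
Rotational ambiguity, and the way a constraint narrows it, can be shown with a minimal numerical sketch. The concentration profiles C, spectra S, and mixing matrix T below are invented for illustration. The sketch only demonstrates that two different factorizations reproduce the same data and that non-negativity rejects one of them; it does not implement a curve-resolution algorithm.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_nonnegative(M):
    return all(v >= 0 for row in M for v in row)

# Hypothetical "true" factors: 4 samples x 2 components, 2 components x 4 channels.
C = [[1.0, 0.0], [0.7, 0.3], [0.4, 0.6], [0.0, 1.0]]
S = [[1.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 1.0]]
D = matmul(C, S)                      # measured data, D = C S

# Any invertible T gives an alternative pair (C T, T^-1 S) with the same product.
T = [[1.0, 0.5], [0.0, 1.0]]
T_inv = [[1.0, -0.5], [0.0, 1.0]]
C_alt = matmul(C, T)
S_alt = matmul(T_inv, S)
D_alt = matmul(C_alt, S_alt)

max_diff = max(abs(a - b) for ra, rb in zip(D, D_alt) for a, b in zip(ra, rb))
print("largest difference between the two reconstructions:", max_diff)
print("original factors non-negative:", is_nonnegative(C) and is_nonnegative(S))
print("alternative spectra non-negative:", is_nonnegative(S_alt))
```

Both factor pairs fit the data equally well, but the alternative spectra contain negative intensities, so a non-negativity constraint would exclude that solution.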

Multivariate curve resolution is commonly applied in the study of chemical reactions and processes, and increasingly in chemical hyperspectral imaging. Refer to Tauler and de Juan for recent comprehensive reviews.

Other techniques

Experimental design remains a core area of study in chemometrics and several monographs are specifically devoted to experimental design in chemical applications. Sound principles of experimental design have been widely adopted within the chemometrics community, although many complex experiments are purely observational, and there can be little control over the properties and interrelationships of the samples and sample properties.

Signal processing is also a critical component of almost all chemometric applications, particularly the use of signal pretreatments to condition data prior to calibration or classification. The techniques employed commonly in chemometrics are often closely related to those used in related fields.

Performance characterization and figures of merit

Like most arenas in the physical sciences, chemometrics is quantitatively oriented, so considerable emphasis is placed on performance characterization, model selection, verification and validation, and figures of merit. The performance of quantitative models is usually specified by root mean squared error in predicting the attribute of interest, and the performance of classifiers as true-positive rate/false-positive rate pairs (or a full ROC curve). A recent report by Olivieri et al. provides a comprehensive overview of figures of merit and uncertainty estimation in multivariate calibration, including multivariate definitions of selectivity, sensitivity, SNR and prediction interval estimation. Chemometric model selection usually involves the use of tools such as resampling (including bootstrap, permutation, cross-validation).
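
The figures of merit named above can be computed in a few lines. The sketch below uses invented data and a univariate straight-line calibration purely to keep the example short. The names `rmsec` and `rmsecv` (error of calibration and of leave-one-out cross-validation) are labels chosen for this example.

```python
import math

def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference))

def tpr_fpr(true_labels, predicted_labels):
    """True-positive and false-positive rates for binary labels (1 = positive)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

def loo_rmse(x, y):
    """Leave-one-out cross-validation: refit without sample i, then predict it."""
    predictions = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return rmse(y, predictions)

# Invented calibration data: instrument signal versus reference concentration.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
conc = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(signal, conc)
rmsec = rmse(conc, [slope * s + intercept for s in signal])
rmsecv = loo_rmse(signal, conc)

# Invented classifier output.
tpr, fpr = tpr_fpr([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1])
print(f"RMSEC={rmsec:.3f}  RMSECV={rmsecv:.3f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

The cross-validated error is larger than the calibration error because each sample is predicted by a model that never saw it, which is why resampling estimates are preferred for model selection.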

Multivariate statistical process control (MSPC), modeling and optimization account for a substantial amount of historical chemometric development. Spectroscopy has been used successfully for online monitoring of manufacturing processes for 30–40 years, and this process data is highly amenable to chemometric modeling. Specifically in terms of MSPC, multiway modeling of batch and continuous processes is increasingly common in industry and remains an active area of research in chemometrics and chemical engineering. Process analytical chemistry, as it was originally termed, or process analytical technology, the newer term, continues to draw heavily on chemometric methods and MSPC.

Multiway methods are heavily used in chemometric applications. These are higher-order extensions of more widely used methods. For example, while the analysis of a table (matrix, or second-order array) of data is routine in several fields, multiway methods are applied to data sets that involve third-, fourth-, or higher-order arrays. Data of this type are very common in chemistry; for example, a liquid-chromatography / mass spectrometry (LC-MS) system generates a large matrix of data (elution time versus m/z) for each sample analyzed. The data across multiple samples thus comprise a data cube. Batch process modeling involves data sets that have time vs. process variables vs. batch number. The multiway mathematical methods applied to these sorts of problems include PARAFAC, trilinear decomposition, and multiway PLS and PCA.
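
As a small illustration of the data layout, the sketch below builds a hypothetical LC-MS style cube (samples by elution times by m/z channels) and unfolds it into an ordinary two-way table. Unfolding is one common preprocessing step before two-way methods are applied; it is used here as an assumption and is not described in the text above. The dimensions and values are invented.

```python
def unfold(cube):
    """Unfold samples x times x channels into samples x (times * channels)."""
    return [[value for time_slice in sample for value in time_slice] for sample in cube]

n_samples, n_times, n_mz = 3, 4, 5

# Invented intensities encoding their own position: 100*sample + 10*time + channel.
cube = [[[100 * i + 10 * t + m for m in range(n_mz)]
         for t in range(n_times)]
        for i in range(n_samples)]

unfolded = unfold(cube)
print("cube shape:", (len(cube), len(cube[0]), len(cube[0][0])))
print("unfolded shape:", (len(unfolded), len(unfolded[0])))
```

Each row of the unfolded table holds one sample's full elution-time by m/z matrix laid end to end, so column `t * n_mz + m` corresponds to time index `t` and channel `m`. Dedicated multiway methods such as PARAFAC instead keep the three-way structure intact.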

Software



A number of chemometric practitioners also use statistical packages such as SAS-JMP and Minitab for some facets of analysis. Programming directly in MATLAB, R, and other languages is also common, and a range of free toolboxes are available on the internet.

Further reading

  • Chemometrics and Intelligent Laboratory Systems, an international journal sponsored by the Chemometrics Society, published since 1986 by Elsevier
  • H. Martens, T. Naes, Multivariate calibration, Wiley 1989
  • K.R. Beebe, R.J. Pell, M.B. Seasholtz, Chemometrics: a practical guide, Wiley 1998
  • D.L. Massart, B.G.M. Vandeginste, S.M. Deming, Y. Michotte, L. Kaufman, Chemometrics: a textbook, Elsevier 1988
  • B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A & Part B, Elsevier 1998
  • S.D. Brown, R. Tauler, B. Walczak (eds), Comprehensive Chemometrics: chemical and biochemical data analysis (4 volume set), Elsevier, 2009
  • R.G. Brereton, Applied chemometrics for scientists, Wiley 2007
  • M. Otto, Chemometrics: statistics and computer application in analytical chemistry, 2nd Edition, Wiley-VCH 2007
  • R. Kramer, Chemometric techniques for quantitative analysis, CRC Press, 1998
  • P.J. Gemperline (ed), Practical guide to chemometrics, 2nd Edition, CRC Press 2006
  • H. Mark, J. Workman, Chemometrics in spectroscopy, Academic Press-Elsevier, 2007
  • M. Maeder, Y.-M. Neuhold, Practical Data Analysis in Chemistry, Elsevier 2007

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 