In computing, online analytical processing, or OLAP, is an approach to swiftly answering multi-dimensional analytical (MDA) queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).
OLAP tools enable users to interactively analyze multidimensional data from multiple perspectives. OLAP consists of three basic analytical operations: consolidation, drill-down, and slicing and dicing. Consolidation involves the aggregation of data that can be accumulated and computed in one or more dimensions. For example, all sales offices are rolled up to the sales department or sales division to anticipate sales trends. In contrast, drill-down is a technique that allows users to navigate through the details. For instance, users can view the sales by individual products that make up a region's sales. Slicing and dicing is a feature whereby users can take out (slicing) a specific set of data from the cube and view (dicing) the slices from different viewpoints.
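The three operations described above can be sketched on a tiny in-memory data set. This is only an illustration: the rows, the field names (`region`, `office`, `product`) and the helper functions `aggregate` and `slice_rows` are hypothetical and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, office, product, sales amount).
SALES = [
    ("East", "Boston", "Widget", 120.0),
    ("East", "Boston", "Gadget", 80.0),
    ("East", "Albany", "Widget", 50.0),
    ("West", "Seattle", "Widget", 200.0),
    ("West", "Seattle", "Gadget", 40.0),
]
FIELDS = ("region", "office", "product")


def aggregate(rows, by):
    """Sum the sales amount, grouped by the named dimension columns."""
    idx = [FIELDS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only the rows whose dimension values match `fixed`."""
    return [
        row for row in rows
        if all(row[FIELDS.index(name)] == value for name, value in fixed.items())
    ]


# Consolidation (roll-up): offices are summed into their regions.
by_region = aggregate(SALES, by=["region"])

# Drill-down: break one region's total into the products that make it up.
east_by_product = aggregate(slice_rows(SALES, region="East"), by=["product"])

# Slice out the Widget rows, then view that slice per office.
widget_by_office = aggregate(slice_rows(SALES, product="Widget"), by=["office"])
```

A real OLAP server would answer such requests from stored or pre-computed structures rather than by scanning a list, but the relationship between the three operations is the same.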
Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases, hierarchical databases and relational databases.
The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.
Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure.
A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale.
Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a foreign key column to the fact table. This allows an analyst to view the measures along any combination of the dimensions.
For example:
     Sales Fact Table
+-------------+----------+
| sale_amount | time_id  |
+-------------+----------+               Time Dimension
|     2008.10 |     1234 |---+   +---------+-------------------+
+-------------+----------+   |   | time_id | timestamp         |
                             |   +---------+-------------------+
                             +-->|    1234 | 20080902 12:35:43 |
                                 +---------+-------------------+
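The fact and dimension tables in the diagram can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch, not a recommended warehouse design: the table names (`sales_fact`, `time_dim`, `store_dim`), the second dimension and the extra rows are hypothetical, and the timestamp is written in ISO format (an assumption) so that SQLite's `date()` function can read it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (
        time_id   INTEGER PRIMARY KEY,
        timestamp TEXT NOT NULL
    );
    CREATE TABLE store_dim (
        store_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- Each dimension is attached to the fact table by a foreign key column.
    CREATE TABLE sales_fact (
        sale_amount REAL NOT NULL,
        time_id     INTEGER NOT NULL REFERENCES time_dim(time_id),
        store_id    INTEGER NOT NULL REFERENCES store_dim(store_id)
    );
""")
conn.executemany(
    "INSERT INTO time_dim VALUES (?, ?)",
    [(1234, "2008-09-02 12:35:43"), (1235, "2008-09-03 09:10:00")],
)
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?)",
    [(1, "Downtown"), (2, "Airport")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(2008.10, 1234, 1), (15.50, 1234, 2), (4.50, 1235, 2)],
)

# View the measure along the time dimension: total sales per calendar day.
sales_by_day = conn.execute("""
    SELECT date(t.timestamp) AS day, SUM(f.sale_amount)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_id = f.time_id
    GROUP BY day
    ORDER BY day
""").fetchall()

# View the same measure along the store dimension instead.
sales_by_store = conn.execute("""
    SELECT s.name, SUM(f.sale_amount)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_id = f.store_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
```

Adding another dimension such as Cashier or Customer would follow the same pattern: one more dimension table and one more foreign key column on `sales_fact`.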
Multidimensional databases
Multidimensional structure is defined as “a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data”. The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. “Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions”. Even when data is manipulated it remains easy to access and continues to constitute a compact database format. The data still remains interrelated.
Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O’Brien & Marakas, 2009). Analytical databases use this structure because of its ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective on a problem than other models do.
Aggregations
It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities.
The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.
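To make the count of possible aggregations concrete, the sketch below enumerates every combination of granularities for three dimensions. The dimension names and levels are hypothetical, and each dimension is assumed to have a single linear hierarchy; real cubes may have several alternative hierarchies per dimension, which increases the count further.

```python
from itertools import product

# Hypothetical granularity levels per dimension, finest first.
# "all" means the dimension is aggregated away completely.
LEVELS = {
    "time": ["day", "month", "year", "all"],
    "store": ["store", "region", "all"],
    "product": ["item", "category", "all"],
}

# Choosing one level per dimension gives one candidate granularity.
combinations = list(product(*LEVELS.values()))

# The finest level on every dimension is the base data itself;
# every other combination is an aggregation of it.
base = tuple(levels[0] for levels in LEVELS.values())
aggregations = [combo for combo in combinations if combo != base]

print(len(combinations), len(aggregations))
```

With 4, 3 and 3 levels there are 4 × 3 × 3 = 36 combinations, so 35 aggregations besides the base data. The count is a product over the dimensions, so it multiplies with every dimension or level added.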
Because there are usually many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-Complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and the A* search algorithm.
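As an illustration of the greedy approach, the sketch below picks views one at a time, always taking the view that saves the most query work. It is one simple heuristic under stated assumptions, not a reproduction of any specific published algorithm. The view sizes are invented, and the cost model (a query costs as many rows as the smallest stored view that covers it) is an assumption made for this example.

```python
DIMS = ("time", "store", "product")

# Hypothetical row counts for each group-by view, keyed by its dimensions.
SIZE = {
    frozenset(DIMS): 1_000_000,  # the base data
    frozenset({"time", "store"}): 50_000,
    frozenset({"time", "product"}): 600_000,
    frozenset({"store", "product"}): 200_000,
    frozenset({"time"}): 365,
    frozenset({"store"}): 100,
    frozenset({"product"}): 5_000,
    frozenset(): 1,  # the grand total
}


def query_cost(view, materialized):
    """Rows scanned to answer `view`: the smallest stored view covering it."""
    return min(SIZE[m] for m in materialized if view <= m)


def benefit(candidate, materialized):
    """Total rows saved over all views if `candidate` were also stored."""
    return sum(
        max(0, query_cost(view, materialized) - SIZE[candidate])
        for view in SIZE
        if view <= candidate
    )


def greedy_select(k):
    """Pick up to k views, each time taking the one with the largest benefit."""
    materialized = [frozenset(DIMS)]  # the base data is always available
    for _ in range(k):
        candidates = [v for v in SIZE if v not in materialized]
        if not candidates:
            break
        best = max(candidates, key=lambda v: benefit(v, materialized))
        if benefit(best, materialized) == 0:
            break
        materialized.append(best)
    return materialized[1:]


chosen = greedy_select(2)
```

With these sizes the first pick is the (time, store) view and the second is (store, product). A greedy choice is not guaranteed to be optimal, which is consistent with the problem being NP-Complete. The limit `k` stands in for the size constraint mentioned above; a constraint on update time would need a different cost model.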
Multidimensional
MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database. Therefore it requires the pre-computation and storage of information in the cube, an operation known as processing.
Relational
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
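The correspondence between slicing and a "WHERE" clause can be sketched with Python's built-in `sqlite3` module. The `sales` table, its columns and the helper `rollup` are hypothetical and only illustrate the idea; an actual ROLAP engine generates far more elaborate SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL);
    INSERT INTO sales VALUES
        ('East', 'Widget', 2007, 100.0),
        ('East', 'Gadget', 2007, 40.0),
        ('East', 'Widget', 2008, 130.0),
        ('West', 'Widget', 2008, 90.0),
        ('West', 'Gadget', 2008, 60.0);
""")

ALLOWED = {"region", "product", "year"}


def rollup(conn, group_by, **slice_on):
    """Total `amount` per `group_by` value, restricted by WHERE conditions."""
    # Column names are checked against a fixed set before being placed in
    # the SQL text; the values themselves are passed as bound parameters.
    if group_by not in ALLOWED or not set(slice_on) <= ALLOWED:
        raise ValueError("unknown dimension")
    where = " AND ".join(f"{col} = ?" for col in slice_on) or "1 = 1"
    sql = (
        f"SELECT {group_by}, SUM(amount) FROM sales "
        f"WHERE {where} GROUP BY {group_by} ORDER BY {group_by}"
    )
    return conn.execute(sql, tuple(slice_on.values())).fetchall()


# Slicing on one dimension adds one condition: year = 2008.
by_region_2008 = rollup(conn, "region", year=2008)

# Restricting a second dimension adds another condition to the same clause.
east_2008 = rollup(conn, "product", year=2008, region="East")

# Aggregated information is held in a new relational table.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS amount FROM sales GROUP BY region"
)
stored = conn.execute(
    "SELECT region, amount FROM sales_by_region ORDER BY region"
).fetchall()
```

Queries at region level can then read `sales_by_region` instead of scanning the detailed rows.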
Hybrid
There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.
Comparison
Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.
- Some MOLAP implementations are prone to database explosion, a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: high number of dimensions, pre-calculated results and sparse multidimensional data.
- MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.
- ROLAP is generally more scalable. However, large-volume pre-processing is difficult to implement efficiently, so it is frequently skipped. ROLAP query performance can therefore suffer tremendously.
- Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.
- HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer good function support.
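The database explosion mentioned in the comparison above can be illustrated with rough arithmetic. The dimension sizes and the number of fact rows below are invented for the example; the point is only how quickly the number of potential cells outgrows the data actually recorded when there are many dimensions and the data is sparse.

```python
from math import prod

# Hypothetical member counts per dimension.
CARDINALITY = {"day": 365, "store": 500, "product": 20_000, "customer": 100_000}

# Hypothetical number of fact rows actually recorded.
FACT_ROWS = 50_000_000

# A dense array would need one cell per combination of members.
potential_cells = prod(CARDINALITY.values())

# Fraction of those cells that would actually hold a value.
density = FACT_ROWS / potential_cells

print(f"{potential_cells:,} potential cells, density {density:.2e}")
```

Here fewer than one cell in a million would be populated, before any pre-calculated aggregates are added. Storing such a cube naively would waste most of the space, which is why the compression and sparse-storage techniques noted in the list matter.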
Other types
The following acronyms are also sometimes used, although they are not as widespread as the ones above:
- WOLAP - Web-based OLAP
- DOLAP - Desktop OLAP
- RTOLAP - Real-Time OLAP
- BROLAP - Bi-directional Rotational OLAP
APIs and query languages
Unlike relational databases, which had SQL as the standard query language and widespread APIs such as ODBC, JDBC and OLEDB, there was no such unification in the OLAP world for a long time. The first real standard API was the OLE DB for OLAP specification from Microsoft, which appeared in 1997 and introduced the MDX query language. Several OLAP vendors, both server and client, adopted it. In 2001 Microsoft and Hyperion announced the XML for Analysis specification, which was endorsed by most of the OLAP vendors. Since this also used MDX as a query language, MDX became the de facto standard.
Since September 2011, LINQ can be used to query SSAS OLAP cubes from Microsoft .NET.
History
The first product that performed OLAP queries was Express, which was released in 1970 (and acquired by Oracle in 1995 from Information Resources). However, the term did not appear until 1993 when it was coined by Edgar F. Codd, who has been described as "the father of the relational database". Codd's paper resulted from a short consulting assignment which Codd undertook for the former Arbor Software (later Hyperion Solutions, and in 2007 acquired by Oracle), as a sort of marketing coup. The company had released its own OLAP product, Essbase, a year earlier. As a result, Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy, and when Computerworld learned that Codd was paid by Arbor, it retracted the article.
The OLAP market experienced strong growth in the late 1990s, with dozens of commercial products coming to market. In 1998, Microsoft released its first OLAP server, Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into the mainstream.
Market structure
Below is a list of top OLAP vendors in 2006, with figures in millions of US Dollars.
| Vendor | Global Revenue (US$ millions) |
| Microsoft Corporation | 1,806 |
| Hyperion Solutions Corporation | 1,077 |
| Cognos | 735 |
| Business Objects | 416 |
| MicroStrategy | 416 |
| SAP AG | 330 |
| Cartesis SA | 210 |
| Applix | 205 |
| Infor | 199 |
| Oracle Corporation | 159 |
| Others | 152 |
| Total | 5,700 |
Microsoft was the only vendor that continuously exceeded the industry average growth during 2000-2006. Since the above data was collected, Hyperion has been acquired by Oracle, Cartesis by Business Objects, Business Objects by SAP, Applix by Cognos, and Cognos by IBM.