OLAP

Online analytical processing, or OLAP, is an approach to quickly answering multi-dimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. The typical applications of OLAP are in business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad-hoc queries with a rapid execution time. They borrow aspects of navigational databases and hierarchical databases that are faster than relational databases.

The output of an OLAP query is typically displayed in a matrix (or pivot) format. The dimensions form the rows and columns of the matrix; the measures form the values.
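
As a minimal sketch of this matrix layout, the following Python snippet pivots a handful of made-up sales records so that one dimension forms the rows, another forms the columns, and the summed measure fills the cells. The data and the names `records`, `pivot`, region and quarter are illustrative assumptions, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical query result: (region, quarter, sales). Illustrative data only.
records = [
    ("North", "Q1", 100.0),
    ("North", "Q2", 150.0),
    ("South", "Q1", 80.0),
    ("South", "Q2", 120.0),
    ("North", "Q1", 50.0),
]

def pivot(rows):
    """Sum the measure into cells keyed by (row dimension, column dimension)."""
    cells = defaultdict(float)
    for row_key, col_key, measure in rows:
        cells[(row_key, col_key)] += measure
    row_keys = sorted({r for r, _ in cells})
    col_keys = sorted({c for _, c in cells})
    # Combinations with no data are shown as 0.0.
    matrix = [[cells.get((r, c), 0.0) for c in col_keys] for r in row_keys]
    return row_keys, col_keys, matrix

row_keys, col_keys, matrix = pivot(records)

# Print the matrix: quarters as columns, regions as rows, sales as values.
print("      " + "  ".join(f"{c:>6}" for c in col_keys))
for r, values in zip(row_keys, matrix):
    print(f"{r:<6}" + "  ".join(f"{v:6.1f}" for v in values))
```

Swapping the positions of the two dimensions in the input tuples would transpose the matrix, which is the essence of "pivoting".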

Concept


At the core of any OLAP system is the concept of an OLAP cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables.

Each measure can be thought of as having a set of labels, or meta-data associated with it. A dimension is what describes these labels; it provides information about the measure.

A simple example would be a cube that contains a store's sales as a measure, and Date/Time as a dimension. Each Sale has a Date/Time label that describes more about that sale.

Any number of dimensions can be added to the structure such as Store, Cashier, or Customer by adding a column to the fact table. This allows an analyst to view the measures along any combination of the dimensions.

For example:

   Sales Fact Table
+-------------+---------+
| sale_amount | time_id |
+-------------+---------+          Time Dimension
|     2008.08 |    1234 |---+   +---------+-------------------+
+-------------+---------+   |   | time_id | timestamp         |
                            |   +---------+-------------------+
                            +-->|    1234 | 20080902 12:35:43 |
                                +---------+-------------------+
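
The same two tables can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: `sales_fact` and `time_dim` are hypothetical table names chosen here, while the column names and the single row of data come from the example above. Joining on `time_id` recovers the Date/Time label for the sale.

```python
import sqlite3

# In-memory database; table names are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, timestamp TEXT);
    CREATE TABLE sales_fact (
        sale_amount REAL,
        time_id INTEGER REFERENCES time_dim(time_id)
    );
    INSERT INTO time_dim VALUES (1234, '20080902 12:35:43');
    INSERT INTO sales_fact VALUES (2008.08, 1234);
""")

# Join the fact row to its dimension row to attach the label to the measure.
rows = conn.execute("""
    SELECT f.sale_amount, t.timestamp
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_id = f.time_id
""").fetchall()
print(rows)
```

Adding another dimension such as Store would mean one more key column on `sales_fact` and one more dimension table to join against.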

Multidimensional databases


Multidimensional structure is defined as “a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data” (O'Brien & Marakas, 2009, p. 177). The structure is broken into cubes, and the cubes are able to store and access data within the confines of each cube. “Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions” (p. 178). Even when data is manipulated, it remains easy to access and is stored compactly. The data still remains interrelated.

Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications (O'Brien & Marakas, 2009). Analytical databases use this structure because of its ability to deliver answers to complex business queries quickly. Data can be viewed from different perspectives, which gives a broader picture of a problem than other models do (Williams, Garza, Tucker & Marcus, 1994).

Aggregations


It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities.

The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.
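
To make the counting concrete, here is a small sketch that builds every aggregation for a toy fact table. The two dimensions, their granularity levels, and the data are assumptions chosen for illustration; with three levels on time and two on store there are 3 × 2 = 6 combinations, one of which is the base data itself.

```python
from collections import defaultdict
from itertools import product

# Hypothetical granularity levels per dimension, finest first.
# "all" collapses the dimension entirely.
levels = {
    "time": ["day", "month", "all"],
    "store": ["store", "all"],
}

# Hypothetical base facts: (day, store, sale_amount).
facts = [
    ("2008-09-01", "S1", 10.0),
    ("2008-09-02", "S1", 20.0),
    ("2008-09-02", "S2", 5.0),
    ("2008-10-01", "S2", 7.0),
]

def roll_up_time(day, level):
    if level == "day":
        return day
    if level == "month":
        return day[:7]  # keep "YYYY-MM"
    return "all"

def roll_up_store(store, level):
    return store if level == "store" else "all"

def aggregate(time_level, store_level):
    """Sum sale_amount at the requested granularity on each dimension."""
    totals = defaultdict(float)
    for day, store, amount in facts:
        key = (roll_up_time(day, time_level), roll_up_store(store, store_level))
        totals[key] += amount
    return dict(totals)

# One aggregation per combination of dimension granularities.
combinations = list(product(levels["time"], levels["store"]))
cube = {combo: aggregate(*combo) for combo in combinations}

print(len(cube))
print(cube[("month", "all")])
```

Each extra dimension or level multiplies the number of combinations, which is why fully computing all of them quickly becomes impractical.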

Because there are usually many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is NP-complete. Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and the A* search algorithm.
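
The greedy approach can be sketched as follows. This is one simple heuristic under stated assumptions, not a definitive method: the view sizes are invented, the cost of answering a view is assumed to equal the row count of the smallest materialized view that covers it, and the function name `greedy_select` is hypothetical. At each step the view that most reduces total query cost is materialized.

```python
# Hypothetical row counts for each group-by view over dimensions A, B, C.
# The empty set is the grand total; the full set is the base data.
sizes = {
    frozenset("ABC"): 1000,
    frozenset("AB"): 500,
    frozenset("AC"): 900,
    frozenset("BC"): 300,
    frozenset("A"): 50,
    frozenset("B"): 40,
    frozenset("C"): 30,
    frozenset(): 1,
}

def greedy_select(sizes, k):
    """Pick up to k views to materialize besides the base view, greedily."""
    base = max(sizes, key=len)
    # Cost of a view = size of the cheapest materialized view that covers it.
    cost = {view: sizes[base] for view in sizes}
    chosen = []
    for _ in range(k):
        best_view, best_benefit = None, 0
        for cand in sizes:
            if cand == base or cand in chosen:
                continue
            # A view can answer any view whose dimensions are a subset of its own.
            benefit = sum(
                max(0, cost[w] - sizes[cand]) for w in sizes if w <= cand
            )
            if benefit > best_benefit:
                best_view, best_benefit = cand, benefit
        if best_view is None:
            break  # no remaining view lowers the total cost
        chosen.append(best_view)
        for w in sizes:
            if w <= best_view:
                cost[w] = min(cost[w], sizes[best_view])
    return chosen, sum(cost.values())

chosen, total_cost = greedy_select(sizes, 2)
print(["".join(sorted(v)) for v in chosen], total_cost)
```

A size or update-time budget could replace the fixed count `k` as the stopping condition, matching the constraints described above.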

A very effective way to support aggregation and other common OLAP operations is the use of bitmap indexes.
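
A rough sketch of the idea, using Python integers as bitmaps: each distinct value of a column gets a bitmap with one bit per fact row, so a filter on several dimensions becomes a bitwise AND and a count becomes a population count. The rows and the helper `build_bitmap_index` are assumptions made for illustration; real systems typically add compression on top.

```python
# Hypothetical fact rows: (region, product, sale_amount).
rows = [
    ("North", "pen", 3.0),
    ("South", "pen", 4.0),
    ("North", "ink", 10.0),
    ("South", "ink", 2.5),
    ("North", "pen", 1.5),
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

region_idx = build_bitmap_index(rows, 0)
product_idx = build_bitmap_index(rows, 1)

# A conjunctive filter is a bitwise AND of two bitmaps.
match = region_idx["North"] & product_idx["pen"]

# COUNT is the number of set bits; SUM reads only the matching rows.
count = bin(match).count("1")
total = sum(row[2] for i, row in enumerate(rows) if (match >> i) & 1)
print(count, total)
```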

Multidimensional



MOLAP (multidimensional OLAP) is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multi-dimensional array storage, rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube, an operation known as processing.
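
As a simplified sketch of array storage, the snippet below keeps a two-dimensional cube in one flat list and computes each cell's position from its dimension coordinates. The dimension members, the row-major layout, and the function names `offset` and `process` are assumptions for illustration; actual MOLAP engines use their own, more elaborate layouts.

```python
# Hypothetical dimension members; a member's position is its array coordinate.
months = ["2008-08", "2008-09", "2008-10"]
stores = ["S1", "S2"]
month_pos = {m: i for i, m in enumerate(months)}
store_pos = {s: i for i, s in enumerate(stores)}

# One flat array holds every cell of the cube (row-major layout).
cells = [0.0] * (len(months) * len(stores))

def offset(month, store):
    """Array position of the cell at (month, store)."""
    return month_pos[month] * len(stores) + store_pos[store]

def process(facts):
    """'Processing': pre-compute each cell by summing the facts that fall in it."""
    for month, store, amount in facts:
        cells[offset(month, store)] += amount

process([
    ("2008-09", "S1", 10.0),
    ("2008-09", "S1", 20.0),
    ("2008-09", "S2", 5.0),
    ("2008-10", "S2", 7.0),
])

# A cell lookup is a direct array access, with no join or scan.
print(cells[offset("2008-09", "S1")])

# Slicing one month reads a contiguous run of the array.
start = offset("2008-09", stores[0])
print(cells[start:start + len(stores)])
```

Note that cells with no facts still occupy space in this dense layout, which hints at why sparse data is a concern for array storage.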

Relational



ROLAP (relational OLAP) works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. It depends on a specialized schema design.
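
A minimal sketch of this arrangement, using Python's built-in `sqlite3` module: the base facts and a dimension live in ordinary tables, and a new table is created to hold an aggregation. All table and column names here (`store_dim`, `sales_fact`, `sales_by_region`) and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sales_fact (store_id INTEGER, sale_amount REAL);
    INSERT INTO store_dim VALUES (1, 'North'), (2, 'North'), (3, 'South');
    INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 5.0), (3, 7.0);

    -- New relational table holding the aggregated information.
    CREATE TABLE sales_by_region AS
        SELECT d.region AS region, SUM(f.sale_amount) AS total_sales
        FROM sales_fact AS f
        JOIN store_dim AS d ON d.store_id = f.store_id
        GROUP BY d.region;
""")

# Queries at region level can now read the small aggregate table directly.
summary = conn.execute(
    "SELECT region, total_sales FROM sales_by_region ORDER BY region"
).fetchall()
print(summary)
```

The aggregate table must be refreshed when the base data changes, which is the pre-processing cost that ROLAP deployments often try to avoid.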

Hybrid



There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.

Comparison


Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.
  • Some MOLAP implementations are prone to database explosion. Database explosion is a phenomenon in which MOLAP databases use vast amounts of storage space when certain common conditions are met: a high number of dimensions, pre-calculated results and sparse multidimensional data. The typical mitigation technique for database explosion is to materialize not all possible aggregations, but only the optimal subset of aggregations based on the desired performance vs. storage trade-off.

  • MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.

  • ROLAP is generally more scalable. However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer tremendously.

  • Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.

  • HOLAP encompasses a range of solutions that attempt to mix the best of ROLAP and MOLAP. It can generally pre-process quickly, scale well, and offer good function support.

Other types


The following acronyms are also sometimes used, although they are not as widespread as the ones above:
  • WOLAP - Web-based OLAP
  • DOLAP - Desktop OLAP
  • RTOLAP - Real-Time OLAP

APIs and query languages


Unlike relational databases, which had SQL as the standard query language and widespread APIs such as ODBC, JDBC and OLEDB, there was no such unification in the OLAP world for a long time. The first real standard API was the OLE DB for OLAP specification from Microsoft, which appeared in 1997 and introduced the MDX query language. Several OLAP vendors, both server and client, adopted it. In 2001 Microsoft and Hyperion announced the XML for Analysis specification, which was endorsed by most of the OLAP vendors. Since this also used MDX as a query language, MDX became the de facto standard.

History


The first product that performed OLAP queries was Express, which was released in 1970 (and acquired by Oracle in 1995 from Information Resources). However, the term did not appear until 1993 when it was coined by Edgar F. Codd, who has been described as "the father of the relational database". Codd's paper resulted from a short consulting assignment which Codd undertook for former Arbor Software (later Hyperion Solutions, and in 2007 acquired by Oracle), as a sort of marketing coup. The company had released its own OLAP product, Essbase, a year earlier. As a result Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy and when Computerworld learned that Codd was paid by Arbor, it retracted the article.

The OLAP market experienced strong growth in the late 1990s, with dozens of commercial products coming to market. In 1998, Microsoft released its first OLAP server, Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into the mainstream.

Market structure


Below is a list of top OLAP vendors in 2006, with figures in millions of United States dollars.

Vendor                          Global Revenue (USD millions)
Microsoft Corporation           1,806
Hyperion Solutions Corporation  1,077
Cognos                          735
Business Objects                416
MicroStrategy                   416
SAP AG                          330
Cartesis SA                     210
Applix                          205
Infor                           199
Oracle Corporation              159
Others                          152
Total                           5,700


Microsoft was the only vendor that continuously exceeded the industry average growth during 2000-2006. Since the above data was collected, Hyperion has been acquired by Oracle, Cartesis by Business Objects, Business Objects by SAP, Applix by Cognos, and Cognos by IBM.

See also



  • Business intelligence
  • Data warehousing
  • Data mining
  • Predictive analytics
  • Business analytics
  • OLTP