Star schema
Encyclopedia
In computing
Computing
Computing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...

, the star schema (also called star-join schema, data cube, or multi-dimensional schema) is the simplest style of data warehouse
Data warehouse
In computing, a data warehouse is a database used for reporting and analysis. The data stored in the warehouse is uploaded from the operational systems. The data may pass through an operational data store for additional operations before it is used in the DW for reporting.A data warehouse...

 schema
Logical schema
A Logical Schema is a data model of a specific problem domain expressed in terms of a particular data management technology. Without being specific to a particular database management product, it is in terms of either relational tables and columns, object-oriented classes, or XML tags...

. The star schema consists of one or more fact table
Fact table
In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is often located at the centre of a star schema or a snowflake schema, surrounded by dimension tables....

s referencing any number of dimension tables
Dimension (data warehouse)
In a data warehouse, a dimension is a data element that categorizes each item in a data set into non-overlapping regions. A data warehouse dimension provides the means to "slice and dice" data in a data warehouse. Dimensions provide structured labeling information to otherwise unordered numeric...

. The star schema is an important special case of the snowflake schema
Snowflake schema
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.The snowflake schema...

, and is more effective for handling simpler queries.

Model

A star schema classifies the attributes of an event into facts (measured numeric/time data), and descriptive dimension attributes (product id, customer name, sale date) that give the facts a context. A fact record is the nexus between the specific dimension values and the recorded facts. The Facts are grouped together by grain (level of detail) and stored in the fact table. Dimension attributes are organized into affinity groups and stored in a minimal number of dimension tables.

A weather star schema that records weather data may have facts of temperature, barometric pressure, wind speed, precipitation, cloud cover, etc and dimensions of location, date/time, reporter, etc.

Star schemas are designed to optimize user ease-of-use and retrieval performance by minimizing the number of tables to join to materialize a transaction.

A star schema is called such as it resembles a constellation of stars, generally several bright stars (facts) surrounded by dimmer ones (dimensions).
  • The fact table holds the metric values recorded for a specific event. Because of the desire to hold atomic level data, there generally are a very large number of records (billions). Special care is taken to minimize the number and size of attributes in order to constrain the overall table size and maintain performance. Fact tables generally come in 3 flavors - transaction (facts about a specific event eg Sale), snapshot (facts recorded at a point in time eg Account details at month end), and accumulating snapshot tables (eg month-to-date sales for a product).

  • Dimension tables usually have few records compared to fact tables, but may have a very large number of attributes that describe the fact data.

Often there can be dozens to hundreds of dimension attributes describing the various facets of a fact. Dimension attributes are organized into tables of loosely related attributes that share a known or unknown affinity. Attributes of color, style, size, texture can describe a product and would be included in a product dimension table. Dimension tables include attributes that typically would be normalized into separate tables (Snowflake schema). For example, in the US a location can be identified by a zipcode that exists within a neighborhood, city, state, region. All of these attributes would be included in a location dimension table.

On an Entity-Relationship(ER) diagram, fact tables often appear small because of the few distinct columns, while dimension tables appear large because of the large number of columns. The diagram ignores the reality that approx 75% or more of the storage is used by the fact table.

Dimension tables are assigned a surrogate primary key of a simple integer that is assigned to the combination of low level attributes that form the natural key. Fact tables should also have a single surrogate primary key to allow for situations where there may be two or more facts having the exact same set of dimension keys.

Star schema that have more than a dozen or so dimensions are called centipede schema{Kimball p 393}. Having dimensions of only a few attributes, while simpler to maintain, result in queries with 20, 30, 40 table joins and defeat the ease-of-use performance goals of star schema
.

Benefits

The primary benefit of a star schema is its simplicity for users to write, and databases to process: queries are written with simple inner joins between the facts and a small number of dimensions. Star joins are simpler than possible in snowflake schema
Snowflake schema
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.The snowflake schema...

. Where conditions need only to filter on the attributes desired, and aggregations are fast.

The star schema is a way to implement multidimensional database (MDDB) functionality using a mainstream relational database
Relational database
A relational database is a database that conforms to relational model theory. The software used in a relational database is called a relational database management system . Colloquial use of the term "relational database" may refer to the RDBMS software, or the relational database itself...

: given most organizations' commitment to relational databases, a specialized multidimensional DBMS is likely to be both expensive and inconvenient.

Example

Consider a database of sales, perhaps from a store chain, classified by date, store and product. The image of the schema to the right is a star schema version of the sample schema provided in the snowflake schema
Snowflake schema
In computing, a snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. The snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.The snowflake schema...

article.

Fact_Sales is the fact table and there are three dimension tables Dim_Date, Dim_Store and Dim_Product.

Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id). The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).

For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:

SELECT
P.Brand,
S.Country,
SUM (F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE
D.Year = 1997
AND P.Product_Category = 'tv'
GROUP BY
P.Brand,
S.Country

External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK