written 8.0 years ago by |
Star Schema:
A star schema is generally considered to be the most efficient design for two reasons.
A design with de-normalized tables encounters fewer join operations.
most optimizers are smart enough to recognize a star schema and generate access plans that use efficient "star join" operations.
It has been established that a "standard templateā data warehouse query directly maps to a star schema.
The goal of a star schema design is to simplify the physical data model so that RDBMS optimizers can exploit advanced indexing and join techniques in a straightforward manner, as shown in Fig
Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id).
The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis. The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).
Fact tables:
Fact tables record measurements or metrics for a specific event. Fact tables generally consist of numeric values, and foreign keys to dimensional data where descriptive information is kept.
Fact tables are designed to a low level of uniform detail (referred to as "granularity" or "grain"), meaning facts can record events at a very atomic level.
This can result in the accumulation of a large number of records in a fact table over time. Fact tables are defined as one of three types:
Transaction fact tables record facts about a specific event (e.g., sales events)
- Snapshot fact tables record facts at a given point in time (e.g., account details at month end)
Accumulating snapshot tables record aggregate facts at a given point in time (e.g., total month-to-date sales for a product)
Fact tables are generally assigned a surrogate key to ensure each row can be uniquely identified. This key is a simple primary key.
A Fact table is calculated based on "lowest level granualarity". The lowest level granualarity can be determined from the dimension table (lowest level in each hierarchy).
eg:- 5 years of historical data for 40,0000 products of 300 stores can be calculated as
(5*365)*40,000*300