What is DATA MODELLING?
Data Modelling or Data
Architecture or Dimension Modelling is nothing but creating a blueprint for how
data will be organized and stored in the data warehouse. This process involves
identifying the entities, attributes, and relationships between entities in the
data, and then designing tables and views to represent them. We can model the
data by two techniques i.e.,
1. Star Schema
2. Snowflake Schema
Before jumping into these, we
need to understand what is dimension table and what is fact table?
DIMENSION TABLE-
A table which stores descriptive
attribute, is non- measurable and categorical in nature is called a dimension
table.
FACT TABLE-
A fact table is a central table
that stores measurable, aggregate, quantitative or factual data about a
particular subject area.
Example-
Consider an E-Commerce
application, which will have attributes like
Products, Sales, Tax, Customer,
Discount.
For above scenario, Products
could behave as Dimension table.
And Sales could behave as Fact
table.
STAR SCHEMA-
In the previous diagram, the
fact table is in the center and the dimension table is in a relationship with
it which makes a star like structure hence, this is called the star schema.
SNOWFLAKE SCHEMA-
Snowflake schema is a variation
of the star schema that uses multiple layers of dimension tables. This can be
useful for complex data relationships.
Standard naming convention-
● A common prefix for fact
tables is "FACT_" or "FT_". This prefix helps
distinguish fact tables from
dimension tables.
Eg- fact_Sales or fact_Tax
● A common prefix for dimension
tables is "DIM_" or "D_". This prefix helps
distinguish dimension tables
from fact tables.
Eg- dim_Products, dim_Customers
or dim_Discounts
Types of Fact Table-
1. Transaction fact tables:
Theystore detailed information about individual business transactions or
events. They record every occurrence at the most granular level, providing a
comprehensive view of operational data.
2. Periodic snapshot tables:
Periodic snapshot tables provide a summarized view of metrics over regular time
intervals. They store aggregated data at a specific point in time, such as the
end of a day, week, or month.
3. Accumulating snapshot tables:
Accumulating snapshot tables track the stages of a business process or
workflow. They store data at specific
checkpoints within a process,
providing a detailed view of how the process unfolds over time.
Types of Dimension Table-
● Slowly Changing Dimension
(SCD) Tables: It store information that rarely changes over time. They
typically contain master data or lookup information, such as product codes,
customer IDs, or geographic codes. There are four
main types of SCD tables:
a. SCD Type 0: Static and does
not changes.
b. SCD Type 1: Overwrite the
previous field, doesn’t keep history.
c. SCD Type 2: Add a new history
table.
d. SCD Type 3: Add a new column
to keep history.
● Conformed Dimension Tables:
Conformed dimension tables are
standardized dimension tables
that are shared across multiple fact tables or subject areas.
● Degenerate Dimension Tables:
Degenerate dimension tables are dimension tables that are embedded within fact
tables.
● Junk Dimension Tables: Junk
dimension tables are used to group together disparate dimension attributes that
do not fit neatly into other dimension tables as they have low cardinality.
● Role Playing Dimension: They
are a type of dimension table that can
assume different meanings or
roles depending on the context of the analysis. They are often used to
represent entities that can play multiple roles in a business process.
● Static Dimension Table: Static
dimension tables are a type of table that
stores descriptive attribute
data that does not change over time.
● Shrunken Dimension Tables:
Shrunken dimension tables are dimension tables that contain a subset of the
attributes from a larger dimension table.