AI Agent

Data Tips #10 – Describing Dimensional data and Facts data

In todays article we will discuss two types of data that are central to analytics.

In next weeks article we will discuss how to handle these types of data and why they should be handled in different manners. Stay tuned!

Master data (also referred to as Dimensional data) is data that describes a certain information object and gives context to a business transaction. Examples of dimensional data can be store, customer, item, warehouse, etc. For analytical purposes dimensional data is seldom useful on its own. It exists to give more descriptions about business events.

In a SQL statement the dimensional data typically appears in the Where or Group By clauses.

Transactional data (also referred to as facts data) is data that describes a business event. Examples of facts can be a customer sales, stock movement, or any other event that can be counted and/or summarised.

In SQL statement the facts data typically appears in the sum(column) or count (columns) where the calculations are made.

Facts and Dimensions are used together to create a complete picture of the business events together with the descriptive data that gives context to that event.

Example of a simple data model below:

Fact Table: Sales

  • Columns: sales_id, date_id, store_id, item_id, customer_id, promotion_id, quantity_sold, sales_amount

Dimension Tables

  1. Date DimensionColumns: date_id, date, day, month, year, quarter
  2. Store DimensionColumns: store_id, store_name, store_location
  3. Item DimensionColumns: item_id, item_name, item_category, item_price
  4. Customer DimensionColumns: customer_id, customer_name, customer_segment
  5. Promotion DimensionColumns: promotion_id, promotion_name, promotion_type, start_date, end_date

This model is designed to support analyses on sales data by various dimensions, such as by date, store, item, customer, and promotion. The fact table holds the transactional data (quantities sold and sales amounts) linked to the dimension tables through their respective IDs.

A SQL statement that uses these facts and dimensions could look something like:

SELECT 
    d.month,  --dimension
    s.store_name, --dimension
    i.item_category, --dimension
    c.customer_segment, --dimension
    p.promotion_type, --dimension
    SUM(f.quantity_sold) AS total_quantity_sold, --Facts
    SUM(f.sales_amount) AS total_sales_amount --Facts
FROM 
    Sales f
JOIN DateDimension d ON f.date_id = d.date_id
JOIN StoreDimension s ON f.store_id = s.store_id
JOIN ItemDimension i ON f.item_id = i.item_id
JOIN CustomerDimension c ON f.customer_id = c.customer_id
JOIN PromotionDimension p ON f.promotion_id = p.promotion_id
GROUP BY --all of the dimension columns
    d.month,
    s.store_name,
    i.item_category,
    c.customer_segment,
    p.promotion_type

Share on social media: