In todays article we will discuss two types of data that are central to analytics.
In next weeks article we will discuss how to handle these types of data and why they should be handled in different manners. Stay tuned!
Master data (also referred to as Dimensional data) is data that describes a certain information object and gives context to a business transaction. Examples of dimensional data can be store, customer, item, warehouse, etc. For analytical purposes dimensional data is seldom useful on its own. It exists to give more descriptions about business events.
In a SQL statement the dimensional data typically appears in the Where or Group By clauses.
Transactional data (also referred to as facts data) is data that describes a business event. Examples of facts can be a customer sales, stock movement, or any other event that can be counted and/or summarised.
In SQL statement the facts data typically appears in the sum(column) or count (columns) where the calculations are made.
Facts and Dimensions are used together to create a complete picture of the business events together with the descriptive data that gives context to that event.
Example of a simple data model below:
Fact Table: Sales
- Columns: sales_id, date_id, store_id, item_id, customer_id, promotion_id, quantity_sold, sales_amount
Dimension Tables
- Date DimensionColumns: date_id, date, day, month, year, quarter
- Store DimensionColumns: store_id, store_name, store_location
- Item DimensionColumns: item_id, item_name, item_category, item_price
- Customer DimensionColumns: customer_id, customer_name, customer_segment
- Promotion DimensionColumns: promotion_id, promotion_name, promotion_type, start_date, end_date
This model is designed to support analyses on sales data by various dimensions, such as by date, store, item, customer, and promotion. The fact table holds the transactional data (quantities sold and sales amounts) linked to the dimension tables through their respective IDs.
A SQL statement that uses these facts and dimensions could look something like:
SELECT d.month, --dimension s.store_name, --dimension i.item_category, --dimension c.customer_segment, --dimension p.promotion_type, --dimension SUM(f.quantity_sold) AS total_quantity_sold, --Facts SUM(f.sales_amount) AS total_sales_amount --Facts FROM Sales f JOIN DateDimension d ON f.date_id = d.date_id JOIN StoreDimension s ON f.store_id = s.store_id JOIN ItemDimension i ON f.item_id = i.item_id JOIN CustomerDimension c ON f.customer_id = c.customer_id JOIN PromotionDimension p ON f.promotion_id = p.promotion_id GROUP BY --all of the dimension columns d.month, s.store_name, i.item_category, c.customer_segment, p.promotion_type