Data Warehouse  sandbox 

Introduction

  • Typically, read only data for business intelligence queries.
  • Companies have multiple different OLTP databases. Data is Extracted from these, Transformed, and Loaded into a data warehouse (OLAP).
  • Why not use OLTP directly for analytics queries?
    • Access controls on business critical data
    • Analytics queries often need aggregation, leads to performance issues when reading large amounts of data.
  • Suitable for column oriented storage. Each record has hundreds of columns, but most queries only read 3 or 4 columns at most.
  • Use bitmap indexes for queries on data (bitmap encoded).
  • Vectorized processing

Schemas

Both schemas typically depend on Fact (wide, 100s columns) and Dimension (less wide) tables. Fact tables are records of transactions (at a retailer, for ex), and dimension tables contain further metadata about fields in a record like product (brand, supplier, category), customer (name, address, phone), price (sale price, purchase price, promotions) etc.

Star

Snowflake

Examples

Refer