Data Warehouse sandbox
Introduction
- Typically, read only data for business intelligence queries.
- Companies have multiple different OLTP databases. Data is Extracted from these, Transformed, and Loaded into a data warehouse (OLAP).
- Why not use OLTP directly for analytics queries?
- Access controls on business critical data
- Analytics queries often need aggregation, leads to performance issues when reading large amounts of data.
- Suitable for column oriented storage. Each record has hundreds of columns, but most queries only read 3 or 4 columns at most.
- Use bitmap indexes for queries on data (bitmap encoded).
- Vectorized processing
Schemas
Both schemas typically depend on Fact (wide, 100s columns) and Dimension (less wide) tables. Fact tables are records of transactions (at a retailer, for ex), and dimension tables contain further metadata about fields in a record like product (brand, supplier, category), customer (name, address, phone), price (sale price, purchase price, promotions) etc.