Data Warehousing 101

April 17 2025

Imagine walking into a library with millions of books scattered across the floor - no labels, no shelves, no order. In one corner, you have fiction mixed with history; in another, science books are buried beneath poetry collections; in the middle, the rare first editions are lost underneath stacks of outdated magazines. Finding a single piece of information feels impossible - a fruitless scavenger hunt through chaos.

Now imagine that some library is meticulously organized. Every book is catalogued and sorted by genre, author, and subject. A detailed index helps you find exactly what you need in seconds. This is the transformation a data warehouse brings to an organization's data.

Like a well-ordered library that turns a chaotic collection of books into a knowledge hub, a data warehouse transforms raw, scattered data into a structured and accessible resource. This blog kickstarts a series that delves into data warehousing and how it is the cornerstone of data analytics.

Let's start with the basics

A data warehouse serves as the foundation where raw data from multiple sources is collected, cleaned, organized, and stored in a structured format. This makes it easier for analysts and business intelligence tools to access and analyze the data efficiently.

Without a well-organized data warehouse, data analysts would face inconsistent data, missing values, and inefficient querying — all of which would slow down and weaken the insights derived from the data. Just as a solid foundation supports a building, a data warehouse supports accurate and reliable data analysis, enabling businesses to make informed decisions.

Where data lives

Data can be stored in various formats, each designed to serve different business needs and analytical purposes.

Database

Structures data in rows and columns
Stores transactional data

Data Warehouse

Large storage for structured data from multiple sources
Difficult to make changes due to complex data relationships
Data is stored with a specific purpose
Primarily used for analysis with a known objective

Data Mart

A relational database focused on a single subject area
Smaller and draws data from fewer sources

Data Lake

A central store of large volumes of data from many sources
Stores unstructured data
Easier to make changes with fewer upstream and downstream effects
Data is stored without a defined purpose for future use

The Data Warehousing Life Cycle

The data warehousing life cycle begins with planning, where the overall design and structure of the warehouse are defined. Next is the gathering requirements phase, where business needs are analyzed to ensure the warehouse will meet organizational goals.

In the data modelling phase, the sources of data and how they will be integrated are determined, often involving the creation of a data pipeline. This end-to-end process collects, modifies, and delivers data. The implementation phase follows, where the warehouse is built, and the ETL design is established to support the data pipeline. Once the warehouse is operational, the focus shifts to BI application development, where the tools used for reporting and analysis are integrated with the warehouse.

The final stages include support and maintenance, where users are trained, and the system is regularly updated to accommodate evolving business needs. This includes modifying table designs, conducting tests, and deploying updates. Any significant changes to the warehouse will typically require revisiting these steps from the beginning, ensuring the system remains aligned with business objectives.

The Layers of the Data Warehouse

A data warehouse is built on multiple layers, each vital in managing and delivering data for analysis. From sourcing raw data to presenting insights through business intelligence tools, these layers ensure data is collected, processed, stored, and made accessible for informed decision-making.

Sourcing – This layer includes data from various sources such as transactional databases, log files and spreadsheets. It represents the initial point where data enters the warehouse.
Staging – In this layer, data is extracted, transformed, and cleaned before being moved to storage. The staging database temporarily holds the data during this process. For data to pass through the ETL process, it must be formatted into rows and columns to ensure compatibility with relational databases.
Storage – After staging, the processed data is stored in the data warehouse. It may then be organized into a data mart, or in some cases, data may flow from a data mart into the warehouse, depending on the architecture.
Presentation – This is the layer where users interact with the data through queries, BI tools, and data mining applications. It allows for data analysis and reporting, enabling business insights and decision-making.

Understanding how data moves through the transformation process to prepare it for analytical use provides a solid foundation for managing and analyzing data effectively. A well-structured data warehouse ensures that data is organized, accessible, and ready to support business insights. In the next blog, we’ll explore the key principles of data warehouse design and the differences between ETL and ELT processes, helping you choose the right approach for your data strategy.

Author:

Lorraine Ferrusi

View More Posts