B.Sc. CSIT 7th Semester Data Warehousing and Data Mining

Introduction to Data Warehouse and Data Warehousing

Introduction to Data Warehouse and Data Warehousing

A data warehouse is a repository of information collected from multiple heterogeneous sources and placed in a single site in order to facilitate management decision making.

The process of constructing and using data warehouses is known as Data warehousing.

Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.

Benefits of Data Warehousing

Its main job is to power the reports, dashboards, and analytical tools that have become indispensable to businesses today. Some of the major benefits are listed below:

  • Better business analytics

With data warehousing decision makers have access to data from multiple sources and no longer have to make decisions based on incomplete information.

  •   Faster queries

Data warehouses are built specifically for fast data retrieval and analysis. With a data warehouse, we can very rapidly query large amounts of consolidated data with little to no support from IT.

  •   Improved data quality

Before being loaded into the data warehouse, data cleansing cases are created by the system and entered in a work list for further processing, ensuring data is transformed into a consistent format to support analysis – and decisions – based on high quality, accurate data.

  •  Historical insight

By storing rich historical data, a data warehouse lets decision-makers learn from past trends and challenges, make predictions and drive continuous business improvement

Data Warehouse Features / Nature of Data Warehouse

The key features of data warehouse are:

  •  Subject Oriented

A data warehouse is subject oriented as it provides information around a subject rather than the organisation’s ongoing operations.It also excludes data that are not useful in the decision support processThese subjects can be product, customers, suppliers, sales, revenue, etc.

  •    Integrated

A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases, flat files, etc. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

  •    Time Variant

The data collected in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from the historical perspective (e.g., the past 5–10 years).

  •    Non-volatile

Non-volatile means the previous data is not erased when new data is added to it. A data warehouse is kept separate from the operational database and therefore frequent changes in the operational database are not reflected in the data warehouse.

Differences between operational database and data warehouse

Operational Database System (OLTP System)Data Warehouse (OLAP System)
Operational systems are generally designed to support high-volume transaction processing.Data warehousing systems are generally designed to support high volume analytical processing. (i.e., OLAP)
It is used for day-to-day operations.It is used for long-term informational requirements and decision support.
Operational data are the original sources of the data.Data comes from various OLTP Databases.
In an operational system data is stored with a functional or process orientation.In data warehousing systems data is stored with a subject orientation.
It provides detailed and flat relational viewof data.It provides a summarized and multidimensional view of data.
The tables and joins are complex since they are normalised (for RDMS). This is done to reduce redundant data and to save storage space.The tables and joins are simple since they are denormalized. This is done to reduce the response time for analytical queries. 
Entity-Relationship modelling techniques are used for RDBMS database design.Data-Modelling techniques are used for the Data Warehouse design.
Performance is low for analysis queries.High performance for analytical queries.
Data within operational systems are generally updated regularly.Data within a data warehouse is non-volatile, meaning when new data is added old data is not erased so rarely updates.
Data volumes are less and historical data is generally not maintained.It involves large data volumes and historical data.
Simple queries are capable of fetching the data.Complex queries are required to fetch data.
Processing speed is fast.Processing speed is slow because of the large size.
The common users are clerk, DBA, database professional.The common users are knowledge worker (e.g., manager, executive, analyst)

Also Read: Lifecycle of Data

About Author

Sarina Sindurakar