What is Data Warehousing?
If you are beginner to the databases you may have wondered what this data warehousing really is? So, here you go! ;)
A real-time enterprise without real-time business intelligence is a real fast dump organization — Stephen Bobst.
Data is one of the most important assets of any organization. Purposes of data these data can mainly be divided in to two categories from database side as,
• Operational record keeping (OLTP)
• Analytical decision making (OLAP)
What is OLTP and OLAP?
OLTP stands for Online Transaction Processing. It typically deals with the real time data rather than historic data. OLAP stands for Online Analytical Processing which deals with big chunks of historical data which has been stored for a given time that are mainly used for summarizing and analytical purposes. That is where the Data Warehousing(DW) term comes to the play.
In order to store data over a long period of time and to do data mining with the aim of finding trends and meaningful insights for an organization, it requires a huge database which is not constantly changing.
Data warehousing is the process of collecting and managing data from varied sources to provide meaningful business insights. As the above figure displays it used to connect and analyze business data from heterogeneous sources.
The data loading to the data warehouse from sources through various pipelines happens in the sequence of ETL which stands for Extract, Transform and Load.
- Extract: It is the process of fetching data from the sources. At this stage, data is collected from multiple or different types of sources.(sources can be semi structured as XML files or JSON files, Databases, TSV or CSV files etc.) ETL is capable of handling this data disparities.
- Transform: It is the process of reshaping the extracted data from its original format into the targeted format. This is where the cleansing also takes place. This process may use rules and lookup tables to do some mappings if required.
- Load: It is the process of writing the extracted, transformed data into the target database.
A data warehouse design mainly consist of two types of tables.
- Fact tables- The fact table includes the measurement of the business process.
- Dimension tables- Dimensional tables include textual attributes of measurement saved in the fact tables.
Dimension tables are like categories of the fact tables. If we take an example,
fact :- sales of an organization
dimension :- sales by products, sales by customers, sales by specific period of time, sales by location
There are more deep concepts if we are willing to find more about Data warehousing but I think this wraps up pretty much about the basics of data warehousing.
With the above content I hope you got a more than high level understanding of Data Warehousing. :)