Loading data into a data warehouse
The loading phase is the last step of the ETL process. The information from data sources are loaded and stored in a form of tables.
There are two types of tables in the database structure: fact tables and dimensions tables (described in detail in separate articles). Once the fact and dimension tables are loaded, it is time to improve performance of the Business Intelligence data by creating aggregates.
Aggregations
In a large databases records are usually aggregated to improve performance. To design the most effective aggregates you should meet some basic requirements. At first all aggregates should be stored in their own fact table(separated from base-level data). Second all dimensions related should be shrunken versions of dimensions associated with base-level data.
At third associate the base fact tables and aggregate fact tables in one family and force SQL to refer it.
The aggregations have beneficial influence on performance. It is a very popular technique for speeding up query time in business decisions. It is possible with using the aggregate navigator. The aggregate navigator allows to use information stored in aggregates automatically , understand the clients SQL and transform base-level SQL into an aggregate SQL.
Effective loading process
If you want to perform effective load process, there are a few things to have in mind:
- you should not forget about leaving all indexes before start loading tables. You may rebuild them without any problems after loading.
- you should manage the partitions. Partitions allow to divide one table in many smaller tables for administration purposes and to improve the query performance in large fact tables. The most recommended strategy is to partition tables by a data interval like year , month or quarter.
- you should make the following steps during loading the data: separate inserts from updates (separate old data with a new one) , make a bulk load , load in parallel(you may use partitions) delete updates and subsequently bulk load new versions of records , built outside aggregates.
- you should allow for incremental loading which keep the database synchronized with the source system.
Feeding a data warehouse
Loading dimension and fact tables do not cause the end of creating database. The data can be modified after loading without affecting the end user quire process.
Such modifications are called the graceful modifications and include:
- Adding a fact or dimension to an existing fact table of the same grain
- Adding an attribute to an existing dimension
- Increase the granularity of existing fact and dimension table
- The data is also analyzed many times after the loading step. The most popular structure that allows fast analysis of data is OLAP(Online Analytical Processes)cube. It provide quickly multi-dimensional analytical queries in a short time.
The loading process seems to be very time-consuming and quite difficult. However it is one of three the most important steps of creating databases and should be performed carefully and without hurry.
Generally loading dimension and fact tables are quite similarly but remember one thing: before loading fact tables first load dimension tables because facts make no sense without dimensions!!!
Read more about ETL process and data warehouse concepts, architecture and design.