ETL for fact tables. |
ETL for dimension tables. |
Write procedure for all data loads. |
Organise data staging area and test tools. |
Plan for aggregate tables. |
Determine data transformation and clearing rules. |
Establish comprehensive data extraction rules. |
Prepare data mapping for target data elements. |
Determine all the data sources, both internal and external. |
Determine all the target data needed in data warehouse. |
ETL provides a well-defined process for extracting data from varied source and loading it in the data warehouse in a consolidated format.
Data Extraction
- It is the net 1st step in ETL process. During this phase required data is first identified and the extracted from varied sources like database systems and applications using as little resources as possible.
- During extraction stage a lot of data gets extracted than is actually required.
- Size of extracted data can range from hundreds of kilobytes up to gigabytes.
- Depending upon the capabilities of source system, sore transformation might take place during extraction process itself.
- To design and create an extraction process is most consuming part of ETL process. Identification of Data Source.
- The 1st stage of data extraction stage is identified of all the suitable data sources.
- This process not only identifies data source but also ensures that the data source and the extracted data will add weightage to data warehouse.
- Let us assume that an organization designs a database to provide strategic information on the orders that is fulfilled.
- To do that, it needs the records of previous as well as current fulfilled and pending orders.
- Now if orders are fulfilled through multiple channels, then organization also needs reports about these channels.
- The order fact table contains data related to order, such as data of delivery, item no., item codes, discounts and credit limit.
- The dimension table contains the details about products, customers and channels.
- The organization also needs to ensure that it has the correct data sources needed for database and this data source is able to supply correct data to each data element.
Identification of data source is a crucial step in the data extraction process, we need to go through the source identification and ensure that whatever bit of data is entered into the data warehouse must be authenticated.