0
4.3kviews
Discuss the process of extraction, transformation and loading with a neat and labelled diagram.
1 Answer
1
118views

Extraction - Transformation - Loading known as ETL process.

  • In ETL tools extract the data from different RDBMS source systems then transforms data by applying calculations concatenations etc and then load the data into Data Warehouse System.

  • Need of ETL

  • Helps companies to analyze their business data for taking critical business decisions.

  • Transactional databases cannot answer complex business questions that can be answered by ETL.

  • Provides common data repository.

  • ETL enables to move data from various sources into data warehouse.

  • As data source change DW will automatically update.

  • Allow verification of data transformation, aggregation and calculation rules.

  • ETL offers historical context for business.

    enter image description here Fig. ETL Process

Step 1 - Extraction.

In this step data is extracted from source system into staging area. Transformation if any are done in staging area so that performance of source system is not degraded.

There are 3 types of data extraction methods.

a] Full Extraction.

b] Partial Extraction without update notification.

c] Partial Extraction with update notification.

Validations done during extractions.

  • Reconcile records with source data.

  • Make sure no spam/unwanted data loaded.

  • Data type check.

  • Remove all types of duplicate/fragmented data.

  • Check whether all keys are placed or not.

Step 2 - Transformation.

  • Data extracted from source server is raw and not usable in its original form. Therefore it needs to be cleansed, mapped & transformed.

  • In fact, this is the key step where ETL process adds value & changes data such that insightful 31 reports can be generated.

  • In this step, we apply set apply set of functions on extracted data. Data that does not require any transformation is called as direct move or pass through data.

  • In this step, we can perform customized operations on data.

Validations done during transformation

  • Filtering - Select only section columns to load.

  • Using rules & lookup tables for data standardization.

  • Character set conversion & encoding handling.

  • Conversion of units of measurements like date . Time . Conversion . Currency Conversions . Numerical conversions etc.

  • Data threshold validation check.

  • Data flow validation from staging area to the intermediate tables.

  • Required field should not be kept blank.

  • Transposing rows and columns.

Step 3 - Loading

  • Loading data into target data warehouse data base is the last step of ETL process.

  • In a typical data ware house huge volume of data needs to be loaded in relatively short period hence, load process should be optimized for performance.

  • In case of load failure, recover mechanisms should be configured to restart from the point of failure without data integrity loss.

  • Data warehouse admins need to monitor resume, cancel loads as per prevailing server performance.

  • Type of loading :

a] Initial load - Populating all the data warehouse tables.

b] Incremental Load - Applying ongoing changes as when needed periodically.

c] Full refresh - Erasing the contents of one or more tables and reloading with fresh data.

Validations done during loading process.

  • Ensure that key field data is neither missing nor null.

  • Test modelling views based on target tables.

  • Check that combined values & calculation measures.

  • Data checks in dimension table as well as history table.

  • Check BI reports on the loaded fact and dimension table.

ETL tools :

1] Mark Logic.

2] Oracle.

3] Amazon Red Shift.

Please log in to add an answer.