written 7.9 years ago by | modified 2.8 years ago by |
Mumbai University > Information Technology > Sem 5 > Advanced Database Management System
Marks: 5M
Year: Dec 2014
written 7.9 years ago by | modified 2.8 years ago by |
Mumbai University > Information Technology > Sem 5 > Advanced Database Management System
Marks: 5M
Year: Dec 2014
written 7.9 years ago by |
The data warehouse is loaded with data that has been transformed coming from operational systems. The data coming from the operational systems undergoes a significant amount of conversion and transformation before the data is readied for loading into the data warehouse. This transformation and conversion is a complex subject that has been documented and discussed widely in the trade press, books, articles, white papers, and other places. This white paper is on the mechanics and techniques required for the movement of data from the operational environment to the data warehouse environment.
Three types of loads
There are three types of loads into the data warehouse:
Figure 2 shows the three types of loads
Each of these types of loads has its own special set of considerations.
Loading archival data
The simplest type of load is that of loading older existing archival data into the data warehouse. Figure 3 illustrates this type of load
is read from the bulk storage and transformed into data ready to go into the warehouse. The same transformation of data occurs here as for other types of loads.
Occasionally the metadata is lost for the older archival data. Under anything but the most unusual of circumstances, the inability to relate archival data to its metadata renders the archival data useless.
Many companies have such undisciplined control of their archival data or such little use for the archival data that the archival data is not loaded into the data warehouse. In any case, when such a load occurs, the load is done only once.
Because of its infrequent execution and because the load is done only once (on those occasions where it is done at all), there is a minimal amount of concern as to the resources consumed by the loading of archival data into the data warehouse.
Loading data contained in existing systems
Unlike archival data (which often is not loaded into the data warehouse) the data contained in the existing operational systems is almost always loaded into the data warehouse. Figure 4 shows an example of the loading
Figure 4 shows that existing files are scanned and data stripped from them for transformation into the data warehouse. The execution of this process must be carefully timed because the data in the warehouse ages as soon as it is placed in the warehouse. In addition, there is the issue of changing technologies as the data is transformed. Seldom is the data warehouse technology the same as the operational one.
The resources consumed by this type of load are considerable. However, since this is a onetime only load, like the loading of archival data, the cost of resource consumption required for conversion is able to be accommodated.
An alternative to the direct reading of the existing systems database is to load the existing systems database down to a sequential medium and perform the transformation of data into the data warehouse using the flat files. Figure 5 shows this option.
There are many very desirable aspects to the technique shown in Figure 5. The first is that the download to a sequential medium can often be accomplished by the execution of a utility that operates very efficiently. Once the data is downloaded the transformation is done on another processor, out of way. In such a fashion the download places the minimal burden on the online system. But there is another important aspect as well. By moving the data off to the sequential environment then performing the transformation against the sequential data, the complexity of having to deal with the data in the existing databases is minimized or bypassed altogether. There is then a strong case to be made for the usage of an intermediate sequential file in the transformation of existing operational data to data warehouse data.