0
10kviews
Explain Data loading and its types.

Mumbai University > Information Technology > Sem 5 > Advanced Database Management System

Marks: 5M

Year: Dec 2014

1 Answer
1
131views

The data warehouse is loaded with data that has been transformed coming from operational systems. The data coming from the operational systems undergoes a significant amount of conversion and transformation before the data is readied for loading into the data warehouse. This transformation and conversion is a complex subject that has been documented and discussed widely in the trade press, books, articles, white papers, and other places. This white paper is on the mechanics and techniques required for the movement of data from the operational environment to the data warehouse environment.

Three types of loads

There are three types of loads into the data warehouse:

  • The loading of data already archived,
  • The loading of data contained in existing applications,
  • The trapping of ongoing changes to the operational environment from the last time data was loaded into the data warehouse.

Figure 2 shows the three types of loads

enter image description here

Each of these types of loads has its own special set of considerations.

Loading archival data

The simplest type of load is that of loading older existing archival data into the data warehouse. Figure 3 illustrates this type of load

enter image description here

is read from the bulk storage and transformed into data ready to go into the warehouse. The same transformation of data occurs here as for other types of loads.

Occasionally the metadata is lost for the older archival data. Under anything but the most unusual of circumstances, the inability to relate archival data to its metadata renders the archival data useless.

Many companies have such undisciplined control of their archival data or such little use for the archival data that the archival data is not loaded into the data warehouse. In any case, when such a load occurs, the load is done only once.

Because of its infrequent execution and because the load is done only once (on those occasions where it is done at all), there is a minimal amount of concern as to the resources consumed by the loading of archival data into the data warehouse.

Loading data contained in existing systems

Unlike archival data (which often is not loaded into the data warehouse) the data contained in the existing operational systems is almost always loaded into the data warehouse. Figure 4 shows an example of the loading

enter image description here

Figure 4 shows that existing files are scanned and data stripped from them for transformation into the data warehouse. The execution of this process must be carefully timed because the data in the warehouse ages as soon as it is placed in the warehouse. In addition, there is the issue of changing technologies as the data is transformed. Seldom is the data warehouse technology the same as the operational one.

The resources consumed by this type of load are considerable. However, since this is a onetime only load, like the loading of archival data, the cost of resource consumption required for conversion is able to be accommodated.

An alternative to the direct reading of the existing systems database is to load the existing systems database down to a sequential medium and perform the transformation of data into the data warehouse using the flat files. Figure 5 shows this option.

enter image description here

There are many very desirable aspects to the technique shown in Figure 5. The first is that the download to a sequential medium can often be accomplished by the execution of a utility that operates very efficiently. Once the data is downloaded the transformation is done on another processor, out of way. In such a fashion the download places the minimal burden on the online system. But there is another important aspect as well. By moving the data off to the sequential environment then performing the transformation against the sequential data, the complexity of having to deal with the data in the existing databases is minimized or bypassed altogether. There is then a strong case to be made for the usage of an intermediate sequential file in the transformation of existing operational data to data warehouse data.

Please log in to add an answer.