Data Flow Testing

In path coverage, the emphasis was on covering paths using statement or branch coverage. However, data and data integrity are as important as code and code integrity of a module. We have checked every possibility of the control flow of a module, but what about the data flow in the module? Has every data object been initialized prior to use? Have all defined data objects been used for something? These questions can be answered if we consider data objects in the control flow of a module.

Data flow testing is a white-box testing technique that can be used to detect improper use of data values due to coding errors. Such errors may be introduced unintentionally by programmers. For instance, a programmer might use a variable without defining it, or might declare a variable but never initialize it and then use it in a predicate. For example,

int a;            // 'a' is declared but never initialized (no definition)
if (a == 67) { }  // predicate use of an undefined variable: a potential bug

In this way, data flow testing provides a chance to look out for inappropriate definition of data and its use in predicates, computations, and termination. It identifies potential bugs by examining the patterns in which a piece of data is used. For example, if out-of-scope data is used in a computation, it is a bug. There may be several patterns like this which indicate data anomalies.

To examine these patterns, the control flow graph of a program is used. This test strategy selects paths through the module's control flow such that various sequences of data objects are exercised. The major focus is on the points at which data receives values and the places at which that data is referenced. Thus, we have to choose enough paths in the control flow to ensure that every data object is initialized before use and that all defined data objects are used somewhere. Data flow testing closely examines the state of the data in the control flow graph, resulting in a richer test suite than the one obtained from control-flow-based path testing strategies such as branch coverage or all-statement coverage.

1. State of Data Objects

A data object can be in the following states:

Defined (d): A data object is called defined when it is initialized, that is, when it appears on the left side of an assignment statement. The defined state can also mean that a file has been opened, a dynamically allocated object has been allocated, something has been pushed onto the stack, a record has been written, and so on.

Killed/Undefined/Released (k): A data object is killed/undefined/released when it has been reinitialized, when the scope of a loop control variable ends (that is, on exiting the loop), when memory is released dynamically, or when a file has been closed.

Usage (u): A data object is used when it appears on the right side of an assignment, as a control variable in a loop, in an expression used to evaluate the control flow of a case statement, as a pointer to an object, and so on. In general, the usage is either a computational use (c-use) or a predicate use (p-use).
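As a minimal sketch of these states (the function name and values are assumptions chosen for illustration), the fragment below annotates each statement with the state it places the variable x in:

void states_demo(void) {
    int x = 10;      // d: x is defined (initialized)
    int y = x + 5;   // u: computational use (c-use) of x
    if (x > 0) {     // u: predicate use (p-use) of x
        x = y * 2;   // d: x is redefined
    }
}                    // k: x goes out of scope (killed/released)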

2. Data Flow Anomalies

Data-flow anomalies represent patterns of data usage which may lead to incorrect execution of the code. An anomaly is denoted by a two-character sequence of actions. For example, 'dk' means a variable is defined and then killed without any use, which is a potential bug. There are nine possible two-character combinations, out of which only four are data anomalies, as shown below.

[Table: two-character data-flow anomalies]

It can be observed that not all data-flow anomalies are harmful, but most of them are suspicious and indicate that an error can occur. In addition to the above two-character data anomalies, there may also be single-character data anomalies. To represent these types of anomalies, we adopt the following conventions:

$\sim x$: indicates that all prior actions on $x$ are not of interest.

$x \sim$: indicates that all subsequent actions on $x$ are not of interest.

All single-character data anomalies are listed in the table below.

[Table: single-character data anomalies]
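As a minimal sketch of a few of these anomalies (the function name is an assumption chosen for illustration), consider:

#include <stdio.h>

void anomaly_demo(void) {
    int a = 1;          // d: a is defined ...
    a = 2;              // d: ... and redefined without any use in between:
                        //    'dd', suspicious but not necessarily a bug
    int b;
    printf("%d\n", b);  // ~u: b is used before any definition: a bug
    int c = 7;          // d: c is defined ...
}                       // k: ... and goes out of scope without ever being
                        //    used: 'dk', a probable bug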

3. Terminology Used in Data Flow Testing

Definition node: Defining a variable means assigning a value to the variable for the very first time in the program. Node n that belongs to $G(P)$ is a definition node of variable v if the value of variable v is defined at the statement corresponding to node n. Examples are input statements, assignment statements, loop control statements, procedure calls, etc.

Usage node: It means the variable has been used in some statement of the program. Node n that belongs to $G(P)$ is a usage node of variable v if the value of variable v is used at the statement corresponding to node n. Examples are output statements, the right-hand side of assignment statements, conditional statements, loop control statements, etc.

A usage node can be of the following two types:

Predicate usage node: If usage node n is a predicate node, then n is a predicate usage node.

Computation usage node: If usage node n corresponds to a computation statement in the program other than a predicate, then it is called a computation usage node.

Loop-free path segment: It is a path segment in which every node is visited at most once.

Simple path segment: It is a path segment in which at most one node is visited twice. A simple path segment is either loop-free or, if it contains a loop, only one node is involved in it.

Definition-use path (du-path): A du-path with respect to a variable v is a path between a definition node and a usage node of that variable. The usage node can be either a p-usage or a c-usage node.

Definition-clear path (dc-path): A dc-path with respect to a variable v is a path between the definition node and the usage node such that no other node in the path is a defining node of variable v.

The du-paths that are not dc-paths are important from the testing viewpoint, as these are potential problem spots for testers. Du-paths that are definition-clear are easier to test than du-paths that are not dc-paths. The application of data flow testing can also be extended to debugging, where a tester locates the problematic areas in code to trace a bug. So the du-paths that are not dc-paths need more attention.
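As a minimal sketch of these terms (the function name and node numbering are assumptions chosen for illustration), consider the fragment below together with the nodes of its control flow graph:

int du_demo(int n) {
    int x = n;      // node 1: definition node of x
    if (x > 0) {    // node 2: predicate usage node of x (p-use)
        x = x * 2;  // node 3: c-use of x, followed by a new definition of x
    }
    return x;       // node 4: c-use of x
}

With respect to the definition of x at node 1, the du-paths are 1-2, 1-2-3, 1-2-4, and 1-2-3-4. The first three are definition-clear, whereas 1-2-3-4 is not, because node 3 redefines x on the way to its use at node 4.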

4. Static Data Flow Testing

With static analysis, the source code is analysed without executing it.

Static Analysis is not Enough

It is not always possible to determine the state of a data variable by static analysis of the code alone. For example, if a data variable is used as an index into an array (a collection of data elements), we cannot determine its state by static analysis; the index may be generated dynamically during execution, so we cannot guarantee which array element is referenced by that index or what its state is. Moreover, static data flow analysis might flag a certain piece of code as anomalous even though that code is never executed and is therefore not actually anomalous. Thus, not all anomalies can be determined using static analysis, and this problem is provably unsolvable in general.
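A minimal sketch of this limitation (the function name and array size are assumptions chosen for illustration): the indices are known only at run time, so static analysis cannot tell which element is defined or whether the element being used was ever initialized.

#include <stdio.h>

void dynamic_index_demo(void) {
    int a[10];
    int i, j;
    if (scanf("%d %d", &i, &j) != 2) return;
    if (i < 0 || i >= 10 || j < 0 || j >= 10) return;
    a[i] = 100;            // defines a[i] for a run-time value of i
    printf("%d\n", a[j]);  // uses a[j]: defined only if j happens to equal i
}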

5. Dynamic Data Flow Testing

Dynamic data flow testing is performed with the intention of uncovering possible bugs in data usage during the execution of the code. Test cases are designed in such a way that every definition of a data variable is traced to each of its uses, and every use is traced to each of its definitions. Various strategies are employed for the creation of test cases; all of these strategies are defined below, and a small example after the list contrasts two of them.

All-du-Paths (ADUP): It states that every du-path from every definition of every variable to every use of that definition should be exercised under some test. It is the strongest data flow testing strategy, since it subsumes all other data flow testing strategies. Consequently, this strategy requires the greatest number of paths for testing.

All-uses (AU): This states that for every use of the variable, there is a path from the definition of that variable (nearest to the use in backward direction) to the use.

All-p-uses/Some-c-uses (APU+C): This strategy states that for every variable and every definition of that variable, at least one dc-path from the definition to every predicate use should be included. If there are definitions of the variable with no p-use following them, then computational use (c-use) test cases are added as required to cover every definition.

All-c-uses/Some-p-uses (ACU+P): This strategy states that for every variable and every definition of that variable, at least one dc-path from the definition to every computational use should be included. If there are definitions of the variable with no c-use following them, then predicate use (p-use) test cases are added as required to cover every definition.

All-Predicate-Uses (APU): It is derived from the APU+C strategy and states that for every variable, there is a path from every definition to every p-use of that definition. If there is a definition with no p-use following it, then it is dropped from contention.

All-Computational-Uses (ACU): It is derived from the ACU+P strategy and states that for every variable, there is a path from every definition to every c-use of that definition. If there is a definition with no c-use following it, then it is dropped from contention.

All-Definitions (AD): It states that every definition of every variable should be covered by at least one use of that variable, whether it is a computational use or a predicate use.
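As a minimal sketch of how two of these criteria differ (the function name and node numbering are assumptions chosen for illustration):

int strategy_demo(int x) {
    int y = 0;   // node 1: definition of y
    if (x > 0)   // node 2: p-use of x
        y = x;   // node 3: c-use of x, redefinition of y
    return y;    // node 4: c-use of y
}

All-Definitions (AD) only requires each definition to reach some use: the paths 1-2-4 (covering the definition of y at node 1) and 1-2-3-4 (covering the definition at node 3) are enough. All-uses (AU) additionally requires a path to every use of every definition, including the p-use of x at node 2 and the c-use of x at node 3, which is why AU is a stronger criterion than AD.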

6. Ordering of Data Flow Testing Strategies

While selecting test cases, we need to analyse the relative strengths of the various data flow testing strategies. The figure below depicts the relative strength of the data flow strategies; the relative strength of the testing strategies decreases along the direction of the arrows. This means that all-du-paths (ADUP) is the strongest criterion for selecting test cases.

[Figure: relative strength ordering of the data flow testing strategies]
