Performance gain through pipelining can be reduced by the presence of program transfer instructions (such as JMP, CALL, RET and conditional jumps).
They change the sequence of execution, which makes all the instructions that entered the pipeline after the program transfer instruction invalid.
Suppose instruction I3 is a conditional jump to I50 at some other address (the target address). If the jump is taken, the instructions that entered the pipeline after I3 are invalid and a new sequence beginning with I50 needs to be loaded in.
This causes bubbles in the pipeline, where no work is done while the pipeline stages are reloaded.
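The I3/I50 example can be sketched in a few lines of Python. This is only an illustration: the function name resolve_taken_branch and the choice of two instructions (I4, I5) fetched behind the branch are assumptions, not details of the Pentium.

```python
def resolve_taken_branch(in_flight, branch, target):
    """Split the in-flight instructions at a taken branch.

    in_flight: instruction names in fetch order, oldest first.
    Returns (kept, squashed, next_fetch).
    """
    pos = in_flight.index(branch)
    kept = in_flight[:pos + 1]      # the branch and everything older stay valid
    squashed = in_flight[pos + 1:]  # fetched after the branch: now invalid
    return kept, squashed, target   # fetching restarts at the target


# Hypothetical pipeline contents: I4 and I5 were fetched behind the jump I3.
kept, squashed, next_fetch = resolve_taken_branch(
    ["I1", "I2", "I3", "I4", "I5"], branch="I3", target="I50"
)
print(squashed, next_fetch)
```

The slots that held the squashed instructions are the bubbles: they do no useful work until instructions starting at I50 reach them.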
To reduce this problem, the Pentium uses a scheme called Dynamic Branch Prediction.
In this scheme, a prediction is made concerning the branch instruction currently in the pipeline.
The prediction is either "taken" or "not taken".
If the prediction turns out to be correct, the pipeline is not flushed and no clock cycles are lost. If the prediction turns out to be wrong, the pipeline is flushed and restarted with the correct instruction.
A wrong prediction results in a 3-cycle penalty if the branch is executed in the u-pipeline and a 4-cycle penalty if it is executed in the v-pipeline.
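As a rough worked example of what these penalties add up to, the sketch below totals the cycles lost to flushes. The 3- and 4-cycle figures come from the text; the function name flush_cycles and the misprediction counts are made-up illustrative values.

```python
# Misprediction penalties in clock cycles, as given in the text.
U_PIPE_PENALTY = 3
V_PIPE_PENALTY = 4


def flush_cycles(mispredicted_u, mispredicted_v):
    """Total cycles lost to flushes for the given misprediction counts."""
    return mispredicted_u * U_PIPE_PENALTY + mispredicted_v * V_PIPE_PENALTY


# Illustrative counts only: 10 wrong predictions in the u-pipeline, 5 in the v-pipeline.
print(flush_cycles(10, 5))  # 10*3 + 5*4 = 50
```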
Dynamic branch prediction is implemented using a 4-way set-associative cache with 256 entries. This cache is referred to as the Branch Target Buffer (BTB).
The directory entry for each line contains the following information:
Valid Bit: indicates whether or not the entry is in use.
History Bits: track how often the branch has been taken.
Source memory address that the branch instruction was fetched from (address of I3).
If the directory entry is valid, the target address of the branch is stored in the corresponding data entry of the BTB.
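The structure described above can be modelled with a small Python sketch. Only the 256 entries, the 4 ways and the three directory fields come from the text. Everything else is an assumption made for illustration: the class and method names, treating the history bits as a 2-bit saturating counter, selecting the set with the low bits of the source address, allocating only on taken branches, and the simple replacement rule.

```python
from dataclasses import dataclass

NUM_ENTRIES = 256                # from the text
WAYS = 4                         # from the text
NUM_SETS = NUM_ENTRIES // WAYS   # 64 sets of 4 entries each


@dataclass
class BTBEntry:
    valid: bool = False   # Valid Bit: is the entry in use?
    history: int = 0      # History Bits, assumed here to be a 2-bit counter (0..3)
    source: int = 0       # address the branch instruction was fetched from
    target: int = 0       # data entry: target address of the branch


class BranchTargetBuffer:
    def __init__(self):
        self.sets = [[BTBEntry() for _ in range(WAYS)] for _ in range(NUM_SETS)]

    def _find(self, source):
        # Assumed indexing: low bits of the source address pick the set.
        for entry in self.sets[source % NUM_SETS]:
            if entry.valid and entry.source == source:
                return entry
        return None

    def predict(self, source):
        """Return the predicted target if predicted taken, else None."""
        entry = self._find(source)
        if entry is not None and entry.history >= 2:
            return entry.target
        return None

    def update(self, source, taken, target):
        """Record the actual outcome of the branch at `source`."""
        entry = self._find(source)
        if entry is None:
            if not taken:
                return  # assumed: only taken branches get an entry
            ways = self.sets[source % NUM_SETS]
            # Assumed replacement rule: first free way, otherwise way 0.
            entry = next((e for e in ways if not e.valid), ways[0])
            entry.valid, entry.source, entry.target = True, source, target
            entry.history = 3  # assumed initial state: strongly taken
        elif taken:
            entry.history = min(3, entry.history + 1)
            entry.target = target
        else:
            entry.history = max(0, entry.history - 1)
```

In this model a branch that has never been seen is predicted not taken. Once it has been taken and allocated, it takes two consecutive not-taken outcomes before the prediction flips back to not taken.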