0
2.3kviews
Explain B+-Trees Index files in DBMS.
1 Answer
0
34views

B+ -Trees

B+ tree is a (key, value) storage method in a tree like structure. B+ tree has one root, any number of intermediary nodes (usually one) and a leaf node. Here all leaf nodes will have the actual records stored. Intermediary nodes will have only pointers to the leaf nodes; it not has any data. Any node will have only two leaves.

This is the basic of any B+ tree. Consider the STUDENT table below. This can be stored in B+ tree structure as shown below. We can observe here that it divides the records into two and splits into left node and right node.

Left node will have all the values less than or equal to root node and the right node will have values greater than root node. The intermediary nodes at level 2 will have only the pointers to the leaf nodes.

The values shown in the intermediary nodes are only the pointers to next level. All the leaf nodes will have the actual records in a sorted order.

enter image description here

Fig: B+ tree

If we have to search for any record, they are all found at leaf node. Hence searching any record will take same time because of equidistance of the leaf nodes. Also they are all sorted. Hence searching a record is like a sequential search and does not take much time.

Suppose a B+ tree has an order of n (it is the number of branches – above tree structure has 5 branches altogether, hence order is 5), and then it can have n/2 to n intermediary nodes and n/2 to n-1 leaf nodes. In our example above, n= 5 i.e.; it has 5 branches from root. Then it can have intermediary nodes ranging from 3 to 5. And it can have leaf nodes from 3 to 4.

enter image description here

Fig: B+ tree Example

The main goal of B+ tree is:

Sorted Intermediary and leaf nodes: Since it is a balanced tree, all nodes should be sorted.

Fast traversal and Quick Search:

One should be able to traverse through the nodes very fast. That means, if we have to search for any particular record, we should be able pass through the intermediary node very easily. This is achieved by sorting the pointers at intermediary nodes and the records in the leaf nodes.

Any record should be fetched very quickly. This is made by maintaining the balance in the tree and keeping all the nodes at same distance.

No overflow pages: B+ tree allows all the intermediary and leaf nodes to be partially filled – it will have some percentage defined while designing a B+ tree. This percentage up to which nodes are filled is called fill factor. If a node reaches the fill factor limit, then it is called overflow page. If a node is too empty then it is called underflow. In our example above, intermediary node with 108 is underflow. And leaf nodes are not partially filled, hence it is an overflow. In ideal B+ tree, it should not have overflow or underflow except root node.

Searching a record in B+ Tree:

Suppose we want to search 65 in the below B+ tree structure. First we will fetch for the intermediary node which will direct to the leaf node that can contain record for 65. So we find branch between 50 and 75 nodes in the intermediary node. Then we will be redirected to the third leaf node at the end. Here DBMS will perform sequential search to find 65. Suppose, instead of 65, we have to search for 60. What will happen in this case? We will not be able to find in the leaf node. No insertions/update/delete is allowed during the search in B+ tree.

enter image description here

Fig: Searching a record in B+ Tree

Insertion in B+ tree

Suppose we have to insert a record 60 in below structure. It will go to 3rd leaf node after 55. Since it is a balanced tree and that leaf node is already full, we cannot insert the record there. But it should be inserted there without affecting the fill factor, balance and order. So the only option here is to split the leaf node. But how do we split the nodes?

enter image description here

Fig1: Insertion in B+ tree

The 3rd leaf node should have values (50, 55, 60, 65, 70) and its current root node is 50. We will split the leaf node in the middle so that its balance is not altered. So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes. If these two has to be leaf nodes, the intermediary node cannot branch from 50. It should have 60 added to it and then we can have pointers to new leaf node.

enter image description here

Fig 2: Insertion in B+ tree

This is how we insert a new entry when there is overflow. In normal scenario, it is simple to find the node where it fits and place it in that leaf node.

Delete in B+ tree:

Suppose we have to delete 60 from the above example. What will happen in this case? We have to remove 60 from 4th leaf node as well as from the intermediary node too.

If we remove it from intermediary node, the tree will not satisfy B+ tree rules. So we need to modify it have a balanced tree. After deleting 60 from above B+ tree and re-arranging nodes, it will appear as below.

enter image description here

Fig1: Delete in B+ tree

Suppose we have to delete 15 from above tree. We will traverse to the 1st leaf node and simply delete 15 from that node. There is no need for any re-arrangement as the tree is balanced and 15 do not appear in the intermediary node.

enter image description here

Fig 2: Delete in B+ tree

B+ Tree Extensions

As the number of records grows in the database, the intermediary and leaf nodes needs to be split and spread widely to keep the balance of the tree. This is called as B+ tree extensions. As it spreads out widely, the searching of records becomes faster. The main goal of creating B+ tree is faster traversal of records. As the branches spreads out, it requires less I/O on disk to get the record.

Record that needs to be fetched are fetched in logarithmic fraction of time. Suppose we have K search key values – that is the pointers in the intermediary node for n nodes. Then we can fetch any record in the b+ tree in log (n/2) (K).

Each node takes 40bytes to store an index and each disk block is of 40Kbytes. That means we can have 100 nodes (n). Say we have 1million search key values – that means we have 1 million intermediary pointers. Then we can access log 50 (1000000) = 4 nodes are accessed in one go.

Hence this costs only 4milliseconds to fetch any node in the tree. Now we can guess the advantage of extending the B+ tree into more intermediary nodes. As intermediary nodes spread out more and more, it is more efficient in fetching the records in B+ tree.

Look at below two diagrams to understand how it makes difference with B+ tree extensions.

enter image description here

Fig: B+ Tree Extensions

Please log in to add an answer.