written 8.4 years ago by |
- The longest common subsequence (LCS) problem is the classic computer science problem.
- The longest common subsequence problem is the problem of finding the longest subsequence common to all sequences in the set of sequences.
- LCS is widely used by revision control systems and in biometrics.
- DNA is represented as strings of the small alphabet, Sigma = {A, C, G, T}.
- DNA strings can be changed by mutation by insertion of a character into string.
- A longest common subsequence of two strings can represent the common ancestry of the two strings.
Finding a LCS:
- Characterizing the longest common subsequences by defining optimal substructure of LCS.
- A recursive definition: If am = bn then we should find LCS of Am-1 and Bn-1. Otherwise, compare LCS of A and Bn-1 and LCS of Am-1 and B. Then from these sequences pick up the longer sequence.
- Compute length of LCS.
- Construct LCS using d – table.
Example:
Using dynamic programming determine longest sequence of (A, B, C, D, B, A, C, D, F) and (C, B, A, F).
Solution:
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|
A [ i ] | A | B | C | D | B | A | C | D | F |
B [ j ] | C | B | A | F |
We will build c [i, j] and d [i, j]. Initially c [i, 0] ← 0 where I represents row and c [0, j] ← 0 where j represent the column. The c table will store some values and d table will store directions.
Step 1:
The table is shown below.
Step 2:
As A [i] ≠ B [j]
If (c[0, 1] ≥ c [1, 0]) is true.
i.e. 0 ≥ 0
C [i, j] = c [i – 1, j]
C [1, 1] = c [0, 1]
C [1, 1] = 0
D [i, j] = ↑
Step 3:
Let i = 1, j = 2 then
As A [i] ≠ B [j]
If (c [i – 1, j] ≥ c [i, j - 1]) i.e. (c[0, 2] ≥ c [1, 1]) is true. i.e. 0 ≥0
C[i, j] = c [i – 1, j]
C[1, 2] = c [0, 2]
C[1, 2] = 0
D[i, j] = ↑
The table will be
Continuing in this fashion we can fill up the table as shown in figure 10: