Introduction
The Longest Common Subsequence (LCS) problem is a classic computer science problem that involves finding the longest subsequence that is common to two sequences. A subsequence is a sequence derived by deleting some or none of the elements without changing the order of the remaining elements. The LCS problem is important in fields such as bioinformatics, text comparison, and version control systems. In this article, we will explore how to solve the LCS problem using C++ through various examples. We’ll start by understanding the definition and formula, then dive into different approaches to implement the solution.
Prerequisites
Before we begin, ensure you have a basic understanding of the following:
- C++ Programming Language: Familiarity with basic syntax, variables, and data types.
- Dynamic Programming: Understanding of dynamic programming concepts, as the LCS problem is typically solved using this approach.
- Strings and Arrays: Knowledge of how to manipulate strings and arrays in C++.
With these prerequisites in mind, let’s explore the LCS problem and its solutions.
Definition and Formula
Given two sequences, X
and Y
, the Longest Common Subsequence is the longest sequence that can be obtained from both sequences by deleting some or none of the elements without changing the order of the remaining elements.
The LCS of sequences X
and Y
can be defined recursively as follows:
- If the last characters of
X
andY
match, then the LCS is the LCS of the substringsX[0..m-1]
andY[0..n-1]
plus this matching character. - If the last characters of
X
andY
do not match, then the LCS is the maximum of the LCS ofX[0..m-1]
andY[0..n]
and the LCS ofX[0..m]
andY[0..n-1]
.
LCS(X[0..m-1], Y[0..n-1]) = \begin{cases} LCS(X[0..m-2], Y[0..n-2]) + 1 &\\ \text{if } X[m-1] = Y[n-1] \\ \max(LCS(X[0..m-2], Y[0..n-1]), LCS(X[0..m-1], \\ Y[0..n-2])) & \\ \text{if } X[m-1] \neq Y[n-1] \end{cases}
Example 1: Recursive Approach
Code Explanation
In this example, we will implement the LCS problem using a simple recursive approach.
Code
#include <iostream>
#include <string>
using namespace std;
int LCS(string X, string Y, int m, int n) {
if (m == 0 || n == 0)
return 0;
if (X[m - 1] == Y[n - 1])
return 1 + LCS(X, Y, m - 1, n - 1);
else
return max(LCS(X, Y, m - 1, n), LCS(X, Y, m, n - 1));
}
int main() {
string X = "AGGTAB";
string Y = "GXTXAYB";
int m = X.length();
int n = Y.length();
cout << "Length of LCS is " << LCS(X, Y, m, n) << endl;
return 0;
}
Output
Length of LCS is 4
Explanation
In this example, we use a recursive function LCS
that takes the two strings X
and Y
along with their lengths m
and n
. The function checks if the last characters of the strings match and proceeds accordingly to find the LCS.
Example 2: Dynamic Programming Approach
Code Explanation
The recursive approach has exponential time complexity and is inefficient for large sequences. A more efficient approach is to use dynamic programming to store the results of subproblems.
Code
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int LCS(string X, string Y) {
int m = X.length();
int n = Y.length();
vector<vector<int>> L(m + 1, vector<int>(n + 1));
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (X[i - 1] == Y[j - 1])
L[i][j] = L[i - 1][j - 1] + 1;
else
L[i][j] = max(L[i - 1][j], L[i][j - 1]);
}
}
return L[m][n];
}
int main() {
string X = "AGGTAB";
string Y = "GXTXAYB";
cout << "Length of LCS is " << LCS(X, Y) << endl;
return 0;
}
Output
Length of LCS is 4
Explanation
In this example, we use a 2D array L
to store the lengths of LCS of subproblems. The array is filled in a bottom-up manner, and the final value L[m][n]
gives the length of the LCS.
Example 3: Printing the Longest Common Subsequence
Code Explanation
In this example, we will modify the dynamic programming approach to not only find the length of the LCS but also print the LCS itself.
Code
#include <iostream>
#include <string>
#include <vector>
using namespace std;
void printLCS(string X, string Y) {
int m = X.length();
int n = Y.length();
vector<vector<int>> L(m + 1, vector<int>(n + 1));
for (int i = 0; i <= m; i++) {
for (int j = 0; j <= n; j++) {
if (i == 0 || j == 0)
L[i][j] = 0;
else if (X[i - 1] == Y[j - 1])
L[i][j] = L[i - 1][j - 1] + 1;
else
L[i][j] = max(L[i - 1][j], L[i][j - 1]);
}
}
int index = L[m][n];
string lcs(index, ' ');
int i = m, j = n;
while (i > 0 && j > 0) {
if (X[i - 1] == Y[j - 1]) {
lcs[index - 1] = X[i - 1];
i--;
j--;
index--;
} else if (L[i - 1][j] > L[i][j - 1]) {
i--;
} else {
j--;
}
}
cout << "LCS of " << X << " and " << Y << " is " << lcs << endl;
}
int main() {
string X = "AGGTAB";
string Y = "GXTXAYB";
printLCS(X, Y);
return 0;
}
Output
LCS of AGGTAB and GXTXAYB is GTAB
Explanation
In this example, we extend the dynamic programming solution to trace back through the 2D array L
to construct and print the LCS. This approach ensures that we not only find the length of the LCS but also the LCS itself.
Conclusion
The Longest Common Subsequence problem is a fundamental problem in computer science with numerous applications. In this article, we explored the problem, understood its mathematical formulation, and implemented solutions using C++. We started with a simple recursive approach, moved on to an efficient dynamic programming approach, and finally extended the solution to print the LCS itself.