Solve the LCS problem using C++

Introduction

The Longest Common Subsequence (LCS) problem is a classic computer science problem that involves finding the longest subsequence that is common to two sequences. A subsequence is a sequence derived by deleting some or none of the elements without changing the order of the remaining elements. The LCS problem is important in fields such as bioinformatics, text comparison, and version control systems. In this article, we will explore how to solve the LCS problem using C++ through various examples. We’ll start by understanding the definition and formula, then dive into different approaches to implement the solution.

Prerequisites

Before we begin, ensure you have a basic understanding of the following:

  1. C++ Programming Language: Familiarity with basic syntax, variables, and data types.
  2. Dynamic Programming: Understanding of dynamic programming concepts, as the LCS problem is typically solved using this approach.
  3. Strings and Arrays: Knowledge of how to manipulate strings and arrays in C++.

With these prerequisites in mind, let’s explore the LCS problem and its solutions.

Definition and Formula

Given two sequences, X and Y, the Longest Common Subsequence is the longest sequence that can be obtained from both sequences by deleting some or none of the elements without changing the order of the remaining elements.

The LCS of sequences X and Y can be defined recursively as follows:

  • If the last characters of X and Y match, then the LCS is the LCS of the substrings X[0..m-1] and Y[0..n-1] plus this matching character.
  • If the last characters of X and Y do not match, then the LCS is the maximum of the LCS of X[0..m-1] and Y[0..n] and the LCS of X[0..m] and Y[0..n-1].
LCS(X[0..m-1], Y[0..n-1]) = 
\begin{cases} 
    LCS(X[0..m-2], Y[0..n-2]) + 1 &\\ \text{if } X[m-1] = Y[n-1] \\
    \max(LCS(X[0..m-2], Y[0..n-1]), LCS(X[0..m-1], \\ Y[0..n-2])) & \\ \text{if } X[m-1] \neq Y[n-1]
\end{cases}

Example 1: Recursive Approach

Code Explanation

In this example, we will implement the LCS problem using a simple recursive approach.

Code

C++
#include <iostream>
#include <string>
using namespace std;

int LCS(string X, string Y, int m, int n) {
    if (m == 0 || n == 0)
        return 0;
    if (X[m - 1] == Y[n - 1])
        return 1 + LCS(X, Y, m - 1, n - 1);
    else
        return max(LCS(X, Y, m - 1, n), LCS(X, Y, m, n - 1));
}

int main() {
    string X = "AGGTAB";
    string Y = "GXTXAYB";
    int m = X.length();
    int n = Y.length();
    
    cout << "Length of LCS is " << LCS(X, Y, m, n) << endl;
    return 0;
}

Output

C++
Length of LCS is 4

Explanation

In this example, we use a recursive function LCS that takes the two strings X and Y along with their lengths m and n. The function checks if the last characters of the strings match and proceeds accordingly to find the LCS.

Example 2: Dynamic Programming Approach

Code Explanation

The recursive approach has exponential time complexity and is inefficient for large sequences. A more efficient approach is to use dynamic programming to store the results of subproblems.

Code

C++
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int LCS(string X, string Y) {
    int m = X.length();
    int n = Y.length();
    vector<vector<int>> L(m + 1, vector<int>(n + 1));

    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                L[i][j] = 0;
            else if (X[i - 1] == Y[j - 1])
                L[i][j] = L[i - 1][j - 1] + 1;
            else
                L[i][j] = max(L[i - 1][j], L[i][j - 1]);
        }
    }
    return L[m][n];
}

int main() {
    string X = "AGGTAB";
    string Y = "GXTXAYB";
    
    cout << "Length of LCS is " << LCS(X, Y) << endl;
    return 0;
}

Output

C++
Length of LCS is 4

Explanation

In this example, we use a 2D array L to store the lengths of LCS of subproblems. The array is filled in a bottom-up manner, and the final value L[m][n] gives the length of the LCS.

Example 3: Printing the Longest Common Subsequence

Code Explanation

In this example, we will modify the dynamic programming approach to not only find the length of the LCS but also print the LCS itself.

Code

C++
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void printLCS(string X, string Y) {
    int m = X.length();
    int n = Y.length();
    vector<vector<int>> L(m + 1, vector<int>(n + 1));

    for (int i = 0; i <= m; i++) {
        for (int j = 0; j <= n; j++) {
            if (i == 0 || j == 0)
                L[i][j] = 0;
            else if (X[i - 1] == Y[j - 1])
                L[i][j] = L[i - 1][j - 1] + 1;
            else
                L[i][j] = max(L[i - 1][j], L[i][j - 1]);
        }
    }

    int index = L[m][n];
    string lcs(index, ' ');

    int i = m, j = n;
    while (i > 0 && j > 0) {
        if (X[i - 1] == Y[j - 1]) {
            lcs[index - 1] = X[i - 1];
            i--;
            j--;
            index--;
        } else if (L[i - 1][j] > L[i][j - 1]) {
            i--;
        } else {
            j--;
        }
    }

    cout << "LCS of " << X << " and " << Y << " is " << lcs << endl;
}

int main() {
    string X = "AGGTAB";
    string Y = "GXTXAYB";
    
    printLCS(X, Y);
    return 0;
}

Output

C++
LCS of AGGTAB and GXTXAYB is GTAB

Explanation

In this example, we extend the dynamic programming solution to trace back through the 2D array L to construct and print the LCS. This approach ensures that we not only find the length of the LCS but also the LCS itself.

Conclusion

The Longest Common Subsequence problem is a fundamental problem in computer science with numerous applications. In this article, we explored the problem, understood its mathematical formulation, and implemented solutions using C++. We started with a simple recursive approach, moved on to an efficient dynamic programming approach, and finally extended the solution to print the LCS itself.