Problem: Longest Common Subsequence

Our second 2D problem introduces a fundamental pattern: comparing two sequences.

Problem Statement

Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

For example, "ace" is a subsequence of "abcde".

A common subsequence of two strings is a subsequence that is common to both strings.

Example 1:

Input: text1 = "abcde", text2 = "ace"
Output: 3
Explanation: The longest common subsequence is "ace" (length 3).

Example 2:

Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" (length 3).

Example 3:

Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: No common subsequence.

Understanding Subsequences

Subsequence: Preserve order, but can skip characters

text = "abcde"

Valid subsequences: "a", "b", "ab", "ac", "ace", "abde", "abcde", "", etc.

NOT valid subsequences: "ba" (wrong order), "aec" (wrong order)

Total subsequences of length n: 2^n (include or exclude each character)

Common subsequence: Present in both strings

Example:

text1 = "abcde"
text2 = "aec"

Common subsequences: "a", "e", "ae", "ac"

Longest: length 2 (e.g., "ae" or "ac")

Why This is Hard (Brute Force)

Naive approach: Generate all subsequences of text1, check each against text2

Complexity: O(2^n × m) where n = length of text1, m = length of text2

For n=20: Over 1 million subsequences to check!

DP gives us O(n × m): Much better!

Step 1: Define the State

What should dp[i][j] represent?

Definition: dp[i][j] = length of LCS of text1[0...i-1] and text2[0...j-1]

In other words: LCS of the first i characters of text1 and the first j characters of text2.

Why this works: Answer is dp[n][m] where n and m are the lengths of text1 and text2.

Alternative interpretation: Using 0-indexed thinking:

dp[0][j] = LCS of empty string and text2[0...j-1]
dp[i][0] = LCS of text1[0...i-1] and empty string
dp[i][j] = LCS of text1[0...i-1] and text2[0...j-1]

Step 2: Identify Base Cases

What are the simplest subproblems?

dp[0][j] for all j: LCS of empty string and text2[0...j-1]

No characters in text1, so no common subsequence
dp[0][j] = 0 for all j

dp[i][0] for all i: LCS of text1[0...i-1] and empty string

No characters in text2, so no common subsequence
dp[i][0] = 0 for all i

Base case summary:

       ""  a  c  e
    "" 0   0  0  0
    a  0   ?  ?  ?
    b  0   ?  ?  ?
    c  0   ?  ?  ?
    d  0   ?  ?  ?
    e  0   ?  ?  ?

Entire first row and first column are 0.

Step 3: Find the Recurrence Relation

For dp[i][j], we're considering text1[i-1] and text2[j-1] (the i-th and j-th characters, using 0-indexed strings).

Two cases:

Case 1: Characters Match

If text1[i-1] == text2[j-1]:

This character is part of the LCS!

LCS length: 1 (for this character) + LCS of the strings before these characters

dp[i][j] = 1 + dp[i-1][j-1]

Example:

text1 = "abc", text2 = "ac"
At i=3, j=2: text1[2]='c', text2[1]='c' → Match!
dp[3][2] = 1 + dp[2][1]

Case 2: Characters Don't Match

If text1[i-1] != text2[j-1]:

This character from one of the strings is NOT in the LCS.

Two options:

Exclude text1[i-1]: LCS is dp[i-1][j]
Exclude text2[j-1]: LCS is dp[i][j-1]

Take the better (longer) option:

dp[i][j] = max(dp[i-1][j], dp[i][j-1])

Example:

text1 = "abc", text2 = "ac"
At i=2, j=2: text1[1]='b', text2[1]='c' → No match
dp[2][2] = max(dp[1][2], dp[2][1])

Complete Recurrence

Java

if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
    dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}

Step 4: Implementation

Java

public class LongestCommonSubsequence {
    public int longestCommonSubsequence(String text1, String text2) {
        int m = text1.length();
        int n = text2.length();

        // Create DP table
        // dp[i][j] = LCS length of text1[0...i-1] and text2[0...j-1]
        int[][] dp = new int[m + 1][n + 1];

        // Base cases: first row and column are already 0 (default)

        // Fill table
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                    // Characters match: include in LCS
                    dp[i][j] = 1 + dp[i - 1][j - 1];
                } else {
                    // Characters don't match: take best of excluding one
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }

        // Return answer
        return dp[m][n];
    }
}

Step 5: Trace Through an Example

Let's trace longestCommonSubsequence("abcde", "ace"):

text1 = "abcde" (length 5)
text2 = "ace"   (length 3)

Initialize:

       ""  a  c  e
    "" 0   0  0  0
    a  0   ?  ?  ?
    b  0   ?  ?  ?
    c  0   ?  ?  ?
    d  0   ?  ?  ?
    e  0   ?  ?  ?

Fill table:

i=1, j=1: text1[0]='a', text2[0]='a' → Match!
  dp[1][1] = 1 + dp[0][0] = 1 + 0 = 1

i=1, j=2: text1[0]='a', text2[1]='c' → No match
  dp[1][2] = max(dp[0][2], dp[1][1]) = max(0, 1) = 1

i=1, j=3: text1[0]='a', text2[2]='e' → No match
  dp[1][3] = max(dp[0][3], dp[1][2]) = max(0, 1) = 1

       ""  a  c  e
    "" 0   0  0  0
    a  0   1  1  1
    b  0   ?  ?  ?
    c  0   ?  ?  ?
    d  0   ?  ?  ?
    e  0   ?  ?  ?

i=2, j=1: text1[1]='b', text2[0]='a' → No match
  dp[2][1] = max(dp[1][1], dp[2][0]) = max(1, 0) = 1

i=2, j=2: text1[1]='b', text2[1]='c' → No match
  dp[2][2] = max(dp[1][2], dp[2][1]) = max(1, 1) = 1

i=2, j=3: text1[1]='b', text2[2]='e' → No match
  dp[2][3] = max(dp[1][3], dp[2][2]) = max(1, 1) = 1

       ""  a  c  e
    "" 0   0  0  0
    a  0   1  1  1
    b  0   1  1  1
    c  0   ?  ?  ?
    d  0   ?  ?  ?
    e  0   ?  ?  ?

i=3, j=1: text1[2]='c', text2[0]='a' → No match
  dp[3][1] = max(dp[2][1], dp[3][0]) = max(1, 0) = 1

i=3, j=2: text1[2]='c', text2[1]='c' → Match!
  dp[3][2] = 1 + dp[2][1] = 1 + 1 = 2

i=3, j=3: text1[2]='c', text2[2]='e' → No match
  dp[3][3] = max(dp[2][3], dp[3][2]) = max(1, 2) = 2

       ""  a  c  e
    "" 0   0  0  0
    a  0   1  1  1
    b  0   1  1  1
    c  0   1  2  2
    d  0   ?  ?  ?
    e  0   ?  ?  ?

i=4, j=1: text1[3]='d', text2[0]='a' → No match
  dp[4][1] = max(dp[3][1], dp[4][0]) = max(1, 0) = 1

i=4, j=2: text1[3]='d', text2[1]='c' → No match
  dp[4][2] = max(dp[3][2], dp[4][1]) = max(2, 1) = 2

i=4, j=3: text1[3]='d', text2[2]='e' → No match
  dp[4][3] = max(dp[3][3], dp[4][2]) = max(2, 2) = 2

       ""  a  c  e
    "" 0   0  0  0
    a  0   1  1  1
    b  0   1  1  1
    c  0   1  2  2
    d  0   1  2  2
    e  0   ?  ?  ?

i=5, j=1: text1[4]='e', text2[0]='a' → No match
  dp[5][1] = max(dp[4][1], dp[5][0]) = max(1, 0) = 1

i=5, j=2: text1[4]='e', text2[1]='c' → No match
  dp[5][2] = max(dp[4][2], dp[5][1]) = max(2, 1) = 2

i=5, j=3: text1[4]='e', text2[2]='e' → Match!
  dp[5][3] = 1 + dp[4][2] = 1 + 2 = 3

Final table:

       ""  a  c  e
    "" 0   0  0  0
    a  0   1  1  1
    b  0   1  1  1
    c  0   1  2  2
    d  0   1  2  2
    e  0   1  2  3

Return dp[5][3] = 3 ✓

The LCS has length 3, which is "ace"!

Reconstructing the LCS

The DP table tells us the length, but what is the actual LCS?

Backtracking approach:

Start at dp[m][n] and work backwards:

Java

public String getLCS(String text1, String text2, int[][] dp) {
    StringBuilder lcs = new StringBuilder();
    int i = text1.length();
    int j = text2.length();

    while (i > 0 && j > 0) {
        if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
            // This character is in the LCS
            lcs.append(text1.charAt(i - 1));
            i--;
            j--;
        } else if (dp[i - 1][j] > dp[i][j - 1]) {
            // LCS came from above
            i--;
        } else {
            // LCS came from left
            j--;
        }
    }

    // Reverse because we built it backwards
    return lcs.reverse().toString();
}

Trace for our example:

Start: i=5, j=3, dp[5][3]=3

text1[4]='e', text2[2]='e' → Match!
  lcs = "e", move to i=4, j=2

text1[3]='d', text2[1]='c' → No match
  dp[3][2]=2 > dp[4][1]=1 → Move up (i=3, j=2)

text1[2]='c', text2[1]='c' → Match!
  lcs = "ec", move to i=2, j=1

text1[1]='b', text2[0]='a' → No match
  dp[1][1]=1 == dp[2][0]=0 → Move left (i=2, j=0) [could also move up]

Actually dp[1][1]=1 > dp[2][0]=0, so move up: i=1, j=1

text1[0]='a', text2[0]='a' → Match!
  lcs = "eca", move to i=0, j=0

Reverse: "ace" ✓

Complexity Analysis

Time Complexity: O(m × n)

Two nested loops: m × n iterations
Each cell: O(1) work (comparison, max)
Total: O(m × n)

Space Complexity: O(m × n)

DP table of size (m+1) × (n+1)
Total: O(m × n)

Space Optimization

Observation: To compute row i, we only need row i-1.

Optimized approach:

Java

public int longestCommonSubsequence(String text1, String text2) {
    int m = text1.length();
    int n = text2.length();

    // Use only one row
    int[] dp = new int[n + 1];

    for (int i = 1; i <= m; i++) {
        int prev = 0;  // Stores dp[i-1][j-1]
        for (int j = 1; j <= n; j++) {
            int temp = dp[j];  // Save dp[i-1][j] before overwriting
            if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
                dp[j] = 1 + prev;
            } else {
                dp[j] = Math.max(dp[j], dp[j - 1]);
            }
            prev = temp;  // For next iteration, this is dp[i-1][j-1]
        }
    }

    return dp[n];
}

Space complexity: O(n) instead of O(m × n)!

Note: This is trickier than Unique Paths because we need the diagonal value (dp[i-1][j-1]), not just above and left. We use a prev variable to store it.

Real-World Applications

LCS is the foundation for:

Diff tools (git diff, file comparison)
- Find common lines, highlight changes
DNA sequence alignment
- Find similar regions in genetic sequences
Spell checkers
- Compute similarity between words
Plagiarism detection
- Find common text segments
Version control
- Merge different file versions

Variations

Variation 1: Longest Common Substring

Different from subsequence: Must be contiguous!

Example:

text1 = "abcde"
text2 = "abfce"

LCS (subsequence): "abce" (length 4)

Longest common substring: "ab" (length 2)

DP modification:

Java

if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
    dp[i][j] = 1 + dp[i - 1][j - 1];
    maxLen = Math.max(maxLen, dp[i][j]);
} else {
    dp[i][j] = 0;  // Reset for substring!
}

Variation 2: Shortest Common Supersequence

Problem: Find shortest string that has both text1 and text2 as subsequences.

Solution: Length = m + n - LCS_length

Why? Shared characters (LCS) count once, others twice.

Key Takeaways

LCS demonstrates:

Two-sequence comparison: State tracks positions in both strings
Conditional recurrence: Different cases based on character match
Backtracking: Reconstruct solution from DP table
Practical importance: Foundation for many real-world algorithms

Pattern Recognition:

This is a sequence comparison problem where:

State is (position in text1, position in text2)
Recurrence branches on character match
Base cases are empty sequences

This pattern appears in:

Edit Distance (next problem!)
Longest Palindromic Subsequence
Minimum ASCII Delete Sum

Sequence comparison is one of the most important 2D DP patterns.

What's Next

Our next problem, Edit Distance, extends the LCS idea with weighted operations. Instead of just finding common characters, we'll compute the minimum cost to transform one string into another.

Command Palette

Problem: Longest Common Subsequence

Problem Statement

Understanding Subsequences

Why This is Hard (Brute Force)

Step 1: Define the State

Step 2: Identify Base Cases

Step 3: Find the Recurrence Relation

Case 1: Characters Match

Case 2: Characters Don't Match

Complete Recurrence

Step 4: Implementation

Step 5: Trace Through an Example

Reconstructing the LCS

Complexity Analysis

Space Optimization

Real-World Applications

Variations

Variation 1: Longest Common Substring

Variation 2: Shortest Common Supersequence

Key Takeaways

What's Next