Problem: Longest Common Subsequence
Our second 2D problem introduces a fundamental pattern: comparing two sequences.
Problem Statement
Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
For example, "ace" is a subsequence of "abcde".
A common subsequence of two strings is a subsequence that is common to both strings.
Example 1:
Input: text1 = "abcde", text2 = "ace"
Output: 3
Explanation: The longest common subsequence is "ace" (length 3).
Example 2:
Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" (length 3).
Example 3:
Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: No common subsequence.
Understanding Subsequences
Subsequence: Preserve order, but can skip characters
text = "abcde"
Valid subsequences: "a", "b", "ab", "ac", "ace", "abde", "abcde", "", etc.
NOT valid subsequences: "ba" (wrong order), "aec" (wrong order)
Total subsequences of length n: 2^n (include or exclude each character)
Common subsequence: Present in both strings
Example:
- text1 = "abcde"
- text2 = "aec"
Common subsequences: "a", "e", "ae", "ac"
Longest: length 2 (e.g., "ae" or "ac")
Why This is Hard (Brute Force)
Naive approach: Generate all subsequences of text1, check each against text2
Complexity: O(2^n × m) where n = length of text1, m = length of text2
For n=20: Over 1 million subsequences to check!
DP gives us O(n × m): Much better!
Step 1: Define the State
What should dp[i][j] represent?
Definition: dp[i][j] = length of LCS of text1[0...i-1] and text2[0...j-1]
In other words: LCS of the first i characters of text1 and the first j characters of text2.
Why this works: Answer is dp[n][m] where n and m are the lengths of text1 and text2.
Alternative interpretation: Using 0-indexed thinking:
dp[0][j]= LCS of empty string and text2[0...j-1]dp[i][0]= LCS of text1[0...i-1] and empty stringdp[i][j]= LCS of text1[0...i-1] and text2[0...j-1]
Step 2: Identify Base Cases
What are the simplest subproblems?
dp[0][j] for all j: LCS of empty string and text2[0...j-1]
- No characters in text1, so no common subsequence
dp[0][j] = 0for all j
dp[i][0] for all i: LCS of text1[0...i-1] and empty string
- No characters in text2, so no common subsequence
dp[i][0] = 0for all i
Base case summary:
"" a c e
"" 0 0 0 0
a 0 ? ? ?
b 0 ? ? ?
c 0 ? ? ?
d 0 ? ? ?
e 0 ? ? ?
Entire first row and first column are 0.
Step 3: Find the Recurrence Relation
For dp[i][j], we're considering text1[i-1] and text2[j-1] (the i-th and j-th characters, using 0-indexed strings).
Two cases:
Case 1: Characters Match
If text1[i-1] == text2[j-1]:
This character is part of the LCS!
LCS length: 1 (for this character) + LCS of the strings before these characters
dp[i][j] = 1 + dp[i-1][j-1]
Example:
- text1 = "abc", text2 = "ac"
- At i=3, j=2: text1[2]='c', text2[1]='c' → Match!
- dp[3][2] = 1 + dp[2][1]
Case 2: Characters Don't Match
If text1[i-1] != text2[j-1]:
This character from one of the strings is NOT in the LCS.
Two options:
- Exclude text1[i-1]: LCS is dp[i-1][j]
- Exclude text2[j-1]: LCS is dp[i][j-1]
Take the better (longer) option:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Example:
- text1 = "abc", text2 = "ac"
- At i=2, j=2: text1[1]='b', text2[1]='c' → No match
- dp[2][2] = max(dp[1][2], dp[2][1])
Complete Recurrence
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
Step 4: Implementation
public class LongestCommonSubsequence {
public int longestCommonSubsequence(String text1, String text2) {
int m = text1.length();
int n = text2.length();
// Create DP table
// dp[i][j] = LCS length of text1[0...i-1] and text2[0...j-1]
int[][] dp = new int[m + 1][n + 1];
// Base cases: first row and column are already 0 (default)
// Fill table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
// Characters match: include in LCS
dp[i][j] = 1 + dp[i - 1][j - 1];
} else {
// Characters don't match: take best of excluding one
dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
}
}
}
// Return answer
return dp[m][n];
}
}
Step 5: Trace Through an Example
Let's trace longestCommonSubsequence("abcde", "ace"):
text1 = "abcde" (length 5)
text2 = "ace" (length 3)
Initialize:
"" a c e
"" 0 0 0 0
a 0 ? ? ?
b 0 ? ? ?
c 0 ? ? ?
d 0 ? ? ?
e 0 ? ? ?
Fill table:
i=1, j=1: text1[0]='a', text2[0]='a' → Match!
dp[1][1] = 1 + dp[0][0] = 1 + 0 = 1
i=1, j=2: text1[0]='a', text2[1]='c' → No match
dp[1][2] = max(dp[0][2], dp[1][1]) = max(0, 1) = 1
i=1, j=3: text1[0]='a', text2[2]='e' → No match
dp[1][3] = max(dp[0][3], dp[1][2]) = max(0, 1) = 1
"" a c e
"" 0 0 0 0
a 0 1 1 1
b 0 ? ? ?
c 0 ? ? ?
d 0 ? ? ?
e 0 ? ? ?
i=2, j=1: text1[1]='b', text2[0]='a' → No match
dp[2][1] = max(dp[1][1], dp[2][0]) = max(1, 0) = 1
i=2, j=2: text1[1]='b', text2[1]='c' → No match
dp[2][2] = max(dp[1][2], dp[2][1]) = max(1, 1) = 1
i=2, j=3: text1[1]='b', text2[2]='e' → No match
dp[2][3] = max(dp[1][3], dp[2][2]) = max(1, 1) = 1
"" a c e
"" 0 0 0 0
a 0 1 1 1
b 0 1 1 1
c 0 ? ? ?
d 0 ? ? ?
e 0 ? ? ?
i=3, j=1: text1[2]='c', text2[0]='a' → No match
dp[3][1] = max(dp[2][1], dp[3][0]) = max(1, 0) = 1
i=3, j=2: text1[2]='c', text2[1]='c' → Match!
dp[3][2] = 1 + dp[2][1] = 1 + 1 = 2
i=3, j=3: text1[2]='c', text2[2]='e' → No match
dp[3][3] = max(dp[2][3], dp[3][2]) = max(1, 2) = 2
"" a c e
"" 0 0 0 0
a 0 1 1 1
b 0 1 1 1
c 0 1 2 2
d 0 ? ? ?
e 0 ? ? ?
i=4, j=1: text1[3]='d', text2[0]='a' → No match
dp[4][1] = max(dp[3][1], dp[4][0]) = max(1, 0) = 1
i=4, j=2: text1[3]='d', text2[1]='c' → No match
dp[4][2] = max(dp[3][2], dp[4][1]) = max(2, 1) = 2
i=4, j=3: text1[3]='d', text2[2]='e' → No match
dp[4][3] = max(dp[3][3], dp[4][2]) = max(2, 2) = 2
"" a c e
"" 0 0 0 0
a 0 1 1 1
b 0 1 1 1
c 0 1 2 2
d 0 1 2 2
e 0 ? ? ?
i=5, j=1: text1[4]='e', text2[0]='a' → No match
dp[5][1] = max(dp[4][1], dp[5][0]) = max(1, 0) = 1
i=5, j=2: text1[4]='e', text2[1]='c' → No match
dp[5][2] = max(dp[4][2], dp[5][1]) = max(2, 1) = 2
i=5, j=3: text1[4]='e', text2[2]='e' → Match!
dp[5][3] = 1 + dp[4][2] = 1 + 2 = 3
Final table:
"" a c e
"" 0 0 0 0
a 0 1 1 1
b 0 1 1 1
c 0 1 2 2
d 0 1 2 2
e 0 1 2 3
Return dp[5][3] = 3 ✓
The LCS has length 3, which is "ace"!
Reconstructing the LCS
The DP table tells us the length, but what is the actual LCS?
Backtracking approach:
Start at dp[m][n] and work backwards:
public String getLCS(String text1, String text2, int[][] dp) {
StringBuilder lcs = new StringBuilder();
int i = text1.length();
int j = text2.length();
while (i > 0 && j > 0) {
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
// This character is in the LCS
lcs.append(text1.charAt(i - 1));
i--;
j--;
} else if (dp[i - 1][j] > dp[i][j - 1]) {
// LCS came from above
i--;
} else {
// LCS came from left
j--;
}
}
// Reverse because we built it backwards
return lcs.reverse().toString();
}
Trace for our example:
Start: i=5, j=3, dp[5][3]=3
text1[4]='e', text2[2]='e' → Match!
lcs = "e", move to i=4, j=2
text1[3]='d', text2[1]='c' → No match
dp[3][2]=2 > dp[4][1]=1 → Move up (i=3, j=2)
text1[2]='c', text2[1]='c' → Match!
lcs = "ec", move to i=2, j=1
text1[1]='b', text2[0]='a' → No match
dp[1][1]=1 == dp[2][0]=0 → Move left (i=2, j=0) [could also move up]
Actually dp[1][1]=1 > dp[2][0]=0, so move up: i=1, j=1
text1[0]='a', text2[0]='a' → Match!
lcs = "eca", move to i=0, j=0
Reverse: "ace" ✓
Complexity Analysis
Time Complexity: O(m × n)
- Two nested loops: m × n iterations
- Each cell: O(1) work (comparison, max)
- Total: O(m × n)
Space Complexity: O(m × n)
- DP table of size (m+1) × (n+1)
- Total: O(m × n)
Space Optimization
Observation: To compute row i, we only need row i-1.
Optimized approach:
public int longestCommonSubsequence(String text1, String text2) {
int m = text1.length();
int n = text2.length();
// Use only one row
int[] dp = new int[n + 1];
for (int i = 1; i <= m; i++) {
int prev = 0; // Stores dp[i-1][j-1]
for (int j = 1; j <= n; j++) {
int temp = dp[j]; // Save dp[i-1][j] before overwriting
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[j] = 1 + prev;
} else {
dp[j] = Math.max(dp[j], dp[j - 1]);
}
prev = temp; // For next iteration, this is dp[i-1][j-1]
}
}
return dp[n];
}
Space complexity: O(n) instead of O(m × n)!
Note: This is trickier than Unique Paths because we need the diagonal value (dp[i-1][j-1]), not just above and left. We use a prev variable to store it.
Real-World Applications
LCS is the foundation for:
-
Diff tools (git diff, file comparison)
- Find common lines, highlight changes
-
DNA sequence alignment
- Find similar regions in genetic sequences
-
Spell checkers
- Compute similarity between words
-
Plagiarism detection
- Find common text segments
-
Version control
- Merge different file versions
Variations
Variation 1: Longest Common Substring
Different from subsequence: Must be contiguous!
Example:
- text1 = "abcde"
- text2 = "abfce"
LCS (subsequence): "abce" (length 4)
Longest common substring: "ab" (length 2)
DP modification:
if (text1.charAt(i - 1) == text2.charAt(j - 1)) {
dp[i][j] = 1 + dp[i - 1][j - 1];
maxLen = Math.max(maxLen, dp[i][j]);
} else {
dp[i][j] = 0; // Reset for substring!
}
Variation 2: Shortest Common Supersequence
Problem: Find shortest string that has both text1 and text2 as subsequences.
Solution: Length = m + n - LCS_length
Why? Shared characters (LCS) count once, others twice.
Key Takeaways
LCS demonstrates:
- Two-sequence comparison: State tracks positions in both strings
- Conditional recurrence: Different cases based on character match
- Backtracking: Reconstruct solution from DP table
- Practical importance: Foundation for many real-world algorithms
Pattern Recognition:
This is a sequence comparison problem where:
- State is (position in text1, position in text2)
- Recurrence branches on character match
- Base cases are empty sequences
This pattern appears in:
- Edit Distance (next problem!)
- Longest Palindromic Subsequence
- Minimum ASCII Delete Sum
Sequence comparison is one of the most important 2D DP patterns.
What's Next
Our next problem, Edit Distance, extends the LCS idea with weighted operations. Instead of just finding common characters, we'll compute the minimum cost to transform one string into another.