Problem: Edit Distance
Our third 2D problem introduces weighted operations: transforming one sequence into another with minimum cost.
Problem Statement
Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.
You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Example 1:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse → rorse (replace 'h' with 'r')
rorse → rose (delete 'r')
rose → ros (delete 'e')
Example 2:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention → inention (delete 't')
inention → enention (replace 'i' with 'e')
enention → exention (replace 'n' with 'x')
exention → exection (replace 'n' with 'c')
exection → execution (insert 'u')
Example 3:
Input: word1 = "abc", word2 = "abc"
Output: 0
Explanation: Words are identical, no operations needed.
Understanding Edit Operations
Three allowed operations:
1. Insert
word1 = "abc", want to get "abcd"
Operation: Insert 'd' at the end
Cost: 1 operation
2. Delete
word1 = "abcd", want to get "abc"
Operation: Delete 'd'
Cost: 1 operation
3. Replace
word1 = "abc", want to get "abd"
Operation: Replace 'c' with 'd'
Cost: 1 operation
Goal: Find the minimum number of operations to transform word1 into word2.
Why This is Hard (Brute Force)
Many possible sequences of operations!
For "horse" → "ros":
- Replace h→r, delete o, delete r, delete s, delete e? (5 ops)
- Delete h, delete o, delete r, delete s, delete e, insert r, insert o, insert s? (8 ops)
- Replace h→r, delete r, delete e? (3 ops) ✓ Best!
Exponential possibilities, DP finds optimal in O(m × n).
Step 1: Define the State
What should dp[i][j] represent?
Definition: dp[i][j] = minimum edit distance to convert word1[0...i-1] to word2[0...j-1]
In other words: Minimum operations to convert the first i characters of word1 into the first j characters of word2.
Answer: dp[m][n] where m = word1.length, n = word2.length
Step 2: Identify Base Cases
What are the simplest subproblems?
dp[0][j]: Convert Empty String to word2[0...j-1]
Only option: Insert all j characters from word2
Operations: j insertions
Base case: dp[0][j] = j for all j
Example: "" → "ros" requires 3 insertions
dp[i][0]: Convert word1[0...i-1] to Empty String
Only option: Delete all i characters from word1
Operations: i deletions
Base case: dp[i][0] = i for all i
Example: "horse" → "" requires 5 deletions
Base Case Summary
"" r o s
"" 0 1 2 3
h 1 ? ? ?
o 2 ? ? ?
r 3 ? ? ?
s 4 ? ? ?
e 5 ? ? ?
First row: 0, 1, 2, 3 (insert each character of word2) First column: 0, 1, 2, 3, 4, 5 (delete each character of word1)
Step 3: Find the Recurrence Relation
For dp[i][j], we're comparing word1[i-1] and word2[j-1].
Case 1: Characters Match
If word1[i-1] == word2[j-1]:
No operation needed for this character! Just use the result from before.
dp[i][j] = dp[i-1][j-1]
Example:
- word1 = "abc", word2 = "abc"
- At i=3, j=3: word1[2]='c', word2[2]='c' → Match!
- dp[3][3] = dp[2][2] (no operation for 'c')
Case 2: Characters Don't Match
If word1[i-1] != word2[j-1]:
We must perform an operation. Three options:
Option 1: Replace
Replace word1[i-1] with word2[j-1].
After replacement: Both characters at positions i-1 and j-1 are now the same.
Remaining problem: Convert word1[0...i-2] to word2[0...j-2]
Cost: 1 + dp[i-1][j-1]
Example:
- "horse" → "ros"
- Replace 'h' with 'r': "rorse" → "ros"
- Now solve: "orse" → "os"
Option 2: Insert
Insert word2[j-1] into word1 at position i.
After insertion: word1 now has word2[j-1] at the right position.
Remaining problem: Convert word1[0...i-1] to word2[0...j-2]
Cost: 1 + dp[i][j-1]
Intuition: If we insert word2[j-1], we've "handled" position j in word2, so move j back.
Example:
- "horse" → "ros"
- Insert 's' at end: "horses" → "ros"
- Now solve: "horses" → "ro"
Option 3: Delete
Delete word1[i-1] from word1.
After deletion: word1 is shorter, position i-1 is gone.
Remaining problem: Convert word1[0...i-2] to word2[0...j-1]
Cost: 1 + dp[i-1][j]
Example:
- "horse" → "ros"
- Delete 'e': "hors" → "ros"
- Now solve: "hors" → "ros"
Choose the Minimum
dp[i][j] = 1 + min(dp[i-1][j-1], // Replace
dp[i][j-1], // Insert
dp[i-1][j]) // Delete
Complete Recurrence
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
dp[i][j] = dp[i - 1][j - 1]; // No operation
} else {
int replace = dp[i - 1][j - 1];
int insert = dp[i][j - 1];
int delete = dp[i - 1][j];
dp[i][j] = 1 + Math.min(replace, Math.min(insert, delete));
}
Step 4: Implementation
public class EditDistance {
public int minDistance(String word1, String word2) {
int m = word1.length();
int n = word2.length();
// Create DP table
// dp[i][j] = edit distance between word1[0...i-1] and word2[0...j-1]
int[][] dp = new int[m + 1][n + 1];
// Base cases
// Convert empty string to word2[0...j-1]: j insertions
for (int j = 0; j <= n; j++) {
dp[0][j] = j;
}
// Convert word1[0...i-1] to empty string: i deletions
for (int i = 0; i <= m; i++) {
dp[i][0] = i;
}
// Fill table
for (int i = 1; i <= m; i++) {
for (int j = 1; j <= n; j++) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
// Characters match: no operation needed
dp[i][j] = dp[i - 1][j - 1];
} else {
// Characters don't match: try all three operations
int replace = dp[i - 1][j - 1];
int insert = dp[i][j - 1];
int delete = dp[i - 1][j];
dp[i][j] = 1 + Math.min(replace, Math.min(insert, delete));
}
}
}
// Return answer
return dp[m][n];
}
}
Step 5: Trace Through an Example
Let's trace minDistance("horse", "ros"):
word1 = "horse" (length 5)
word2 = "ros" (length 3)
Initialize base cases:
"" r o s
"" 0 1 2 3
h 1 ? ? ?
o 2 ? ? ?
r 3 ? ? ?
s 4 ? ? ?
e 5 ? ? ?
Fill table:
i=1, j=1: word1[0]='h', word2[0]='r' → No match
replace = dp[0][0] = 0
insert = dp[1][0] = 1
delete = dp[0][1] = 1
dp[1][1] = 1 + min(0, 1, 1) = 1
i=1, j=2: word1[0]='h', word2[1]='o' → No match
replace = dp[0][1] = 1
insert = dp[1][1] = 1
delete = dp[0][2] = 2
dp[1][2] = 1 + min(1, 1, 2) = 2
i=1, j=3: word1[0]='h', word2[2]='s' → No match
replace = dp[0][2] = 2
insert = dp[1][2] = 2
delete = dp[0][3] = 3
dp[1][3] = 1 + min(2, 2, 3) = 3
"" r o s
"" 0 1 2 3
h 1 1 2 3
o 2 ? ? ?
r 3 ? ? ?
s 4 ? ? ?
e 5 ? ? ?
i=2, j=1: word1[1]='o', word2[0]='r' → No match
replace = dp[1][0] = 1
insert = dp[2][0] = 2
delete = dp[1][1] = 1
dp[2][1] = 1 + min(1, 2, 1) = 2
i=2, j=2: word1[1]='o', word2[1]='o' → Match!
dp[2][2] = dp[1][1] = 1
i=2, j=3: word1[1]='o', word2[2]='s' → No match
replace = dp[1][2] = 2
insert = dp[2][2] = 1
delete = dp[1][3] = 3
dp[2][3] = 1 + min(2, 1, 3) = 2
"" r o s
"" 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 ? ? ?
s 4 ? ? ?
e 5 ? ? ?
i=3, j=1: word1[2]='r', word2[0]='r' → Match!
dp[3][1] = dp[2][0] = 2
i=3, j=2: word1[2]='r', word2[1]='o' → No match
replace = dp[2][1] = 2
insert = dp[3][1] = 2
delete = dp[2][2] = 1
dp[3][2] = 1 + min(2, 2, 1) = 2
i=3, j=3: word1[2]='r', word2[2]='s' → No match
replace = dp[2][2] = 1
insert = dp[3][2] = 2
delete = dp[2][3] = 2
dp[3][3] = 1 + min(1, 2, 2) = 2
"" r o s
"" 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 ? ? ?
e 5 ? ? ?
i=4, j=1: word1[3]='s', word2[0]='r' → No match
replace = dp[3][0] = 3
insert = dp[4][0] = 4
delete = dp[3][1] = 2
dp[4][1] = 1 + min(3, 4, 2) = 3
i=4, j=2: word1[3]='s', word2[1]='o' → No match
replace = dp[3][1] = 2
insert = dp[4][1] = 3
delete = dp[3][2] = 2
dp[4][2] = 1 + min(2, 3, 2) = 3
i=4, j=3: word1[3]='s', word2[2]='s' → Match!
dp[4][3] = dp[3][2] = 2
"" r o s
"" 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 ? ? ?
i=5, j=1: word1[4]='e', word2[0]='r' → No match
replace = dp[4][0] = 4
insert = dp[5][0] = 5
delete = dp[4][1] = 3
dp[5][1] = 1 + min(4, 5, 3) = 4
i=5, j=2: word1[4]='e', word2[1]='o' → No match
replace = dp[4][1] = 3
insert = dp[5][1] = 4
delete = dp[4][2] = 3
dp[5][2] = 1 + min(3, 4, 3) = 4
i=5, j=3: word1[4]='e', word2[2]='s' → No match
replace = dp[4][2] = 3
insert = dp[5][2] = 4
delete = dp[4][3] = 2
dp[5][3] = 1 + min(3, 4, 2) = 3
Final table:
"" r o s
"" 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 4 4 3
Return dp[5][3] = 3 ✓
Minimum edit distance is 3!
Reconstructing the Operations
Similar to LCS backtracking:
public void printOperations(String word1, String word2, int[][] dp) {
int i = word1.length();
int j = word2.length();
while (i > 0 && j > 0) {
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
// Characters match, no operation
i--;
j--;
} else {
int replace = dp[i - 1][j - 1];
int insert = dp[i][j - 1];
int delete = dp[i - 1][j];
if (replace <= insert && replace <= delete) {
System.out.println("Replace " + word1.charAt(i - 1) + " with " + word2.charAt(j - 1));
i--;
j--;
} else if (delete <= insert) {
System.out.println("Delete " + word1.charAt(i - 1));
i--;
} else {
System.out.println("Insert " + word2.charAt(j - 1));
j--;
}
}
}
// Handle remaining characters
while (i > 0) {
System.out.println("Delete " + word1.charAt(i - 1));
i--;
}
while (j > 0) {
System.out.println("Insert " + word2.charAt(j - 1));
j--;
}
}
For "horse" → "ros":
Delete 'e'
Delete 's'
Delete 'r'
Wait, that's 3 deletes but not optimal path shown in example...
Let me trace backwards from dp[5][3]=3:
i=5, j=3: word1[4]='e', word2[2]='s'
dp[4][3]=2, dp[5][2]=4, dp[4][2]=3
Delete wins (dp[4][3]=2), so delete 'e'
Move to i=4, j=3
i=4, j=3: word1[3]='s', word2[2]='s' → Match
Move to i=3, j=2
i=3, j=2: word1[2]='r', word2[1]='o'
dp[2][1]=2, dp[3][1]=2, dp[2][2]=1
Delete wins (dp[2][2]=1), so delete 'r'
Move to i=2, j=2
i=2, j=2: word1[1]='o', word2[1]='o' → Match
Move to i=1, j=1
i=1, j=1: word1[0]='h', word2[0]='r'
dp[0][0]=0, dp[1][0]=1, dp[0][1]=1
Replace wins (dp[0][0]=0), so replace 'h' with 'r'
Move to i=0, j=0
Operations (in reverse order):
1. Replace 'h' with 'r': "horse" → "rorse"
2. Delete 'r': "rorse" → "rose"
3. Delete 'e': "rose" → "ros"
Matches the example! ✓
Complexity Analysis
Time Complexity: O(m × n)
- Two nested loops
- Each cell: O(1) work
- Total: O(m × n)
Space Complexity: O(m × n)
- DP table of size (m+1) × (n+1)
- Total: O(m × n)
Space Optimization
Same as LCS: We only need the previous row.
public int minDistance(String word1, String word2) {
int m = word1.length();
int n = word2.length();
int[] dp = new int[n + 1];
// Initialize first row
for (int j = 0; j <= n; j++) {
dp[j] = j;
}
for (int i = 1; i <= m; i++) {
int prev = dp[0]; // Saves dp[i-1][j-1]
dp[0] = i; // First column: i deletions
for (int j = 1; j <= n; j++) {
int temp = dp[j]; // Save dp[i-1][j]
if (word1.charAt(i - 1) == word2.charAt(j - 1)) {
dp[j] = prev;
} else {
dp[j] = 1 + Math.min(prev, Math.min(dp[j], dp[j - 1]));
}
prev = temp;
}
}
return dp[n];
}
Space complexity: O(n)
Real-World Applications
Edit distance (Levenshtein distance) is used for:
- Spell checking: Find closest correctly-spelled word
- DNA sequence alignment: Compare genetic sequences
- Plagiarism detection: Measure text similarity
- Speech recognition: Match spoken words to dictionary
- Autocorrect: Suggest corrections for typos
- Machine translation: Evaluate translation quality
Example: Spell check
User types: "recieve"
Distances to dictionary words (Levenshtein: insert/delete/replace):
- "receive": 2 (two edits; transposition isn't an allowed single operation)
- "relieve": 1 (substitute 'c'→'l')
- "believe": 2 (substitute 'r'→'b' and 'c'→'l')
Suggestion: "relieve" (smallest edit distance)
Variations
Variation 1: Weighted Operations
Different costs for different operations:
- Insert: cost 1
- Delete: cost 1
- Replace: cost 2
Modification:
dp[i][j] = Math.min(dp[i-1][j-1] + replaceCost,
dp[i][j-1] + insertCost,
dp[i-1][j] + deleteCost);
Variation 2: Delete Distance
Only delete operations allowed (no insert, no replace).
Problem: Minimum deletions from both strings to make them equal.
Solution: Use LCS!
Delete distance = (m - LCS) + (n - LCS) = m + n - 2 * LCS
Key Takeaways
Edit Distance demonstrates:
- Weighted operations: Each choice has a cost
- Three-way recurrence: More complex than two-way
- Practical importance: Real-world applications in many fields
Pattern Recognition:
This is a sequence transformation problem where:
- State is (position in source, position in target)
- Recurrence considers multiple operations
- Base cases are converting to/from empty string
This pattern appears in:
- Minimum ASCII Delete Sum
- Wildcard Matching
- Regular Expression Matching
Edit Distance is one of the most important DP problems due to its wide applicability.
What's Next
Our final 2D problem, 0/1 Knapsack, introduces a different pattern: selection with constraints. Instead of comparing sequences, we'll select items to maximize value under a weight constraint.