webby.tools

Levenshtein Distance Calculator

Calculate the edit distance between two strings and see exactly which characters were inserted, deleted, or substituted to transform one into the other.

What is Levenshtein Distance?

Levenshtein distance — also called edit distance — is the minimum number of single-character operations needed to transform one string into another. It was proposed by Soviet mathematician Vladimir Levenshtein in 1965 and remains one of the most widely used string similarity metrics in computer science.

The three allowed operations are:

  • Insertion — add a character (e.g. catcats: insert s)
  • Deletion — remove a character (e.g. catscat: delete s)
  • Substitution — replace one character with another (e.g. catbat: substitute cb)

Example: kittensitting

Step Operation Result
1 Substitute ks sitten
2 Substitute ei sittin
3 Insert g at end sitting

Edit distance = 3. Similarity = \(1 - \frac{3}{7} \approx 57%\)

The Dynamic Programming Algorithm

The standard algorithm fills an \((n+1) \times (m+1)\) matrix where \(n\) and \(m\) are the lengths of the two strings. The recurrence is:

\[dp[i][j] = \begin{cases} \max(i,j) & \text{if } \min(i,j) = 0 \\ dp[i-1][j-1] & \text{if } s[i] = t[j] \\ 1 + \min(dp[i-1][j],\, dp[i][j-1],\, dp[i-1][j-1]) & \text{otherwise} \end{cases}\]

The three terms in the min represent deletion, insertion, and substitution respectively. Enable "Show DP matrix" above to see the full matrix for your inputs.

Time complexity is \(O(nm)\) and space complexity is \(O(nm)\), though space can be reduced to \(O(\min(n,m))\) with a rolling array.

Real-World Applications

Spell checking — When a user types a word not found in a dictionary, a spell checker finds words within a small edit distance (typically 1–2) and ranks them as suggestions. teh has distance 1 from the, making it the top suggestion.

DNA sequence alignment — Bioinformatics tools use edit distance variants to identify mutations between genetic sequences. A substitution corresponds to a point mutation; an insertion or deletion is an indel.

Fuzzy string search — Search engines and autocomplete systems use edit distance to return results even when queries contain typos. Searching for javascrpit still surfaces JavaScript results because the distance is only 2.

Plagiarism detection — Comparing document sections using edit distance or derived metrics helps identify paraphrased or slightly modified copied text.

Record linkage / deduplication — Matching records across databases where names may be spelled differently (Jon Smith vs John Smyth) uses edit distance to identify likely duplicates.

Version control diffs — Tools like diff use edit-distance-based algorithms to compute the minimum set of line insertions and deletions that transform one file into another.

Frequently Asked Questions

What's the difference between Levenshtein distance and Hamming distance?
Hamming distance only counts substitutions and only works on strings of equal length. Levenshtein distance handles strings of any length and counts insertions, deletions, and substitutions. Levenshtein is strictly more general.

What does a distance of 0 mean?
The strings are identical.

How is similarity percentage calculated here?
Similarity is \(\left(1 - \frac{d}{\max(|A|, |B|)}\right) \times 100%\) where \(d\) is the edit distance. This normalizes the distance by the length of the longer string, giving a value between 0% (completely different) and 100% (identical).

Is Levenshtein distance case-sensitive?
Yes — this calculator treats uppercase and lowercase as distinct characters, so Cat and cat have a distance of 1. Convert both strings to the same case before comparing if you want case-insensitive results.

What are the limitations of edit distance?
Edit distance treats all characters equally, so it doesn't account for phonetic similarity, semantic meaning, or common keyboard-adjacency errors. For those use cases, algorithms like Jaro-Winkler (better for short names), Soundex (phonetic), or cosine similarity (semantic) may be more appropriate.

What is Damerau-Levenshtein distance?
A variant that adds a fourth operation: transposition of two adjacent characters. This means abba costs 1 instead of 2. It's useful for catching common typing errors like accidentally swapping two letters.

Icons from Creative Fabrica

This website may contain affiliate links. If you click on an affiliate link and make a purchase, we may receive a small commission at no additional cost to you.