Question 1

What is the difference between Levenshtein and Jaro-Winkler?

Accepted Answer

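Levenshtein counts the minimum number of single-character edits (insertions, deletions, substitutions) needed to turn one string into the other, and it weights every edit equally. The raw distance is often normalized to a 0–1 similarity by dividing it by the longer string's length. Jaro-Winkler starts from the Jaro score, which counts characters that match within a small positional window and penalizes matched characters that appear out of order (transpositions). It then adds a bonus for a shared prefix of up to four characters. Levenshtein is strong for typos and small insertions or deletions. Jaro-Winkler is strong for short strings such as brand names, SKUs, and part numbers, where the leading characters matter most.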
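A minimal pure-Python sketch of both measures may make the definitions concrete. It assumes the textbook formulations: Levenshtein normalized by the longer string's length (one common convention; the calculator may normalize differently), Jaro matches searched within a window of `max(len) // 2 - 1`, and a Winkler prefix bonus with scale 0.1 over at most four characters. The function names are illustrative, not the calculator's own code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP rows as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the distance to 0..1 by the longer string's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(a: str, b: str) -> float:
    """Jaro similarity: characters matched within a window, penalized for transpositions."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        # Take the first unused equal character in b within `window` positions of i.
        for j in range(max(0, i - window), min(i + window + 1, len(b))):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Read the matched characters in order from each string; each differing
    # position is half a transposition, so divide the count by two.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score plus a bonus for a common prefix of up to `max_prefix` characters.

    Some implementations, following Winkler, apply the bonus only when the Jaro
    score exceeds 0.7; that cutoff is omitted here for brevity.
    """
    score = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                       # 3
    print(round(levenshtein_similarity("kitten", "sitting"), 3))  # 0.571
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))             # 0.961
```

Both functions compare strings exactly as given, so in practice you would lowercase and strip punctuation before scoring.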

Question 2

What is a good string similarity threshold for product matching?

Accepted Answer

There is no universal number; it depends on your data and your tolerance for false matches. Many teams treat normalized scores above roughly 0.90 as auto-match candidates, scores from 0.80 to 0.90 as review, and scores below 0.80 as no-match, then tune those cutoffs on a labeled sample. Tune separately for each algorithm, because the same pair can score quite differently under Levenshtein and Jaro-Winkler. Always validate against known pairs rather than picking a cutoff blind, and pair string scores with identifier or attribute checks before merging.
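The banding and the tuning step can be sketched as follows. This assumes you have a hand-labeled sample of `(score, is_true_match)` pairs; the function names, the 0.98 default precision target, and the sample values are hypothetical and only illustrate the procedure.

```python
from __future__ import annotations

# Band edges taken from the rough guidance above (> 0.90 auto-match,
# 0.80 to 0.90 review, < 0.80 no-match). Starting points, not recommendations.
AUTO_MATCH = 0.90
REVIEW = 0.80


def classify(score: float, auto: float = AUTO_MATCH, review: float = REVIEW) -> str:
    """Map a normalized 0..1 similarity score to a decision band."""
    if score > auto:
        return "auto-match"
    if score >= review:
        return "review"
    return "no-match"


def precision_recall_at(threshold: float, labeled: list[tuple[float, bool]]) -> tuple[float, float]:
    """Precision and recall if every pair scoring >= threshold were accepted as a match.

    When nothing is accepted, precision is reported as 1.0 (a convention, not a measurement).
    """
    tp = fp = fn = 0
    for score, is_match in labeled:
        accepted = score >= threshold
        if accepted and is_match:
            tp += 1
        elif accepted:
            fp += 1
        elif is_match:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def lowest_threshold_for_precision(labeled: list[tuple[float, bool]], target: float = 0.98) -> float | None:
    """Smallest observed score that, used as a cutoff, reaches the target precision."""
    for threshold in sorted({score for score, _ in labeled}):
        precision, _ = precision_recall_at(threshold, labeled)
        if precision >= target:
            return threshold
    return None


if __name__ == "__main__":
    # Made-up labeled sample, for illustration only.
    sample = [(0.97, True), (0.93, True), (0.91, False), (0.88, True), (0.84, False), (0.75, False)]
    print(precision_recall_at(0.90, sample))
    print(lowest_threshold_for_precision(sample, target=0.99))  # 0.93
```

In this made-up sample the 0.91 non-match would land in the auto-match band, which is exactly the kind of case a labeled sample is meant to surface before you commit to a cutoff.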

Question 3

Can I use this to deduplicate a whole catalog?

Accepted Answer

This calculator is for comparing strings and calibrating thresholds, not for running a full catalog at scale. Deduplication needs blocking (grouping records by a cheap key so that only records in the same group are compared, instead of comparing every row to every other row), multiple weighted fields, and reversible merges with provenance. See the playbook on matching supplier catalogs and the guide on why fuzzy-match scripts break for the production picture.
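Blocking is the piece that makes catalog-scale matching tractable. The sketch below assumes records are dicts with a `brand` field and uses a deliberately simple, hypothetical blocking key; it only generates candidate pairs, which you would then score with string and attribute checks.

```python
from __future__ import annotations

import re
from collections import defaultdict
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def blocking_key(record: dict) -> str:
    # Hypothetical key: the first three characters of the normalized brand.
    # Real pipelines often use several keys (for example brand and category)
    # and take the union of the candidate pairs they produce.
    return normalize(record["brand"])[:3]


def candidate_pairs(records: list[dict]):
    """Yield index pairs that share a blocking key, instead of all n*(n-1)/2 pairs."""
    blocks: dict[str, list[int]] = defaultdict(list)
    for idx, record in enumerate(records):
        blocks[blocking_key(record)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


if __name__ == "__main__":
    records = [
        {"brand": "Acme"}, {"brand": "ACME Corp."},
        {"brand": "Bosch"}, {"brand": "bosch"}, {"brand": "Makita"},
    ]
    print(list(candidate_pairs(records)))  # [(0, 1), (2, 3)] instead of 10 pairs
```

The trade-off is recall: a record whose key is misspelled (say "Acne" for "Acme") lands in a different block and is never compared, which is why production setups usually combine several keys.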

Question 4

Is my data uploaded anywhere?

Accepted Answer

No. The scoring runs entirely in your browser using client-side JavaScript. The strings you paste or the file you load are never transmitted to a server, stored, or logged, so you can safely test with real supplier or customer data.

Question 5

Why do two algorithms sometimes disagree?

Accepted Answer

Because they model similarity differently. A reordered or abbreviated description can score low on Levenshtein (many edits) but high on Jaro-Winkler (shared prefix, similar characters), or vice versa. When the two disagree sharply, treat the pair as a review case and lean on a stronger key such as a GTIN or MPN instead of free-text matching.
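One way to encode that rule is a small decision function. It assumes both scores are already normalized to 0..1, that products may carry GTIN and MPN fields (the `Product` shape here is hypothetical), and uses an illustrative maximum gap of 0.15 between the two algorithms before a pair is routed to review.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record shape for illustration.
    title: str
    brand: str = ""
    gtin: str | None = None
    mpn: str | None = None


def _norm_id(value: str) -> str:
    """Drop spaces and dashes and uppercase, so 'ab-12 3' and 'AB123' compare equal."""
    return re.sub(r"[\s\-]+", "", value).upper()


def decide(a: Product, b: Product, lev_score: float, jw_score: float,
           max_gap: float = 0.15, auto: float = 0.90, review: float = 0.80) -> str:
    # 1. A strong identifier on both sides outranks free-text similarity.
    #    GTINs are zero-padded to 14 digits; check-digit validation is omitted.
    if a.gtin and b.gtin:
        same = _norm_id(a.gtin).zfill(14) == _norm_id(b.gtin).zfill(14)
        return "auto-match" if same else "no-match"
    # MPNs are only unique within a manufacturer, so require the same brand too.
    if a.mpn and b.mpn and a.brand.lower() == b.brand.lower():
        return "auto-match" if _norm_id(a.mpn) == _norm_id(b.mpn) else "no-match"
    # 2. When the two algorithms disagree sharply, send the pair to a person.
    if abs(lev_score - jw_score) > max_gap:
        return "review"
    # 3. Otherwise band on the lower, more conservative, of the two scores.
    score = min(lev_score, jw_score)
    if score > auto:
        return "auto-match"
    if score >= review:
        return "review"
    return "no-match"


if __name__ == "__main__":
    drill_a = Product("Acme cordless drill 18V", brand="Acme", gtin="012345678905")
    drill_b = Product("ACME 18V drill, cordless", brand="ACME", gtin="00012345678905")
    print(decide(drill_a, drill_b, lev_score=0.40, jw_score=0.90))  # auto-match
    print(decide(Product("Acme drill"), Product("Acme driver"), 0.55, 0.88))  # review
```

Using the minimum of the two scores is a conservative choice: a pair only auto-matches when both algorithms agree it is close. A weighted average is a reasonable alternative if your labeled sample shows one algorithm is consistently more reliable on your data.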
