Fuzzy Match Score Calculator

Free fuzzy match score calculator: compare two product strings and get Levenshtein, Jaro-Winkler, and token similarity scores in your browser.

This fuzzy match score calculator compares two product strings — names, descriptions, manufacturer part numbers, or supplier SKUs — and returns a similarity score so you can decide whether they refer to the same item. Paste a pair or a column of pairs and see how closely they match before you merge, cross-reference, or onboard them.

The interactive version of this tool is coming soon. It will run entirely in your browser — no login, no upload limits.

Need this now? Talk to Claro

What it checks

For each pair of strings you enter, the calculator computes and reports:

  • Levenshtein (edit distance) similarity — how many single-character insertions, deletions, or substitutions separate the two strings, normalized to a 0–100 score. Good for catching typos like Loctite 243 vs Loctite 234.
  • Jaro-Winkler similarity — a 0–1 score that rewards matches at the start of a string, which suits part numbers and brand-led titles such as SKF 6204-2RS vs SKF 6204 2RS.
  • Token (set/sort) similarity — splits each string into words and compares the sets, so reordering and extra words score well. Useful when a CPG title reads Organic Almond Butter 16oz on one feed and Almond Butter, Organic — 16 oz on another.
  • Normalized comparison — an optional pass that lowercases, strips punctuation, and collapses whitespace before scoring, so cosmetic formatting differences do not drag the score down.
  • A blended verdict — a plain-language likely match / review / no match label based on the combined scores, with the contributing metric shown so you understand why.
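As a rough illustration of the Levenshtein and normalized-comparison checks, here is a minimal Python sketch. It is not the calculator's actual code. The function names are hypothetical, replacing punctuation with a space (rather than deleting it) is an assumption, and dividing by the longer string's length is just one common way to scale edit distance to 0–100.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace.

    Assumption: punctuation becomes a space so that "6204-2RS" and
    "6204 2RS" normalize to the same value.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
    text = text.replace("_", " ")         # \w keeps underscores, so handle them here
    return " ".join(text.split())         # collapse runs of whitespace


def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and substitutions
    needed to turn a into b (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_score(a: str, b: str) -> float:
    """Scale edit distance to 0-100 by the longer string's length (assumed scaling)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)
```

Under these assumptions, Loctite 243 vs Loctite 234 has an edit distance of 2 over 11 characters, a score of about 81.8, because plain Levenshtein counts two swapped digits as two substitutions. SKF 6204-2RS vs SKF 6204 2RS scores 100 once both strings have passed through normalize.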

How fuzzy match scoring works

Fuzzy matching estimates how similar two pieces of text are when an exact, character-for-character match will not work. Catalog data is full of near-misses: a furniture supplier writes Walnut Veneer Side Table - 45cm, your ERP stores Side Table, Walnut Veneer, 450mm, and a marketplace feed lists WLNT-SIDE-TBL-45. None of these match on a string equality check, yet all describe one product.

Each algorithm measures similarity differently. Edit-distance methods count the character operations needed to turn one string into the other. Jaro-Winkler weighs matching and transposed characters and gives a bonus for a shared prefix. Token methods ignore word order and focus on shared vocabulary. Because each is strong on different error types — typos, abbreviations, reordering, added units — this calculator surfaces all of them so you can pick a threshold that fits your data rather than trusting a single number.
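To make the difference between the prefix-weighted and token-based methods concrete, the following Python sketch implements a textbook Jaro-Winkler and a simple token-set score. Several choices here are assumptions rather than the calculator's confirmed behavior: the function names, the conventional Winkler settings (prefix scale 0.1, prefix capped at 4 characters), the use of Jaccard overlap for the token score, and the tokenizer that splits letters from digits so 16oz and 16 oz produce the same tokens.

```python
import re


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1], based on matching and transposed characters."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, char in enumerate(a):
        start = max(0, i - window)
        end = min(len(b), i + window + 1)
        for j in range(start, end):
            if not b_matched[j] and b[j] == char:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half a transposition each.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (
        matches / len(a)
        + matches / len(b)
        + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity plus a bonus for a shared prefix (conventional defaults)."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def tokens(text: str) -> set:
    """Split into lowercase runs of letters or digits (assumed tokenizer)."""
    return set(re.findall(r"[a-z]+|\d+", text.lower()))


def token_set_score(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets, ignoring word order."""
    set_a, set_b = tokens(a), tokens(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

With these definitions, the classic test pair MARTHA vs MARHTA gives a Jaro-Winkler score of about 0.961, and the two almond butter titles from the list above share the same five tokens, so their token-set score is 1.0 despite the reordering.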

A score is a signal, not a decision. The right cutoff depends on your tolerance for false merges. For MRO and industrial distribution, where merging two distinct fasteners is costly, teams often hold a high bar and route borderline pairs to human review. For loosely structured CPG titles, a lower token-similarity threshold may be acceptable. The guidance below explains how to set those thresholds and why naive scripts struggle once volume grows.

FAQ

What is a good fuzzy match score?

There is no universal cutoff — it depends on your data and your cost of error. As a starting point, blended similarity above roughly 90 usually indicates the same product, 75–90 warrants human review, and below 75 is likely a different item. For high-risk catalogs like industrial parts, raise the auto-merge bar and review more pairs manually.
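The starting thresholds above translate into a small classification step. This Python sketch is illustrative only: the function name and parameters are hypothetical, the input is assumed to be an already-blended score on a 0–100 scale, and treating exactly 90 as a likely match and exactly 75 as a review case is an assumption, since the guidance only gives rough bands.

```python
def verdict(score: float, merge_at: float = 90.0, review_at: float = 75.0) -> str:
    """Map a blended 0-100 similarity score to a plain-language label.

    merge_at and review_at are starting points, not fixed rules; raise
    merge_at for high-risk catalogs so more pairs go to human review.
    """
    if review_at > merge_at:
        raise ValueError("review_at must not exceed merge_at")
    if score >= merge_at:
        return "likely match"
    if score >= review_at:
        return "review"
    return "no match"
```

For an industrial parts catalog, calling verdict(92, merge_at=95) returns "review" instead of "likely match", which reflects the advice to raise the auto-merge bar where a false merge is costly.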

Which is better, Levenshtein or Jaro-Winkler?

Neither is universally better. Levenshtein (edit distance) is intuitive for typos and short differences. Jaro-Winkler favors strings that share a prefix, which helps with part numbers and brand-led titles. This calculator shows both plus a token score so you can choose the metric that best fits the kind of variation in your fields.

Can I use this to match SKUs or part numbers?

Yes. Paste the two identifiers and read the Jaro-Winkler and normalized scores, which handle spacing and punctuation differences like 6204-2RS vs 6204 2RS well. For structured identifiers, also confirm with a deterministic check on the cleaned value, since a high fuzzy score alone can pair similar-but-distinct part numbers.
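One way to add that deterministic check is to strip each identifier down to its letters and digits and compare the results exactly. The sketch below is a minimal Python example with hypothetical function names. The cleaning rule (uppercase, keep only A–Z and 0–9) is an assumption and may be too aggressive for catalogs where punctuation is significant.

```python
import re


def clean_part_number(value: str) -> str:
    """Uppercase the identifier and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())


def same_part_number(a: str, b: str) -> bool:
    """Exact comparison on the cleaned values; empty identifiers never match."""
    cleaned_a = clean_part_number(a)
    cleaned_b = clean_part_number(b)
    return cleaned_a != "" and cleaned_a == cleaned_b
```

Here 6204-2RS and 6204 2RS both clean to 62042RS and match, while a pair such as 6204-2RS and 6204-2RZ fails the check even though the two strings differ by only one character.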

Is my data sent anywhere when I use this tool?

No. The fuzzy match score calculator runs entirely in your browser. Nothing is uploaded, stored, or transmitted, so you can safely test it with real supplier SKUs, pricing files, or product descriptions.

Why do my fuzzy-match scripts work in testing but fail in production?

Small samples hide the long tail of edge cases (multilingual titles, embedded units, transposed tokens, and near-duplicate part numbers), and the number of pairwise comparisons grows quadratically with catalog size. The guide on why fuzzy-match scripts break at scale covers the blocking, normalization, and confidence thresholds that keep accuracy stable as volume increases.
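Blocking is the usual answer to the pairwise-comparison cost: records are grouped by a cheap key and only pairs within the same group are scored. The Python sketch below shows the idea under simple assumptions. The function names are hypothetical, and using the first lowercase word as the blocking key is only one possible choice, picked here because the example titles lead with a brand.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(title: str) -> str:
    """Cheap grouping key: the first lowercase word (assumed, brand-led titles)."""
    words = title.lower().split()
    return words[0] if words else ""


def candidate_pairs(titles):
    """Yield only the pairs that share a blocking key, not every possible pair."""
    blocks = defaultdict(list)
    for title in titles:
        blocks[blocking_key(title)].append(title)
    for block in blocks.values():
        yield from combinations(block, 2)


catalog = [
    "SKF 6204-2RS",
    "SKF 6204 2RS",
    "Loctite 243",
    "Loctite 234",
    "Walnut Veneer Side Table",
]
pairs = list(candidate_pairs(catalog))
```

For these five titles, comparing every record with every other would take 10 comparisons, while blocking leaves 2 candidate pairs to score. The trade-off is recall: a first-word key would never pair a reordered title such as Side Table, Walnut Veneer with Walnut Veneer Side Table, so the key has to suit the kind of variation in the data.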