What Is Schema Mapping?
Schema mapping aligns supplier fields to your target catalog schema. Learn how it works, why it breaks, and how to keep it clean as feeds change.
Every supplier delivers data in its own structure. Column names clash, units differ, and what one feed calls DESC_LONG another calls product_description_full. Schema mapping is the translation layer that reconciles those differences — and the first thing that breaks when a supplier quietly changes their export format.
Definition
Schema mapping is the process of defining how each field in a source dataset corresponds to a field in a target schema, so that data can move from one system to another without losing meaning.
In short, schema mapping is the set of rules that say “the column called desc in this supplier file becomes the long_description attribute in our catalog, and wt_kg becomes weight with unit kg.” Every time product data crosses a boundary (from a supplier spreadsheet into your PIM, from a legacy database into a new platform, from your catalog into a marketplace feed), the two sides almost never use the same field names, the same structure, or the same conventions. The mapping is what reconciles them.
A complete mapping covers more than renaming columns. It handles structural differences (one system stores dimensions as a single size string, another splits them into length, width, and height), data-type differences (a price stored as text versus a decimal), value transformations (converting "Y"/"N" into a boolean, or normalizing units), and cardinality (a product with many images flattened into image_1, image_2, image_3). Strong mappings are documented, versioned, and re-runnable, because suppliers change their export formats and target schemas evolve.
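
To make the idea of a documented, versioned, re-runnable mapping concrete, here is a minimal sketch in Python. It treats the mapping as data: each rule names a source column, a target attribute, and a conversion. The column names (`desc`, `price`, `active`), the target attribute `is_active`, and the version label are assumptions chosen for illustration, not part of any particular product.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    source: str                      # column name in the supplier file
    target: str                      # attribute name in the target schema
    transform: Callable[[str], Any]  # conversion applied to the raw value


# Hypothetical version label, stored with each output record for auditing.
MAPPING_VERSION = "v1"

RULES = [
    Rule("desc", "long_description", str.strip),                      # rename + trim
    Rule("price", "price", Decimal),                                  # text -> decimal
    Rule("active", "is_active", lambda v: v.strip().upper() == "Y"),  # Y/N -> bool
]


def apply_mapping(row: dict[str, str]) -> dict[str, Any]:
    """Apply every rule whose source column is present in the row."""
    out: dict[str, Any] = {"_mapping_version": MAPPING_VERSION}
    for rule in RULES:
        if rule.source in row:
            out[rule.target] = rule.transform(row[rule.source])
    return out


record = apply_mapping({"desc": "  Hex bolt M8  ", "price": "4.20", "active": "y"})
print(record)
```

Because the rules live in one list rather than scattered through a script, the same mapping can be re-run on a new file, reviewed, or replaced by a later version.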
A mapping that exists only in someone’s head, or buried in a one-off script, becomes a liability the moment that person leaves or the source file shifts. Claro treats schema mapping as a managed, versioned layer — every rule is recorded with provenance so you can see which source field produced which catalog value, audit changes, and roll back bad transforms before they corrupt downstream records.
Why schema mapping matters for product data
Schema mapping is the unglamorous step that determines whether everything downstream works. If you map a supplier’s pack_qty to your units_per_case field incorrectly, every duplicate-detection pass, every enrichment rule, and every price calculation inherits that error.
Consider how it plays out across industries. An MRO distributor ingesting a fastener supplier’s file has to map thread and len_mm into the canonical attributes their matching engine expects; otherwise two listings for the same M8 bolt look like different products and never deduplicate. A CPG brand syndicating to a grocery retailer maps internal attributes to the retailer’s required schema; a single mis-mapped net-content field gets the item rejected at the data pool. A furniture retailer importing a manufacturer catalog maps free-text material values into a controlled vocabulary so that variants such as “oak”, “solid oak”, and “Oak veneer” resolve to consistent, filterable facets. In each case, the mapping is what makes matching, deduplication, and enrichment possible; without it, the pipeline is garbage in, garbage out.
Schema mapping also underpins AI search and generative engine optimization. Large language models and shopping assistants answer questions like “waterproof outdoor LED fixture under 50W” by reading structured attributes. If your IP rating lives in a field the model never sees because it was mapped into a generic notes blob, your product is effectively invisible to AI answers even when it is the perfect match. Mapping source data into clean, typed, well-named attributes is a prerequisite for being citable.
Claro operates exactly at this layer: it resolves supplier identity, maps and normalizes attributes against your canonical schema, validates every incoming record, and writes clean data back into your PIM or ERP with full provenance — so every value is traceable to its original source field and transform rule.
Before and after: messy feed vs. trusted catalog
| Before schema mapping | After schema mapping with Claro |
|---|---|
| Supplier sends DESC_LONG; PIM expects long_description — values land in the wrong field or are dropped | Rule maps DESC_LONG → long_description; trim and strip-HTML transform runs automatically |
| Weight arrives as '2.5 kg' text string; price calculations fail on non-numeric input | Split transform extracts value 2.5 and unit kg; weight_uom is populated consistently |
| Category field contains free-text 'fasteners / bolts'; taxonomy lookup never matches | CAT value is looked up against your taxonomy and replaced with canonical category_id |
| Availability flag is Y/N; downstream system expects in_stock/out_of_stock; 20% of records flagged as errors on import | Y/N normalized to in_stock/out_of_stock at the mapping layer; zero import errors |
| Mapping lives in a colleague's spreadsheet; breaks silently when supplier renames a column | Mapping is versioned and monitored; schema drift triggers an alert before bad data reaches the catalog |
Field-level mapping examples
| Source field (supplier) | Target attribute (your schema) | Transformation |
|---|---|---|
| DESC_LONG | long_description | Trim whitespace, strip HTML |
| WT | weight + weight_uom | Split value and unit; convert to kg |
| CAT | category_id | Lookup against taxonomy |
| INSTOCK | availability | Map Y/N to in_stock/out_of_stock |
| DIM | length + width + height | Parse 'LxWxH mm' string into three numeric fields |
| IMG_1, IMG_2 | images[] | Collect indexed columns into ordered array |
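
The transformations in the table above can be sketched as small Python functions. This is an illustrative sketch, not a reference implementation: the taxonomy entry, the unit conversion table, and the sample row are invented for the example, and the HTML stripping is a naive regex that a production pipeline would likely replace with a real parser.

```python
import html
import re

# Hypothetical lookup tables; real ones would come from the target system.
TAXONOMY = {"fasteners / bolts": "CAT-1001"}
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}
AVAILABILITY = {"Y": "in_stock", "N": "out_of_stock"}


def clean_text(value: str) -> str:
    """Remove tags (naively), unescape entities, and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", value)
    return " ".join(html.unescape(no_tags).split())


def split_weight(value: str) -> tuple[float, str]:
    """Parse strings like '2.5 kg' and convert the value to kilograms."""
    m = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", value)
    if not m or m.group(2).lower() not in TO_KG:
        raise ValueError(f"unparseable weight: {value!r}")
    return round(float(m.group(1)) * TO_KG[m.group(2).lower()], 6), "kg"


def parse_dimensions(value: str) -> dict[str, float]:
    """Parse an 'LxWxH mm' string into three numeric fields."""
    m = re.fullmatch(r"\s*([\d.]+)\s*x\s*([\d.]+)\s*x\s*([\d.]+)\s*mm\s*",
                     value, re.IGNORECASE)
    if not m:
        raise ValueError(f"unparseable dimensions: {value!r}")
    length, width, height = (float(g) for g in m.groups())
    return {"length": length, "width": width, "height": height}


def collect_images(row: dict[str, str]) -> list[str]:
    """Gather IMG_1, IMG_2, ... columns into a list ordered by index."""
    indexed = []
    for key, value in row.items():
        m = re.fullmatch(r"IMG_(\d+)", key)
        if m and value:
            indexed.append((int(m.group(1)), value))
    return [url for _, url in sorted(indexed)]


def map_row(row: dict[str, str]) -> dict:
    weight, uom = split_weight(row["WT"])
    return {
        "long_description": clean_text(row["DESC_LONG"]),
        "weight": weight,
        "weight_uom": uom,
        # None signals an unmatched category that needs review.
        "category_id": TAXONOMY.get(row["CAT"].strip().lower()),
        # Raises KeyError on anything other than Y/N, so bad flags surface early.
        "availability": AVAILABILITY[row["INSTOCK"].strip().upper()],
        **parse_dimensions(row["DIM"]),
        "images": collect_images(row),
    }


mapped = map_row({
    "DESC_LONG": " <p>Hex bolt &amp; nut</p> ",
    "WT": "2.5 kg",
    "CAT": "Fasteners / Bolts",
    "INSTOCK": "y",
    "DIM": "120x80x40 mm",
    "IMG_2": "b.jpg",
    "IMG_1": "a.jpg",
})
print(mapped)
```

Each function fails loudly on input it cannot parse, which is usually preferable to writing a guessed value into the catalog.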
Where schema mapping fits in the data pipeline
Schema mapping does not stand alone. It is one step in a sequence that starts with raw supplier data and ends with a clean, enriched, deduplicated catalog record.
- Extract and profile the source
Pull the supplier file and profile it: what columns exist, what data types, what sample values, how complete is each field? This surfaces mapping decisions before you write a single rule.
- Draft and validate the mapping
Define field-to-field rules and value transforms. Run a sample through the mapping and inspect the output against your target schema. Flag low-confidence or ambiguous rules for human review.
- Resolve identity
Once fields are mapped into comparable attributes, entity resolution can match the incoming records to existing catalog entries. Clean fields are what make matching reliable.
- Enrich and normalize
Fill missing attributes, normalize values, and apply data normalization rules. Enrichment is only trustworthy when it operates on correctly mapped inputs.
- Write back with provenance
Load the mapped, resolved, enriched records back into your PIM or ERP. Claro attaches source provenance to every written value so downstream teams can audit and trust what they receive.
- Monitor for schema drift
Version the mapping and validate every new supplier drop against the expected structure. When a supplier renames a column or changes a unit, schema drift detection fires an alert before the error propagates.
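
The drift check in the last step can be sketched as a comparison between the columns a mapping expects and the header of each new supplier file. The expected column set and the sample file below are assumptions for illustration; the sample simulates a supplier renaming WT to WEIGHT.

```python
import csv
import io

# Columns the current mapping version expects (hypothetical).
EXPECTED_COLUMNS = {"DESC_LONG", "WT", "CAT", "INSTOCK", "DIM"}


def detect_drift(header: list[str],
                 expected: set[str] = EXPECTED_COLUMNS) -> dict[str, list[str]]:
    """Report columns that disappeared and columns that newly appeared."""
    incoming = set(header)
    return {
        "missing": sorted(expected - incoming),
        "unexpected": sorted(incoming - expected),
    }


# Simulated supplier drop in which WT has been renamed to WEIGHT.
sample = "DESC_LONG,WEIGHT,CAT,INSTOCK,DIM\nHex bolt,2.5 kg,Fasteners / Bolts,Y,120x80x40 mm\n"
header = next(csv.reader(io.StringIO(sample)))

drift = detect_drift(header)
if drift["missing"] or drift["unexpected"]:
    print("schema drift detected:", drift)
```

A header comparison only catches structural changes such as renamed or dropped columns. A silent unit change (inches instead of millimetres, for example) leaves the header intact, so it needs value-level checks such as range or format validation on sample rows.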
Related
- Glossary: What Is Schema Drift? When source or target fields change over time and silently break a previously working mapping.
- Glossary: What Is Data Normalization? Standardizing values and units after they have been mapped into the target fields.
- Glossary: What Is a PIM? The system that usually holds your target schema and consumes mapped supplier data.
- Playbook: Map Supplier Attributes to Your Schema. A step-by-step workflow for building and validating a mapping during supplier onboarding.
- Tool: Taxonomy Mapper. Map category values across ETIM, UNSPSC, and Google product taxonomies.
- Guide: Why Supplier Onboarding Takes Weeks. How manual schema mapping slows onboarding and what to do about it.
FAQ
What is the difference between schema mapping and data mapping?
The terms are often used interchangeably. Data mapping is the broader umbrella for any rules that move and transform data between systems. Schema mapping specifically emphasizes aligning the structure — fields, types, and relationships — of a source schema to a target schema. In product-data work you will hear both, and they usually refer to the same field-to-field translation activity.
Is schema mapping the same as ETL?
No, but it is a part of ETL. ETL (extract, transform, load) is the end-to-end pipeline that pulls data out of a source, reshapes it, and loads it into a destination. Schema mapping is the specification that drives the transform step: it defines which source field becomes which target field and how values are converted. ETL is the machinery; the mapping is the blueprint it follows.
How do you handle a source field that has no matching target field?
You have three common options: add a new attribute to the target schema, route the value into a flexible attribute such as a key-value or additional_attributes structure, or deliberately drop it and record that decision. Avoid the tempting fourth option of dumping unmapped values into a generic notes field, because unstructured data there is hard to match, filter, or expose to AI search later.
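
As a rough sketch of the second and third options, the Python snippet below routes unmapped columns into an `additional_attributes` structure and records deliberate drops. The field map, the dropped column `INTERNAL_REF`, and the sample row are hypothetical.

```python
# Hypothetical mapping and drop list for one supplier.
FIELD_MAP = {"DESC_LONG": "long_description", "SKU": "sku"}
DROPPED = {"INTERNAL_REF"}  # deliberately discarded; the decision is recorded here


def map_with_overflow(row: dict[str, str]) -> dict:
    """Map known fields, skip dropped ones, keep the rest as key-value pairs."""
    out: dict = {"additional_attributes": {}}
    for source, value in row.items():
        if source in FIELD_MAP:
            out[FIELD_MAP[source]] = value
        elif source in DROPPED:
            continue
        else:
            # Keep the original column name so the value stays traceable.
            out["additional_attributes"][source] = value
    return out


result = map_with_overflow({
    "SKU": "B-0812",
    "DESC_LONG": "Hex bolt M8",
    "INTERNAL_REF": "x91",
    "THREAD_PITCH": "1.25",
})
print(result)
```

Keeping the source column name as the key means the value can later be promoted to a first-class attribute without re-reading the supplier file.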
Why do schema mappings break?
Mappings break when either side changes without warning. On the source side, a supplier renames a column, adds a new format, or starts sending dimensions in inches instead of millimetres. On the target side, your own schema gains a required field. This is called schema drift. Versioning your mappings and validating incoming files against the expected structure catches these breaks before bad data reaches your catalog.
Can schema mapping be automated?
Partially. AI and similarity techniques can suggest mappings by comparing field names, sample values, and patterns, which dramatically speeds up onboarding a new supplier. But high-stakes or ambiguous fields still benefit from human review, and the safest setups keep a confidence score on each suggested mapping and route low-confidence ones to a person. Automation accelerates the work; it does not remove the need for accountability and provenance.
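
A very small sketch of the confidence-score idea, using only field-name similarity from Python's standard `difflib` module. Real suggestion engines typically also compare sample values and patterns; the target field list and the 0.6 review threshold here are arbitrary assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical target attributes to match against.
TARGET_FIELDS = ["long_description", "weight", "category_id", "availability"]


def suggest(source_field: str,
            targets: list[str] = TARGET_FIELDS,
            threshold: float = 0.6) -> dict:
    """Suggest the most similar target name and flag weak matches for review."""
    scored = [
        (SequenceMatcher(None, source_field.lower(), target.lower()).ratio(), target)
        for target in targets
    ]
    score, best = max(scored)
    return {
        "source": source_field,
        "target": best,
        "confidence": round(score, 2),
        "needs_review": score < threshold,  # route to a person below the threshold
    }


for field in ["Weight", "DESC_LONG"]:
    print(suggest(field))
```

With these inputs, `Weight` matches `weight` exactly and passes straight through, while `DESC_LONG` is paired with `long_description` at a low score and flagged for human review, which is the behaviour the paragraph above describes.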
Claro
See how Claro handles this in production
This concept is one piece of keeping a catalog trusted. See how Claro resolves identity, enriches missing attributes, and validates every update before it reaches your PIM or ERP.