What Is a Data Pool? GS1, GDSN, and Synchronized Product Data

A data pool is a GS1-certified repository that syncs product records between trading partners over GDSN — but most catalogs hold far more than pools cover.

Supplier onboarding teams know the scenario: a new trading partner sends an item file, half the GTINs are missing, descriptions are inconsistent across feeds, and the data that does arrive through GDSN covers only a fraction of the catalog. A GS1 data pool solves the exchange-and-synchronization problem for the slice of the catalog that flows through it, but most real catalogs are a patchwork of pooled, spreadsheet, and marketplace data, each arriving with different quality and structure. Claro treats pool-sourced attributes as high-trust signals and extends the same identity resolution, enrichment, and write-back to every source your PIM or ERP touches.

Definition

In GS1 terms, a data pool is a GS1-certified service that acts as the on-ramp and off-ramp for synchronized product data. Suppliers publish their item attributes (identifiers, descriptions, dimensions, packaging hierarchies, nutrition or hazard data) into a data pool. Retailers, distributors, and marketplaces subscribe through their own data pool, and the network keeps both sides in sync whenever the source record changes. Each item is keyed on its Global Trade Item Number (GTIN) and the supplier’s Global Location Number (GLN), so a record published once can be consumed by many recipients without re-keying.
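Because every pooled record hangs off a GTIN and a GLN, a cheap first check during onboarding is whether those keys are well-formed. The sketch below computes the standard GS1 mod-10 check digit, which GTINs and GLNs share. The function names and the `item_key` helper are illustrative assumptions, not part of any GS1 or Claro API.

```python
def gs1_check_digit(payload: str) -> int:
    """Compute the GS1 mod-10 check digit for the digits that precede it."""
    if not (payload.isascii() and payload.isdigit()):
        raise ValueError("payload must contain ASCII digits only")
    total = 0
    # Walk right to left: the digit next to the check digit has weight 3,
    # then weights alternate 1, 3, 1, ...
    for position, char in enumerate(reversed(payload)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gs1_key(key: str, allowed_lengths=(8, 12, 13, 14)) -> bool:
    """True if the key has an allowed length and a correct check digit."""
    if not (key.isascii() and key.isdigit()) or len(key) not in allowed_lengths:
        return False
    return gs1_check_digit(key[:-1]) == int(key[-1])


def item_key(gtin: str, supplier_gln: str) -> tuple:
    """Hypothetical composite key: GTIN padded to 14 digits plus a 13-digit GLN."""
    if not is_valid_gs1_key(gtin):
        raise ValueError(f"invalid GTIN: {gtin!r}")
    if not is_valid_gs1_key(supplier_gln, allowed_lengths=(13,)):
        raise ValueError(f"invalid GLN: {supplier_gln!r}")
    return (gtin.zfill(14), supplier_gln)
```

Padding to 14 digits does not change the check digit, because the weights are assigned from the right. A passing check digit only shows the number is well-formed; it does not prove the GTIN or GLN was actually issued to that supplier.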

There is no single global data pool. The GDSN is a federation of dozens of certified pools that all speak the same data model and message format, so a supplier connected to one pool can reach a retailer connected to another. The pool is the infrastructure; the synchronization network is the routing layer that connects pools to each other. This distinction matters during onboarding, because choosing a pool is a one-time integration decision, while the data flowing through it is continuous.

What a data pool does — and does not do

A data pool guarantees a consistent, machine-readable version of an item across every partner who subscribes to it. When a CPG supplier corrects a case-pack quantity or an allergen flag, every subscribed retailer receives the same change automatically. That eliminates the manual re-keying and version-mismatch problems that plague spreadsheet-based onboarding.
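The publish-once, update-everywhere behavior can be pictured with a toy model. This is a minimal sketch of the idea only: `DataPool`, `SyncNetwork`, and their methods are hypothetical names, and real GDSN exchange uses standardized messages between certified pools rather than in-memory dictionaries.

```python
from collections import defaultdict


class DataPool:
    """Recipient-side pool holding the latest copy of each item per recipient."""

    def __init__(self, name):
        self.name = name
        self.inboxes = defaultdict(dict)  # recipient -> {(gtin, gln): attributes}


class SyncNetwork:
    """Toy stand-in for the routing layer between pools."""

    def __init__(self):
        self.subscriptions = defaultdict(list)  # supplier GLN -> [(pool, recipient)]

    def subscribe(self, pool, recipient, supplier_gln):
        self.subscriptions[supplier_gln].append((pool, recipient))

    def publish(self, supplier_gln, gtin, attributes):
        """Deliver the latest attributes to every subscriber; return the delivery count."""
        deliveries = 0
        for pool, recipient in self.subscriptions.get(supplier_gln, []):
            # Each recipient gets its own copy, overwriting the previous version.
            pool.inboxes[recipient][(gtin, supplier_gln)] = dict(attributes)
            deliveries += 1
        return deliveries


# Two retailers on different pools subscribe to the same supplier.
network = SyncNetwork()
pool_a, pool_b = DataPool("pool-a"), DataPool("pool-b")
network.subscribe(pool_a, "retailer-1", "1234567890128")
network.subscribe(pool_b, "retailer-2", "1234567890128")

# The supplier publishes, then corrects the case-pack quantity.
network.publish("1234567890128", "04006381333931", {"case_quantity": 12})
network.publish("1234567890128", "04006381333931", {"case_quantity": 10})
```

After the second publish, both retailers hold the corrected quantity without either of them re-keying anything, which is the property the spreadsheet workflow lacks.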

But a data pool only standardizes the records that flow through it. Most real catalogs are a mix: GDSN-synced grocery and consumer items alongside MRO parts, furniture SKUs, and industrial components that were never published to any pool. A home-improvement retailer might receive paint and fasteners through GDSN while loading patio furniture and power tools from supplier spreadsheets. The pool gives you clean delivery for the synced slice; everything else still needs identity resolution, attribute enrichment, and classification before it can be matched against your master catalog or surfaced in AI-driven product search.

| Incoming channel | Typical categories | Data quality on arrival |
| --- | --- | --- |
| GDSN data pool | Grocery, CPG, health and beauty | Standardized, GTIN-keyed, auto-synced |
| Supplier spreadsheets | MRO, furniture, industrial | Inconsistent field names, free-text descriptions |
| Marketplace feeds | Long-tail third-party items | Variable completeness, often missing GTINs |
| Direct EDI or API | Electronics, apparel | Structured but schema varies by partner |

Before and after: pooled data in a trusted catalog

Even records that arrive from a certified data pool can carry empty attributes, outdated specs, or taxonomy codes that do not match your internal classification. The table below shows the practical difference between raw pooled data landing in a PIM and the same records after a canonical product-data layer validates and enriches them.

| Before (raw pool record) | After (Claro-resolved record) |
| --- | --- |
| GTIN present, brand name in all-caps | GTIN retained; brand normalized to title case |
| Case quantity populated, inner-pack empty | Inner-pack filled from secondary source, confidence flagged |
| Product category is supplier taxonomy code | Mapped to your internal taxonomy with audit trail |
| Allergen flags present, ingredient list missing | Ingredient list sourced from published spec sheet |
| Record last updated 18 months ago | Staleness alert surfaced; re-sync triggered automatically |

Claro ingests both pool-sourced and non-pooled records, resolves them to a single canonical product record, validates against your attribute schema, and writes clean, complete data back into your existing PIM or ERP — so your full catalog is consistent and AI-citable, not just the synchronized portion.
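A few of the before-and-after rows can be sketched in code. The example below is an illustration under stated assumptions, not Claro's implementation: the field names, the `resolve_record` function, and the one-year staleness threshold are all hypothetical. It normalizes an all-caps brand, fills empty attributes from a secondary source while recording provenance, and flags stale records.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)  # assumed threshold, not a GS1 or Claro rule
EMPTY = (None, "")


def resolve_record(raw, secondary, today):
    """Return (resolved_record, flags) for one raw pool record."""
    resolved = dict(raw)
    flags = []
    # Every populated attribute starts out attributed to the pool.
    provenance = {name: "pool" for name, value in raw.items() if value not in EMPTY}

    # Normalize an all-caps brand name to title case (naive: str.title()
    # mishandles names with apostrophes or intentional capitals).
    brand = resolved.get("brand")
    if brand and brand.isupper():
        resolved["brand"] = brand.title()

    # Fill only empty attributes from the secondary source; never overwrite
    # pool values, which are treated as the higher-trust signal.
    for name, value in secondary.items():
        if resolved.get(name) in EMPTY and value not in EMPTY:
            resolved[name] = value
            provenance[name] = "secondary"
            flags.append(f"filled:{name}")

    # Surface records that have not changed within the assumed freshness window.
    if today - resolved["last_updated"] > STALE_AFTER:
        flags.append("stale")

    resolved["provenance"] = provenance
    return resolved, flags


raw = {
    "gtin": "04006381333931",
    "brand": "ACME FOODS",
    "case_quantity": 12,
    "inner_pack": None,
    "last_updated": date(2023, 1, 1),
}
secondary = {"inner_pack": 6, "case_quantity": 24}
resolved, flags = resolve_record(raw, secondary, today=date(2024, 7, 1))
```

In this run the pool's case quantity of 12 is kept, the inner-pack value comes from the secondary source, and the record is flagged as stale because it is about 18 months old.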

How pool data fits into a broader data pipeline

A data pool handles publication and delivery. The steps before and after it determine whether that data actually improves your catalog:

  1. Supplier publishes to their certified pool

    The supplier loads item attributes into their pool. The GDSN routes the record to every subscribing retailer’s pool automatically.

  2. Retailer's pool delivers the record

    Your pool receives the publication and surfaces it as a candidate item for your catalog. At this point the record meets GDSN structural standards but may still be incomplete or misclassified against your internal model.

  3. Identity resolution and deduplication

    The incoming GTIN is checked against your existing canonical product records. Deterministic matching on GTIN handles clean cases; probabilistic matching on name, brand, and dimensions handles the rest. Duplicate candidates are flagged before they reach the PIM.

  4. Attribute validation and enrichment

    Required fields missing from the pool record are flagged. Where secondary sources exist — spec sheets, open datasets, other supplier feeds — Claro fills the gap with a source citation so you know exactly where each attribute came from.

  5. Write-back to PIM or ERP

    The validated, enriched record is written back to your system of record with full provenance. Future pool updates trigger the same pipeline automatically, so no manual re-keying is needed when a supplier changes a pack size or an allergen flag.
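Steps 3 and 4 above can be sketched as two small functions. This is a minimal illustration, assuming records are plain dictionaries with hypothetical `id`, `gtin`, `brand`, `name`, and `dimensions` fields. It swaps in `difflib.SequenceMatcher` string similarity as a simple stand-in for probabilistic matching, and the 0.85 threshold is an arbitrary assumption to tune against labeled matches.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.85  # assumed cut-off, not a recommended value


def _fingerprint(record):
    """Lower-cased brand, name, and dimensions joined into one comparable string."""
    parts = (record.get("brand", ""), record.get("name", ""), record.get("dimensions", ""))
    return " ".join(str(part).strip().lower() for part in parts)


def match_incoming(incoming, canonical_records):
    """Return (match_type, canonical_id_or_None, score) for an incoming record."""
    # Deterministic pass: exact GTIN equality handles the clean cases.
    gtin = incoming.get("gtin")
    if gtin:
        for record in canonical_records:
            if record.get("gtin") == gtin:
                return ("gtin", record["id"], 1.0)

    # Fallback pass: string similarity on brand, name, and dimensions.
    target = _fingerprint(incoming)
    best_id, best_score = None, 0.0
    for record in canonical_records:
        score = SequenceMatcher(None, target, _fingerprint(record)).ratio()
        if score > best_score:
            best_id, best_score = record["id"], score
    if best_score >= FUZZY_THRESHOLD:
        return ("fuzzy", best_id, best_score)
    return ("new", None, best_score)


def missing_required(record, required):
    """List required attribute names that are absent or empty on the record."""
    return [name for name in required if record.get(name) in (None, "")]


catalog = [
    {"id": "P-1", "gtin": "04006381333931", "brand": "Acme",
     "name": "Exterior Wall Paint 5L White", "dimensions": "20x20x25cm"},
]
spreadsheet_row = {"brand": "ACME", "name": "Exterior Wall Paint 5 L White",
                   "dimensions": "20x20x25cm"}
print(match_incoming(spreadsheet_row, catalog)[:2])
print(missing_required(spreadsheet_row, ["gtin", "brand", "name"]))
```

Fuzzy matches are best routed to review as duplicate candidates rather than merged automatically, since a high similarity score can still pair two distinct variants of the same product line.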

FAQ

What is a data pool in GS1 and GDSN?

In GS1 terms, a data pool is a certified electronic repository where suppliers publish standardized product data and recipients subscribe to receive it. The GDSN connects these certified pools into one network, so a record published to any pool can reach subscribers connected to any other pool. Each item is identified by its GTIN and the supplier’s GLN.

Is there only one global data pool?

No. The GDSN is a federation of dozens of GS1-certified data pools. They all use the same data model and message standards, so partners do not need to use the same pool to exchange data. You integrate with one pool and can reach trading partners connected to any other certified pool in the network.

What is the difference between a data pool and a PIM?

A PIM (Product Information Management system) is where you author, govern, and store your product content internally. A data pool is the external exchange layer that publishes or receives that content over GDSN. Many companies connect their PIM to a data pool so governed records flow out to trading partners automatically. The two are complementary, not interchangeable.

Do non-grocery products go through data pools?

Some do, but adoption is heaviest in grocery and CPG. Many MRO, furniture, and industrial catalogs never publish to a pool and arrive as spreadsheets or marketplace feeds instead. Most retailers therefore need a layer that can match, resolve, and enrich both pooled and non-pooled data so the full catalog is consistent.

How do I choose a data pool?

Choosing a pool is a one-time onboarding decision driven by where your trading partners already are, the categories and regions you support, and how the pool integrates with your PIM or ERP. Because all certified pools interoperate over GDSN, the choice affects integration effort more than reach.

What happens to data quality after records leave a data pool?

A data pool standardizes structure and delivery, but it does not validate completeness, catch taxonomy drift, or enrich missing attributes. Records that arrive clean at the pool level can still contain empty fields, misclassified categories, or outdated specs. A canonical product-data layer validates incoming pool records against your own master, flags attribute gaps, and writes corrected data back to your PIM or ERP so downstream systems stay accurate.

See how Claro handles this in production

This concept is one piece of keeping a catalog trusted. See how Claro resolves identity, enriches missing attributes, and validates every update before it reaches your PIM or ERP.