Hierarchical Retrieval for Variant-Heavy B2B Catalogs: How to Stop Mixing Up Product Families and SKUs

Many B2B product catalogs contain families, variants, accessories, and region-specific part numbers that look semantically similar but behave very differently. Hierarchical retrieval helps product AI answer at the right level, from category to family to exact SKU.

Axoverna Team
11 min read

One of the fastest ways to lose trust in a product AI is to answer confidently about the wrong variant.

A buyer asks about a 230V single-phase model, and the AI replies with specs from the 400V three-phase version. Or it explains the capabilities of a product family page when the question was clearly about a specific SKU with a different seal material, pressure rating, or certification. The wording sounds plausible. The answer is still wrong.

This problem shows up constantly in B2B catalogs because the underlying data is hierarchical by nature. You rarely have a flat list of unrelated products. You have product categories, families, series, variants, accessories, regional versions, replacement parts, and bundle configurations. Traditional RAG pipelines often flatten all of that into chunks and hope embeddings will sort it out.

They usually do not.

If your catalog is variant-heavy, you need a retrieval architecture that understands levels of abstraction. In practice, that means hierarchical retrieval: a system that can retrieve broad context when the query is exploratory, narrow context when the query is specific, and move between those levels deliberately instead of blending them together.


Why Flat Retrieval Fails in Variant-Heavy Catalogs

The default RAG implementation is simple: chunk documents, embed them, retrieve the nearest chunks, generate an answer. That works reasonably well for FAQ content and broad documentation. It gets fragile when many products share overlapping language.

Consider a distributor catalog for industrial drives:

  • DrivePro X100, 230V single-phase, 0.75kW
  • DrivePro X100, 400V three-phase, 0.75kW
  • DrivePro X100, 400V three-phase, 1.5kW
  • DrivePro X100 Washdown variant, IP69K
  • DrivePro X100 ATEX variant for hazardous environments
  • DrivePro X100 braking resistor kit
  • DrivePro X100 fieldbus module

Semantically, these are close neighbors. If a user asks, "Does the X100 support Modbus RTU and can I run it in a washdown area?" a naive retrieval system may return a mix of:

  • the generic series overview
  • the fieldbus accessory page
  • the washdown variant datasheet
  • a standard indoor variant manual

Now the model has to infer whether Modbus is native or optional, whether washdown applies to the entire family or only a specific enclosure, and whether the answer concerns a variant or an add-on module. That is a recipe for subtle hallucinations.

This is the same pattern behind many issues that show up in structured data retrieval for specs and tables, metadata filtering in product catalogs, and entity resolution across messy B2B catalogs. The catalog structure exists, but the retrieval layer is not respecting it.


What Hierarchical Retrieval Means

Hierarchical retrieval is not one single algorithm. It is an architectural principle:

  1. Represent the catalog at multiple levels
  2. Detect which level the user is asking about
  3. Retrieve from the appropriate level first
  4. Drill down or roll up when the answer requires it

A practical hierarchy for B2B product knowledge often looks like this:

| Level | Example | Typical user intent |
|---|---|---|
| Category | Variable frequency drives | "What kind of drive do I need?" |
| Family/Series | DrivePro X100 series | "What does the X100 line support?" |
| Variant/SKU | X100-075-230-STD | "What is the input voltage and enclosure rating?" |
| Component/Attribute | Terminal layout, pressure rating, certification | "Does this exact model support Modbus RTU?" |
| Related object | Accessory, spare part, replacement, substitute | "Which brake resistor fits this SKU?" |

The key is that these are not interchangeable retrieval units. A family-level document is useful for orientation. It is dangerous as the sole evidence source for a question about a specific SKU.

Good systems treat family, variant, and attribute nodes differently instead of embedding them into one big semantic soup.


The Core Design Pattern: Retrieve Wide, Then Resolve Precisely

For most catalogs, the best approach is a two-stage or three-stage flow.

Stage 1: Intent and Entity Detection

Before retrieval, classify the query:

  • Is this exploratory or exact-match?
  • Is the user referring to a family, a variant, or an application need?
  • Are there explicit identifiers such as SKU, manufacturer part number, voltage, size, or material?
  • Is the user asking for comparison, compatibility, substitution, or availability?

This does not require a complex classifier. A lightweight query parser, combined with the kind of query intent classification many teams already run, is often enough.

If the query contains a clean SKU or near-SKU token, variant-level retrieval should dominate. If it contains only a series name plus an application requirement, family-level retrieval may be the right starting point.
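A minimal sketch of that parser, assuming a hypothetical part-number grammar like `X100-075-230-STD` (family code, power, voltage, suffix) — the regexes and keyword lists here are illustrative and would need to match your own catalog's conventions:

```python
import re

# Hypothetical SKU pattern for this catalog, e.g. "X100-075-230-STD".
# Adjust to your own part-number grammar.
SKU_RE = re.compile(r"\b[A-Z]\d{3}(?:-\d{3}){2}-[A-Z]{3}\b")
FAMILY_RE = re.compile(r"\bX\d{3}\b")  # assumed series naming like "X100"

def classify_query(query: str) -> dict:
    """Cheap intent/entity detection: no ML model needed for a first version."""
    skus = SKU_RE.findall(query)
    families = FAMILY_RE.findall(query.upper())
    q = query.lower()
    wants_compat = any(w in q for w in ("fits", "compatible", "works with"))
    wants_compare = any(w in q for w in ("difference", "compare", " vs "))
    if skus:
        level = "variant"    # an explicit identifier wins
    elif families:
        level = "family"
    else:
        level = "category"   # exploratory / application-driven
    return {"level": level, "skus": skus, "families": families,
            "compatibility": wants_compat, "comparison": wants_compare}
```

A query like "Which brake resistor fits X100-075-230-STD?" resolves to variant level with a compatibility intent, while "What is the difference between the X100 and X200 series?" resolves to family level with a comparison intent.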

Stage 2: Hierarchical Candidate Retrieval

Retrieve candidates separately from different indexes or entity types:

  • Family index
  • Variant/SKU index
  • Accessory and spare parts index
  • Technical attribute index
  • Documentation index

This is usually better than keeping everything in one index with one ranking policy. You can still use a hybrid search approach, but score candidates within their level first.

For example:

  • BM25 + embeddings over exact SKU text for variants
  • Semantic retrieval over product descriptions for families
  • Structured lookup over normalized attributes for specs
  • Link-based expansion for accessories and compatible parts
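The "score within their level first" idea can be sketched like this. The toy lexical scorer and the in-memory document lists are stand-ins, not a real search stack — in production each level would be backed by its own BM25 or vector index:

```python
def lexical_score(query: str, text: str) -> float:
    """Toy lexical overlap: fraction of query tokens found in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(text.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def retrieve_per_level(query: str, indexes: dict, k: int = 2) -> dict:
    """Score candidates within each level separately, return top-k per level."""
    results = {}
    for level, docs in indexes.items():
        ranked = sorted(docs, key=lambda d: lexical_score(query, d["text"]),
                        reverse=True)
        results[level] = ranked[:k]
    return results

# Illustrative per-level indexes, mirroring the example catalog above.
indexes = {
    "family": [{"id": "x100-series",
                "text": "DrivePro X100 series overview variable frequency drive"}],
    "variant": [
        {"id": "X100-075-230-STD",
         "text": "X100 230V single-phase 0.75kW standard enclosure"},
        {"id": "X100-075-400-WD",
         "text": "X100 washdown IP69K 400V three-phase 0.75kW"},
    ],
    "accessory": [{"id": "x100-brake-kit", "text": "X100 braking resistor kit"}],
}

hits = retrieve_per_level("washdown X100", indexes, k=1)
```

Because variants compete only against other variants here, the family page's richer prose cannot crowd the washdown datasheet out of the candidate set.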

Stage 3: Resolve the Answering Scope

After candidate retrieval, determine the scope the answer should use.

If the user asked, "What certifications does the X100 Washdown have?", and retrieval returns both the series page and the washdown datasheet, the system should mark the answer scope as variant-specific. Family content can provide context, but variant evidence gets priority.

If the user asked, "What is the difference between the X100 and X200 series?", the scope becomes family-level comparison. Exact SKUs are secondary unless the system asks a clarifying question.

This scoping step is where many product AI systems quietly fail. They retrieve enough information, but they never explicitly decide which level is authoritative.
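Making that decision explicit can be as simple as a split into authoritative evidence versus background context. The `entity_type` field follows the modeling convention used later in this post; the record shape is an assumption:

```python
def resolve_scope(query_level: str, candidates: list) -> dict:
    """Split retrieved candidates into authoritative evidence vs. background
    context, based on the level the query was classified at."""
    primary = [c for c in candidates if c["entity_type"] == query_level]
    context = [c for c in candidates if c["entity_type"] != query_level]
    return {"scope": query_level, "primary": primary, "context": context}

candidates = [
    {"id": "x100-series", "entity_type": "family"},
    {"id": "X100-075-400-WD", "entity_type": "variant"},
]
scoped = resolve_scope("variant", candidates)
```

For the washdown certification question, the variant datasheet lands in `primary` and the series page in `context` — the generator still sees both, but only one is allowed to be authoritative.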


How to Model the Catalog for Hierarchical Retrieval

The retrieval architecture only works if the catalog objects are modeled cleanly.

At minimum, each indexed object should carry:

  • entity_type (category, family, variant, accessory, doc_section, attribute)
  • family_id
  • variant_id or SKU
  • parent_id
  • region
  • lifecycle_status
  • normalized attributes (voltage, size, pressure, material, thread type, certification)
  • link edges such as compatible_with, accessory_for, replaces, part_of_family

This is an extension of the same discipline discussed in product data governance for AI readiness. If your product records do not consistently distinguish a family from a variant, retrieval quality will always be capped.

A useful mental model is to think of every answer as a join across three kinds of truth:

  • semantic truth: what similar text says
  • structured truth: what normalized attributes say
  • relational truth: how entities connect to each other

Flat RAG over chunks mainly captures semantic truth. Hierarchical retrieval combines all three.


A Simple Scoring Strategy That Works Surprisingly Well

You do not need a research-grade GraphRAG system to get value here. A practical production scorer can look like this:

score =
  0.35 * lexical_match +
  0.30 * semantic_similarity +
  0.20 * hierarchy_match +
  0.10 * attribute_match +
  0.05 * popularity_or_usage_prior

Where:

  • lexical_match rewards exact SKU or part-number overlap
  • semantic_similarity handles descriptive queries
  • hierarchy_match boosts the entity level that best fits the detected intent
  • attribute_match rewards matching voltage, material, dimensions, certification, and region
  • popularity_or_usage_prior can break ties using real-world selection patterns

The most important term here is often hierarchy_match. If the query is variant-specific, family-level hits should not outrank exact SKU records just because the family page has richer prose.

This principle also plays nicely with reranking in two-stage retrieval. First-stage retrieval casts a broad net, then a reranker with hierarchy-aware features decides what should actually feed the LLM.
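The weighted scorer above can be written directly. All component signals are assumed to be pre-normalized to [0, 1]; the weights are the ones from the formula, not tuned values:

```python
WEIGHTS = {
    "lexical_match": 0.35,
    "semantic_similarity": 0.30,
    "hierarchy_match": 0.20,
    "attribute_match": 0.10,
    "popularity_or_usage_prior": 0.05,
}

def score(signals: dict) -> float:
    """Weighted linear combination; missing signals default to 0."""
    return sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())

# A variant record with an exact SKU hit can outrank a prose-rich family page,
# even though the family page wins on semantic similarity:
variant = score({"lexical_match": 1.0, "semantic_similarity": 0.6,
                 "hierarchy_match": 1.0, "attribute_match": 0.8})
family = score({"lexical_match": 0.2, "semantic_similarity": 0.9,
                "hierarchy_match": 0.0, "attribute_match": 0.1})
```

With these inputs the variant scores 0.81 against the family page's 0.35 — the `hierarchy_match` and `lexical_match` terms doing exactly the counterbalancing described above.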


When the Right Move Is to Ask a Clarifying Question

Hierarchical retrieval should reduce ambiguity, not pretend ambiguity does not exist.

Suppose the user asks:

"Can the X100 run in a food processing washdown environment?"

If the catalog has both standard and washdown variants, the safest system behavior is:

  1. Retrieve the family and relevant variants
  2. Notice that washdown support is variant-specific
  3. Ask: "Do you mean the standard X100 or the X100 Washdown variant? The enclosure rating differs."

That is much better than blending the variant data into a generic answer.

This connects directly to the value of clarifying questions in B2B product AI. In complex catalogs, the best answer is often a short, targeted question that moves the system from family scope to variant scope.
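The ambiguity check behind step 2 above can be sketched as follows: if the sibling variants disagree on the attribute the question hinges on, ask instead of answering. The attribute names and record shape are illustrative:

```python
def needs_clarification(variants: list, attribute: str) -> bool:
    """True when sibling variants carry different values for the attribute
    the question depends on."""
    values = {v["attributes"].get(attribute) for v in variants}
    return len(values) > 1

variants = [
    {"sku": "X100-075-230-STD", "attributes": {"enclosure": "IP20"}},
    {"sku": "X100-075-400-WD", "attributes": {"enclosure": "IP69K"}},
]

if needs_clarification(variants, "enclosure"):
    question = ("Do you mean the standard X100 or the X100 Washdown variant? "
                "The enclosure rating differs.")
```

When the variants agree on the attribute, the check returns False and the system can answer at family scope without pestering the user.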


Common Failure Modes to Watch For

Even strong teams trip over the same issues.

1. Family Pages Dominating Everything

Family pages often contain the most text, the cleanest marketing copy, and the broadest term coverage. Embedding search loves them. Unless you counterbalance with hierarchy-aware scoring, they drown out exact SKUs.

2. Variant Attributes Buried in PDFs

If the only place a seal material or certification appears is a PDF table, your variant retrieval will stay weak. Extract and normalize those attributes into indexable fields. This is one of the biggest practical wins in product AI.

3. Accessories Treated as Peer Products

Accessories should usually enter retrieval through relationship edges, not as generic semantic neighbors. Otherwise you get answers where the brake resistor looks like an alternate drive, or the mounting kit looks like a standalone product.

4. Regional Part Numbers Not Linked

Many catalogs have US, EU, and legacy ERP identifiers for the same underlying item. If those aliases are not resolved into one entity graph, users experience false negatives, duplicate answers, or misleading substitutions.

5. Answer Generation Ignoring Evidence Priority

Even if retrieval gets the right records, the prompt may still treat all context as equal. Tell the model explicitly: variant-level evidence overrides family-level summaries when the two differ.
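One way to make that instruction enforceable is to label the evidence by level when assembling the prompt. This is an illustrative template, not a prescribed one:

```python
def build_prompt(question: str, variant_evidence: list,
                 family_evidence: list) -> str:
    """Assemble a prompt where evidence priority is explicit, not implied."""
    parts = [
        "Answer using the evidence below.",
        "If variant-level evidence conflicts with family-level summaries, "
        "the variant-level evidence wins.",
        "\n[VARIANT EVIDENCE]",
    ]
    parts += variant_evidence
    parts.append("\n[FAMILY CONTEXT]")
    parts += family_evidence
    parts.append(f"\nQuestion: {question}")
    return "\n".join(parts)

prompt = build_prompt(
    "Does the X100 Washdown support IP69K washdown cleaning?",
    variant_evidence=["X100-075-400-WD datasheet: enclosure rating IP69K"],
    family_evidence=["X100 series overview: compact drives for general use"],
)
```

The labels give the model something concrete to obey, and they make it auditable afterwards which level an answer leaned on.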


Metrics That Actually Reveal Progress

If you want to know whether hierarchical retrieval is working, generic chatbot thumbs-up metrics are not enough.

Track at least these:

  • Variant precision: when a query is variant-specific, how often is the top evidence the correct variant?
  • Family-vs-variant confusion rate: how often does the system answer at the wrong abstraction level?
  • Clarification rate on ambiguous family queries: are you asking when ambiguity is real?
  • Accessory linkage success: for "what fits this SKU?" queries, how often are the returned related items actually correct?
  • Attribute grounding accuracy: how often do voltage, material, dimension, and certification claims map to structured source fields?

These complement broader RAG evaluation and monitoring practices. The important part is to evaluate hierarchy-specific failures explicitly. Otherwise the system can look good in aggregate while still making dangerous SKU-level mistakes.
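Two of these metrics can be computed from a small labeled eval set. The record shape here is an assumption — each record carries the classified query level, the top retrieved evidence, the expected variant, and the level the final answer was produced at:

```python
def variant_precision(records: list) -> float:
    """Of variant-specific queries, how often the top evidence is the
    correct variant."""
    relevant = [r for r in records if r["query_level"] == "variant"]
    hits = [r for r in relevant
            if r["top_evidence_id"] == r["expected_variant_id"]]
    return len(hits) / max(len(relevant), 1)

def level_confusion_rate(records: list) -> float:
    """How often the answer was produced at the wrong abstraction level."""
    wrong = [r for r in records if r["answer_level"] != r["query_level"]]
    return len(wrong) / max(len(records), 1)
```

Even a few hundred hand-labeled records is usually enough to see whether family pages are drowning out SKUs before and after a scoring change.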


Where This Matters Most Commercially

Hierarchical retrieval is not just an engineering nicety. It matters most in the revenue-critical parts of B2B commerce:

  • product selection for complex technical catalogs
  • compatibility and spare-parts lookup
  • guided selling across configurable families
  • distributor support workflows with many near-identical SKUs
  • aftermarket and replacement scenarios where legacy part numbers matter

These are exactly the workflows where a wrong answer is expensive. A buyer does not care that your chatbot was semantically "close." They care whether the selected part fits, complies, and ships.

That is why the best product AI systems behave less like a generic chatbot and more like a disciplined technical rep. They know when to stay broad, when to zoom in, and when to stop and clarify.


The Bottom Line

If your catalog has product families, variants, accessories, and regional identifiers, flat retrieval will eventually confuse them. The fix is not just a better model. It is a better retrieval design.

Hierarchical retrieval gives your AI a way to answer at the right level of abstraction, promote exact SKU evidence when precision matters, and use family context without letting it overrule variant truth. In real B2B environments, that is the difference between a helpful sales assistant and an expensive source of plausible mistakes.

Teams that get this right usually do three things well:

  • model catalog entities cleanly
  • retrieve candidates across multiple levels intentionally
  • enforce answer scope before generation

Do that, and your product AI becomes much better at the questions that matter most.


Ready to make your catalog AI variant-aware?

Axoverna helps B2B teams turn complex product catalogs into product knowledge systems that understand families, variants, attributes, and compatibility, not just chunks of text. If you want more precise answers, fewer SKU mix-ups, and a better buying experience, book a demo and we’ll show you how it works.
