AI Site Search for Shopify: Setup, Merchandising Rules, and KPIs
How to evaluate AI search, set merchandising rules, and measure improvements in conversion.
Why this matters
Site search is usually one of your highest-intent traffic channels. When shoppers search, they are telling you exactly what they want, so search quality directly affects conversion rate, average order value, and support load. Four KPIs capture this:
- Search conversion rate: compare conversion for sessions that used search against sessions that did not, and aim for a measurable lift over your baseline.
- Zero-results rate: keep it low by fixing synonyms, catalog coverage, and spelling variants.
- Refinement rate: measure how often shoppers need filters or sorting after searching (a high rate can signal relevance issues).
- Search exit rate: reduce “search → leave” by improving ranking, filters, and result messaging.
Tip: Track these weekly. Search changes tend to show impact faster than homepage redesigns.
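As a rough illustration, the four KPIs above can be computed from session-level records. This is a minimal sketch: the `Session` fields are a hypothetical schema rather than a Shopify export format, so map them to whatever your analytics tool actually provides.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One storefront session (hypothetical schema, for illustration only)."""
    searches: int = 0                  # searches run in the session
    zero_result_searches: int = 0      # searches that returned no products
    refined: bool = False              # used filters/sorting after a search
    exited_after_search: bool = False  # session ended on a results page
    converted: bool = False            # session ended in an order


def rate(part: int, whole: int) -> float:
    """Ratio that returns 0.0 instead of failing on an empty denominator."""
    return part / whole if whole else 0.0


def search_kpis(sessions: list[Session]) -> dict[str, float]:
    with_search = [s for s in sessions if s.searches > 0]
    without_search = [s for s in sessions if s.searches == 0]
    total_searches = sum(s.searches for s in with_search)
    n = len(with_search)
    return {
        "search_conversion_rate": rate(sum(s.converted for s in with_search), n),
        "non_search_conversion_rate": rate(
            sum(s.converted for s in without_search), len(without_search)
        ),
        "zero_results_rate": rate(
            sum(s.zero_result_searches for s in with_search), total_searches
        ),
        "refinement_rate": rate(sum(s.refined for s in with_search), n),
        "search_exit_rate": rate(sum(s.exited_after_search for s in with_search), n),
    }
```

Note that zero-results rate is computed per search while the other rates are per session; pick one convention and keep it fixed so weekly numbers stay comparable.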
The fastest wins come from catalog readiness and merchandising rules. “AI search” helps, but only once your data and rules keep the model from surfacing out-of-stock, incompatible, or margin-killing results.
Framework / workflow
This workflow keeps search improvements measurable, safe, and reversible. Treat search like a product: ship small changes, watch KPIs, and keep a rollback path.
Step 0 — Baseline-first (Shopify-native defaults)
- Confirm you can measure: search conversion, zero-results rate, top queries, and no-result queries.
- Fix obvious catalog issues first: missing product types, inconsistent tags, unclear variants, and out-of-stock handling.
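A quick script can surface the most obvious catalog issues before any tuning starts. The sketch below assumes products are available as plain dictionaries with `handle`, `product_type`, and `tags` keys (an assumed shape, not a specific Shopify API response) and checks only two things: missing product types and tags that differ only by case or whitespace.

```python
from collections import defaultdict


def catalog_issues(products: list[dict]) -> dict[str, list[str]]:
    """Map product handle -> hygiene issues found (clean products are omitted)."""
    report: dict[str, list[str]] = {}
    for product in products:
        found = []
        if not (product.get("product_type") or "").strip():
            found.append("missing product_type")
        # Group raw tags by a normalized form to catch "Cotton" vs "cotton ".
        groups = defaultdict(set)
        for tag in product.get("tags") or []:
            groups[tag.strip().lower()].add(tag)
        variants = sorted(norm for norm, raw in groups.items() if len(raw) > 1)
        if variants:
            found.append("tag variants: " + ", ".join(variants))
        if found:
            report[product["handle"]] = found
    return report
```

Variant structure and out-of-stock handling need store-specific checks and are left out here.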
Step 1 — Query audit (what shoppers actually ask)
- Export your top 50–200 queries and your top 20 no-result queries.
- Label each query as: Exact SKU, Category, Use-case, Attribute (size/material/compatibility), or Problem.
- Pick 10 “money queries” to improve first (highest volume × highest intent × biggest revenue relevance).
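One simple way to shortlist money queries is to multiply the three factors and sort. This is a sketch under assumptions: `intent` is a 0 to 1 score you assign by hand while labeling, and `revenue_per_search` is whatever attributed-revenue figure your analytics can provide. Neither is a built-in Shopify metric.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query: str
    searches: int              # volume over the audit window
    intent: float              # 0..1, assigned by hand during labeling
    revenue_per_search: float  # attributed revenue divided by searches


def money_score(q: QueryStats) -> float:
    """Volume x intent x revenue relevance, as a plain product."""
    return q.searches * q.intent * q.revenue_per_search


def pick_money_queries(stats: list[QueryStats], n: int = 10) -> list[str]:
    """Return the n highest-scoring queries, best first."""
    ranked = sorted(stats, key=money_score, reverse=True)
    return [q.query for q in ranked[:n]]
```

A plain product lets one factor of zero veto a query, which is usually what you want: a high-volume query with no revenue relevance should not make the list.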
Step 2 — Build merchandising rules (before “AI”)
- Hard rules (never break): hide unavailable items, enforce compatibility, suppress restricted products, protect brand safety.
- Soft rules (optimize): boost high-margin products, boost bestsellers, pin seasonal collections, de-boost high-return SKUs.
- Result messaging: when results are weak, show helpful pivots (synonyms, collections, “popular in…”).
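The split between hard and soft rules maps naturally onto "filter, then re-score". The sketch below is illustrative only: the `Product` fields, the `restricted` tag name, and the boost weights are all assumptions, and a real search app will expose its own rule configuration rather than Python functions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Product:
    handle: str
    relevance: float                  # base score from the search engine
    available: bool = True
    tags: set[str] = field(default_factory=set)
    margin_bucket: str = "mid"        # "low" | "mid" | "high"
    return_risk: str = "low"          # "low" | "high"


RESTRICTED_TAGS = {"restricted"}      # assumed tag name
SOFT_WEIGHTS = {"high_margin": 0.2, "bestseller": 0.15, "high_return": -0.25}


def passes_hard_rules(p: Product, required_compat: Optional[str]) -> bool:
    """Hard rules remove a product entirely; they are never traded off."""
    if not p.available:
        return False
    if p.tags & RESTRICTED_TAGS:
        return False
    if required_compat and required_compat not in p.tags:
        return False
    return True


def soft_score(p: Product) -> float:
    """Soft rules nudge the engine's relevance score up or down."""
    score = p.relevance
    if p.margin_bucket == "high":
        score += SOFT_WEIGHTS["high_margin"]
    if "bestseller" in p.tags:
        score += SOFT_WEIGHTS["bestseller"]
    if p.return_risk == "high":
        score += SOFT_WEIGHTS["high_return"]
    return score


def merchandise(results: list[Product], required_compat: Optional[str] = None) -> list[Product]:
    kept = [p for p in results if passes_hard_rules(p, required_compat)]
    return sorted(kept, key=soft_score, reverse=True)
```

Because hard rules run first, no boost can resurrect an unavailable or incompatible product, however large the weight.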
Step 3 — Controlled relevance tuning
- Start with synonyms (spelling, pluralization, brand nicknames).
- Then tune fields: title vs product type vs tags vs vendor.
- Only then add “AI features” (semantic matching, vector search, reranking) if you can explain wins with KPI changes.
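To make the synonym step concrete, here is a minimal sketch of query normalization plus synonym expansion. The `SYNONYMS` entries are made up for illustration, and the plural handling is deliberately naive (it only strips a trailing "s"); a production setup would rely on the search app's own synonym feature or a proper stemmer.

```python
import re

# Hypothetical synonym map: shopper term -> catalog terms.
SYNONYMS = {
    "sneaker": ["trainer", "running shoe"],
    "tee": ["t-shirt"],
}


def normalize(term: str) -> str:
    """Lowercase, collapse whitespace, and strip a simple trailing plural 's'."""
    term = re.sub(r"\s+", " ", term.strip().lower())
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        term = term[:-1]
    return term


def expand_query(query: str) -> list[str]:
    """Return the normalized query followed by any mapped catalog terms."""
    base = normalize(query)
    return [base] + SYNONYMS.get(base, [])
```

For example, `expand_query("  Sneakers ")` yields the normalized term followed by its two mapped catalog terms, while a query with no entry comes back as a single-item list.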
Step 4 — Human QA loop (HITL)
- Review the top 20 results for your money queries weekly.
- Check: out-of-stock, wrong category, incompatible variants, policy conflicts, misleading results.
- Every rule change must have: owner, hypothesis, metric, and rollback note.
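The "owner, hypothesis, metric, rollback note" requirement is easy to enforce if each rule change is a structured record. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class RuleChange:
    rule_id: str
    owner: str
    hypothesis: str
    metric: str
    rollback_note: str
    shipped_on: date


def missing_fields(change: RuleChange) -> list[str]:
    """Names of text fields that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(change)
        if isinstance(getattr(change, f.name), str)
        and not getattr(change, f.name).strip()
    ]
```

A change with a non-empty `missing_fields` result should not ship. Appending accepted records to a list or a spreadsheet also gives you a changelog for free.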
Step 5 — Measurement and iteration
- Run changes in small batches (e.g., 5–10 queries/rules at a time).
- Evaluate in a fixed window (e.g., 14–28 days) and compare to baseline.
- Keep a changelog. Search quality compounds when you keep history.
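When comparing an evaluation window to baseline, a two-proportion z-test is one common way to judge whether a conversion change is larger than noise. The sketch below uses only the standard library and assumes both windows contain at least one search session. A before/after comparison is not a randomized test, so seasonality and promotions can still confound the result; treat the p-value as a sanity check rather than proof.

```python
from math import erf, sqrt


def conversion_lift(
    base_conv: int, base_n: int, test_conv: int, test_n: int
) -> tuple[float, float]:
    """Return (absolute lift, two-sided p-value) from a two-proportion z-test."""
    p_base = base_conv / base_n
    p_test = test_conv / test_n
    pooled = (base_conv + test_conv) / (base_n + test_n)
    se = sqrt(pooled * (1 - pooled) * (1 / base_n + 1 / test_n))
    if se == 0:
        # Both windows converted at 0% or 100%: no measurable difference.
        return p_test - p_base, 1.0
    z = (p_test - p_base) / se
    # Standard normal CDF via erf; doubling the tail makes it two-sided.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return p_test - p_base, 2 * (1 - cdf)
```

With 200 conversions from 5,000 search sessions at baseline and 260 from 5,000 in the test window, the lift is 1.2 percentage points and the p-value is below 0.01.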
Templates / prompts
Use these templates to generate bounded outputs (no hallucinated inventory, no invented policy terms). Always feed the model your actual catalog fields and policy text.
Template 1: Query intent labeling
Role: Ecommerce search analyst.
Goal: Label shopper search queries into intent categories for Shopify merchandising.
Inputs:
- Top queries: {{paste list}}
- Catalog constraints: {{collections, product_types, key attributes}}
Rules:
- Output must be deterministic and short.
- If unclear, label as "needs review" and propose 1 clarification question.
Output:
A table with columns: query | intent_type (sku/category/use-case/attribute/problem) | suggested landing (collection/product) | notes
Template 2: Synonym map
Role: Shopify search merchandiser.
Goal: Create a synonym map to reduce no-result queries without breaking relevance.
Inputs:
- No-result queries: {{paste list}}
- Approved terms (brand + product naming): {{paste}}
- Forbidden mappings: {{paste}} (e.g., do not map "case" to "phone")
Constraints:
- Propose only synonyms that are safe.
- Flag risky mappings for human review.
Output:
JSON list: [{"from":"", "to":["",""], "risk":"low|med|high", "reason":""}]
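Model output in this JSON shape should be checked before anything ships. The following sketch assumes the output is a JSON list with the `from`, `to`, and `risk` keys from the template, and that forbidden mappings are kept as `(from, to)` pairs. It auto-approves only low-risk, well-formed, non-forbidden entries and routes everything else to human review.

```python
import json

ALLOWED_RISK = {"low", "med", "high"}


def review_synonyms(raw_json: str, forbidden: set[tuple[str, str]]):
    """Split model-proposed mappings into (auto_ok, needs_review)."""
    auto_ok, needs_review = [], []
    for entry in json.loads(raw_json):
        src = entry.get("from", "").strip().lower()
        targets = [t.strip().lower() for t in entry.get("to", []) if t.strip()]
        risk = entry.get("risk")
        blocked = any((src, t) in forbidden for t in targets)
        malformed = not src or not targets or risk not in ALLOWED_RISK
        if malformed or blocked or risk != "low":
            needs_review.append(entry)
        else:
            auto_ok.append(entry)
    return auto_ok, needs_review
```

The forbidden check deliberately overrides the model's own risk label, so a mapping like "case" to "phone" is held back even if the model marks it low risk.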
Template 3: Merchandising rules for money queries
Role: Ecommerce merch strategist.
Goal: Propose search rules for the "money queries" while respecting inventory + policy.
Inputs:
- Money queries: {{paste list}}
- Catalog fields: title, product_type, tags, vendor, availability, margin_bucket, return_risk
Hard rules:
- Never show unavailable items first.
- Never surface incompatible variants (compatibility_tags required).
- No policy conflicts (restricted tags excluded).
Output:
For each query: hard_rules (must) + soft_rules (boost/deboost) + pinned_items (optional) + rollback note
Template 4: Results QA scoring
Role: QA reviewer for Shopify search results.
Goal: Score the top 10 results for each query and flag issues.
Inputs:
- Query: {{query}}
- Top 10 results: {{title, handle, availability, price, key attributes}}
Scoring:
- Relevance (0-3), Availability (0-2), Compatibility (0-2), Margin/strategy (0-2), Policy-safe (0-1)
Output:
Score table + top 3 fixes to implement next
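The rubric above sums to a maximum of 10 per result, which makes it easy to validate and total in code. A minimal sketch follows; the key names and the review threshold of 7 are assumptions, not part of the rubric itself.

```python
RUBRIC_MAX = {
    "relevance": 3,
    "availability": 2,
    "compatibility": 2,
    "margin_strategy": 2,
    "policy_safe": 1,
}


def total_score(scores: dict[str, int]) -> int:
    """Validate one result's rubric scores and return the total (0-10)."""
    if set(scores) != set(RUBRIC_MAX):
        raise ValueError(f"expected keys {sorted(RUBRIC_MAX)}")
    for name, value in scores.items():
        if not 0 <= value <= RUBRIC_MAX[name]:
            raise ValueError(f"{name}={value} outside 0-{RUBRIC_MAX[name]}")
    return sum(scores.values())


def flag_results(scored: dict[str, dict[str, int]], threshold: int = 7) -> list[str]:
    """Handles whose total falls below the (assumed) review threshold."""
    return [handle for handle, s in scored.items() if total_score(s) < threshold]
```

Rejecting out-of-range scores matters when a model fills in the table, since it can return values the rubric does not allow.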
Template 5: Experiment plan
Role: Ecommerce experiment designer.
Goal: Create a test plan for search relevance changes.
Inputs:
- Change description: {{synonyms/rules/field weights}}
- Primary KPI: {{search conversion rate}}
- Guardrail KPIs: {{AOV, return rate proxy, search exit rate}}
Constraints:
- Define sample window and stop conditions.
Output:
Hypothesis | metrics | duration | segmentation | stop-loss | how to interpret results
Execution layer: search tuning cadence
Internal search is a merchandising system. Review query logs weekly until the top zero-result, low-click, and high-exit searches are resolved, then move to a monthly cadence.
- Map synonyms to shopper language, not internal naming.
- Create redirects or promoted results for high-intent queries like “gift”, “replacement”, “bundle”, and size/material terms.
- Do not let AI recommend unavailable, low-margin, or policy-sensitive products without explicit rules.
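The weekly log review can start from a simple triage pass that flags zero-result, low-click, and high-exit queries. In this sketch the `QueryLog` fields and all thresholds are assumptions to adjust for your store; queries with too little volume are skipped so that noise does not drive the review.

```python
from dataclasses import dataclass


@dataclass
class QueryLog:
    query: str
    searches: int
    zero_result_searches: int
    clicks: int  # searches followed by a result click
    exits: int   # searches that ended the session


def triage(logs, min_searches=20, zero_share=0.5, low_ctr=0.2, high_exit=0.5):
    """Return {query: [issues]} for queries worth a weekly review."""
    flagged = {}
    for q in logs:
        if q.searches < min_searches:
            continue  # too little data to judge
        if q.zero_result_searches / q.searches >= zero_share:
            # No results explains low clicks and exits, so report it alone.
            issues = ["zero-result"]
        else:
            issues = []
            if q.clicks / q.searches < low_ctr:
                issues.append("low-click")
            if q.exits / q.searches > high_exit:
                issues.append("high-exit")
        if issues:
            flagged[q.query] = issues
    return flagged
```

Once the flagged list stays short for a few weeks in a row, that is a reasonable signal to move to the monthly cadence.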
Checklist
- Measurement ready: you can report search conversion, zero-results rate, and top/no-result queries.
- Catalog hygiene: product types, tags, and key attributes are consistent; variants are properly structured.
- Availability rules: out-of-stock items are suppressed or clearly labeled (not ranked #1 by accident).
- Synonyms shipped: top no-result queries have safe synonym coverage (with risky mappings flagged).
- Merch rules documented: hard rules (never break) vs soft rules (optimize) are written down with an owner.
- QA rubric used: money queries reviewed weekly; top issues are logged and fixed.
- Experiment discipline: each change has hypothesis, KPI, evaluation window, and rollback note.
- Money queries have no policy conflicts and no incompatible results in the top 10.
- Zero-results rate is trending down week-over-week for addressed queries.
- Search conversion is stable or improving (no hidden regressions on AOV/returns proxy).
- Changelog updated and rollback steps documented.
FAQ
Do I need “AI search” to improve Shopify search?
Not at first. Most stores get bigger wins from catalog hygiene, synonym coverage, and clear hard rules (availability, compatibility, policy). Add AI features only after you can measure and explain improvements.
What are “money queries” and how many should I start with?
Money queries are high-volume, high-intent searches strongly tied to revenue (category + use-case terms). Start with 10, fix them end-to-end, then expand in batches.
How do I reduce zero-results without breaking relevance?
Use a synonym map with a risk label. Safe mappings cover spelling/plurals/nicknames. Risky mappings (broad meaning changes) must be reviewed and tested on a small subset of queries.
What should be hard rules vs soft rules?
Hard: never show incompatible or restricted items; avoid out-of-stock at the top. Soft: boosts for margin, bestsellers, seasonal collections, or inventory goals—tuned gradually with KPIs.
How often should I review search performance?
Weekly for the top queries and no-result queries; monthly for deeper relevance checks and rule cleanup. Search quality compounds when you maintain a changelog and a small, steady cadence.
What KPIs matter most after launch?
Start with search conversion rate and zero-results rate. Add guardrails: search exit rate, AOV, and a returns proxy (e.g., high-return SKUs surfacing too often on generic queries).
How do I prevent “AI” from recommending the wrong items?
Constrain it with structured fields (compatibility, availability, restricted tags), enforce hard rules, and run a human-in-the-loop (HITL) QA review on money queries. If the rules can’t explain a result, don’t ship the change.
Get Shopify’s native search working first, then add AI workflows where they’re measurable and safe.