Cheating at Search with LLMs
Nick K (@skeptrune)

Publish Date: May 21, 2025

We've been doing something at Trieve for a while that we call "cheating at search with LLMs," and I thought it'd be cool to talk about it.

The Problem: Smart Product Comparisons

For our gen-AI sales associate Shopify app, we wanted to make it possible to do cool things like generate a comparison table for any two products. Take this example from the brand LifeWater, which sells water-filter straws. If a customer asks to "compare the Sip against the Life Straw" (two different products in their portfolio), we need to quickly look inside their catalog and determine which two products to fetch.

The challenge? No traditional keyword, semantic, or hybrid search is intelligent enough on its own, without an LLM, to figure out exactly which two products are being discussed.

Our Solution: Let the LLM Do the Hard Work

So we cheat. Here's how it works:

  1. First, we do a standard search with the user's query and get the top 20 results, grouped by product. Each group represents a product, and each chunk within that group is a variant of that product (like different colors or pack sizes).

  2. Then we use a tool called "determine relevance" that asks the LLM to rate each product as high, medium, or low relevance to the query. We pass each product's JSON, HTML, description text, and title to the LLM (a rough sketch of what this tool call could look like follows this list).

  3. The LLM examines each product and makes the call. For example, it might mark the Life Straw Sip Cotton Candy variant as "high" relevance, and the regular Life Straw as "high" relevance, while everything else gets "medium" or "low."

  4. We then use these relevance rankings to display only the most relevant products to the user.
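
To make step 2 concrete, here is a minimal sketch of what a "determine relevance" tool call could look like using the OpenAI Python SDK. The tool name, prompt, model, and product fields are illustrative assumptions, not Trieve's actual implementation:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool schema: the model must answer with one of three grades.
DETERMINE_RELEVANCE_TOOL = {
    "type": "function",
    "function": {
        "name": "determine_relevance",
        "description": "Rate how relevant a product is to the user's query.",
        "parameters": {
            "type": "object",
            "properties": {
                "relevance": {"type": "string", "enum": ["high", "medium", "low"]}
            },
            "required": ["relevance"],
        },
    },
}


def determine_relevance(query: str, product: dict) -> str:
    """Ask the LLM to grade one product group against the query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {
                "role": "user",
                "content": (
                    f"Query: {query}\n\n"
                    f"Title: {product['title']}\n"
                    f"Description: {product['description']}\n"
                    f"Raw product JSON: {json.dumps(product)}"
                ),
            }
        ],
        tools=[DETERMINE_RELEVANCE_TOOL],
        tool_choice={"type": "function", "function": {"name": "determine_relevance"}},
    )
    args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    return args["relevance"]


def filter_relevant(query: str, product_groups: list[dict]) -> list[dict]:
    """Grade each of the top-20 product groups and keep only the most relevant."""
    return [p for p in product_groups if determine_relevance(query, p) == "high"]
```

Forcing the answer through a tool call with an enum keeps the output machine-readable, so the "high"/"medium"/"low" grades can feed straight into the display step.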

Making It Fast

Despite making 20+ LLM calls in the background, the experience feels instantaneous to the user thanks to semantic caching on all the tool calls. If I run the same comparison again, it's blazing fast.
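
The post doesn't spell out how the semantic cache works, but the basic idea can be sketched as: embed each query, and if a new query is close enough (by cosine similarity) to one already answered, reuse the cached tool-call result instead of calling the LLM again. The model name, threshold, and in-memory store below are assumptions for illustration:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

SIMILARITY_THRESHOLD = 0.95  # assumed cutoff; tune per workload
_cache: list[tuple[np.ndarray, dict]] = []  # (query embedding, cached result)


def _embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    vec = np.array(resp.data[0].embedding)
    return vec / np.linalg.norm(vec)


def cached_tool_call(query: str, run_tool) -> dict:
    """Reuse the result of a semantically similar earlier query when possible."""
    vec = _embed(query)
    for cached_vec, cached_result in _cache:
        if float(vec @ cached_vec) >= SIMILARITY_THRESHOLD:
            return cached_result  # near-duplicate query: skip the LLM call
    result = run_tool(query)  # cache miss: pay for the LLM call once
    _cache.append((vec, result))
    return result
```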

Going Even Further

We extend this approach to other aspects of search:

  • Price Filters: We have a tool call that extracts min and max price parameters (sketched after this list)
  • Category Determination: For stores with predefined categories, we use LLMs to determine the right category
  • Format Selection: We use tool calls to decide whether to generate text or images
  • Context Retention: If a user follows up with "tell me more about the Life Straw's filtration," we don't need to search again - we just use the same products from before
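
As an example of the price-filter idea, here is a rough sketch of a tool call that pulls min/max price bounds out of a free-text query. The tool name `set_price_filter` and its schema are hypothetical, not Trieve's actual API:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool schema for queries like "filter straws under $30".
PRICE_FILTER_TOOL = {
    "type": "function",
    "function": {
        "name": "set_price_filter",
        "description": "Extract min and max price constraints from the query, if present.",
        "parameters": {
            "type": "object",
            "properties": {
                "min_price": {"type": ["number", "null"]},
                "max_price": {"type": ["number", "null"]},
            },
        },
    },
}


def extract_price_filter(query: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": f"Extract price constraints from: {query}"}],
        tools=[PRICE_FILTER_TOOL],
        tool_choice={"type": "function", "function": {"name": "set_price_filter"}},
    )
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)


# e.g. extract_price_filter("straws between $20 and $50")
# would be expected to return {"min_price": 20, "max_price": 50}
```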

Why This Matters

It literally feels like cheating, which is incredible. In the early days, we spent a ton of time building super high-relevance search pipelines. But with modern LLMs, that's unnecessary. You can just fetch 20 things, give the LLM the query and each fetched item, and ask it which ones are relevant.

Absolute madness. Intelligence as a commodity.

Comments (6)

  • K Om Senapati, May 21, 2025

    Wow, this is a really good use case.

  • 𝚂𝚊𝚞𝚛𝚊𝚋𝚑 𝚁𝚊𝚒, May 21, 2025

    Let the LLM Do the Hard Work

    The solution was right there, sitting, staring at me. 🤯

  • Gabriel Peixoto, May 21, 2025

    Very interesting, letting the LLM do things for me is the way.

  • Dotallio, May 21, 2025

    This is such a clever move - outsourcing the hardest part of search ranking to LLMs is honestly a game changer. Have you seen any weird edge cases where the LLM misranks something totally irrelevant?

    • Nick K, May 21, 2025

      Definitely. A good example is multi-packs: if you search for "blue shirts", the LLM is liable to give a "t-shirt | multipack of 3" product a relevance rating of "high" when it really isn't. We've had to continuously tweak the prompt for edge cases like this.

  • Bap, May 22, 2025

    Interesting read
