Cheating at Search with LLMs
Nick K (@skeptrune)

Publish Date: May 21, 2025

We've been doing something at Trieve for a while that we call "cheating at search with LLMs," and I thought it'd be cool to talk about it.

The Problem: Smart Product Comparisons

For our gen-AI sales associate Shopify app, we wanted to make it possible to do cool things like generate a comparison table for any two products. Take this example from the brand LifeWater, which sells water-filter straws. If a customer asks to "compare the Sip against the Life Straw" (two different products in their portfolio), we need to quickly look inside their catalog and determine which two products to fetch.

The challenge? No traditional keyword, semantic, or hybrid search is intelligent enough on its own, without an LLM, to figure out exactly which two products are being discussed.

Our Solution: Let the LLM Do the Hard Work

So we cheat. Here's how it works:

  1. First, we do a standard search with the user's query and get the top 20 results, grouped by product. Each group represents a product, and each chunk within that group is a variant of that product (like different colors or pack sizes).

  2. Then we use a tool called "determine relevance" that asks the LLM to rate each product as high, medium, or low relevance to the query. We pass each product's JSON, HTML, description text, and title to the LLM (a rough sketch of what this tool call could look like follows this list).

  3. The LLM examines each product and makes the call. For example, it might mark the Life Straw Sip Cotton Candy variant as "high" relevance, and the regular Life Straw as "high" relevance, while everything else gets "medium" or "low."

  4. We then use these relevance rankings to display only the most relevant products to the user.
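
To make step 2 concrete, here is a minimal sketch of what a "determine relevance" tool call could look like using the OpenAI Python SDK. The tool name, prompt, model, and product fields are illustrative assumptions, not Trieve's actual implementation:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool schema: the model must answer with one of three grades.
DETERMINE_RELEVANCE_TOOL = {
    "type": "function",
    "function": {
        "name": "determine_relevance",
        "description": "Rate how relevant a product is to the user's query.",
        "parameters": {
            "type": "object",
            "properties": {
                "relevance": {"type": "string", "enum": ["high", "medium", "low"]}
            },
            "required": ["relevance"],
        },
    },
}


def determine_relevance(query: str, product: dict) -> str:
    """Ask the LLM to grade one product group against the query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[
            {
                "role": "user",
                "content": (
                    f"Query: {query}\n\n"
                    f"Title: {product['title']}\n"
                    f"Description: {product['description']}\n"
                    f"Raw product JSON: {json.dumps(product)}"
                ),
            }
        ],
        tools=[DETERMINE_RELEVANCE_TOOL],
        tool_choice={"type": "function", "function": {"name": "determine_relevance"}},
    )
    args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
    return args["relevance"]


def filter_relevant(query: str, product_groups: list[dict]) -> list[dict]:
    """Grade each of the top-20 product groups and keep only the most relevant."""
    return [p for p in product_groups if determine_relevance(query, p) == "high"]
```

Forcing the answer through a tool call with an enum keeps the output machine-readable, so the "high"/"medium"/"low" grades can feed straight into the display step.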

Making It Fast

Despite making 20+ LLM calls in the background, the experience feels instantaneous to the user thanks to semantic caching on all the tool calls. If I run the same comparison again, it's blazing fast.
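
The post doesn't spell out how the semantic cache works, but the basic idea can be sketched as: embed each query, and if a new query is close enough (by cosine similarity) to one already answered, reuse the cached tool-call result instead of calling the LLM again. The model name, threshold, and in-memory store below are assumptions for illustration:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

SIMILARITY_THRESHOLD = 0.95  # assumed cutoff; tune per workload
_cache: list[tuple[np.ndarray, dict]] = []  # (query embedding, cached result)


def _embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    vec = np.array(resp.data[0].embedding)
    return vec / np.linalg.norm(vec)


def cached_tool_call(query: str, run_tool) -> dict:
    """Reuse the result of a semantically similar earlier query when possible."""
    vec = _embed(query)
    for cached_vec, cached_result in _cache:
        if float(vec @ cached_vec) >= SIMILARITY_THRESHOLD:
            return cached_result  # near-duplicate query: skip the LLM call
    result = run_tool(query)  # cache miss: pay for the LLM call once
    _cache.append((vec, result))
    return result
```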

Going Even Further

We extend this approach to other aspects of search:

  • Price Filters: We have a tool call that extracts min and max price parameters (sketched after this list)
  • Category Determination: For stores with predefined categories, we use LLMs to determine the right category
  • Format Selection: We use tool calls to decide whether to generate text or images
  • Context Retention: If a user follows up with "tell me more about the Life Straw's filtration," we don't need to search again - we just use the same products from before
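
As an example of the price-filter idea, here is a rough sketch of a tool call that pulls min/max price bounds out of a free-text query. The tool name `set_price_filter` and its schema are hypothetical, not Trieve's actual API:

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool schema for queries like "filter straws under $30".
PRICE_FILTER_TOOL = {
    "type": "function",
    "function": {
        "name": "set_price_filter",
        "description": "Extract min and max price constraints from the query, if present.",
        "parameters": {
            "type": "object",
            "properties": {
                "min_price": {"type": ["number", "null"]},
                "max_price": {"type": ["number", "null"]},
            },
        },
    },
}


def extract_price_filter(query: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": f"Extract price constraints from: {query}"}],
        tools=[PRICE_FILTER_TOOL],
        tool_choice={"type": "function", "function": {"name": "set_price_filter"}},
    )
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)


# e.g. extract_price_filter("straws between $20 and $50")
# would be expected to return {"min_price": 20, "max_price": 50}
```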

Why This Matters

It literally feels like cheating, which is incredible. In the early days, we spent a ton of time building super high-relevance search pipelines. But with modern LLMs, that's unnecessary. You can just fetch 20 things, give the LLM the query and each fetched item, and ask it which ones are relevant.

Absolute madness. Intelligence as a commodity.

Comments (6)

  • K Om Senapati, May 21, 2025

    Wow, this is a really good use case.

  • 𝚂𝚊𝚞𝚛𝚊𝚋𝚑 𝚁𝚊𝚒, May 21, 2025

    Let the LLM Do the Hard Work

    The solution was right there, sitting, staring at me. 🤯

  • Gabriel Peixoto, May 21, 2025

    Very interesting, letting the LLM do things for me is the way.

  • Dotallio, May 21, 2025

    This is such a clever move - outsourcing the hardest part of search ranking to LLMs is honestly a game changer. Have you seen any weird edge cases where the LLM misranks something totally irrelevant?

    • Nick K, May 21, 2025

      Definitely. A good example is multi-packs: if you search for "blue shirts", the LLM is liable to give a "t-shirt | multipack of 3" product a relevance rating of "high" when it really isn't. We've had to continuously tweak the prompt for edge cases like this.

  • Bap, May 22, 2025

    Interesting read
