Dark search: 7,000 data points on AI sources
Sortlist Insights

Inside dark search: 7,000 sources reveal how AI recommends your product


By 2026, the search funnel will have changed beyond all recognition. AI will control the information people find online, and with it the influence over most buying decisions.

Google still dominates – but influence has moved behind AI-generated answers. And dark search is what happens when your customers get answers from AI (Google’s AI Mode or AI Overviews, ChatGPT or Perplexity) – without ever clicking through to your website.

No clicks, no analytics. The same influence, just harder to track.

It sounds extreme. But we analysed 7,000 AI-chosen sources – and the numbers say the same:

  • AI now influences decisions upstream: 90% of the queries that trigger an AI answer are informational, and most are resolved without a click.
  • Google ≠ the full picture: There’s just a 13–15% overlap between Google’s top 10 results and the sources cited by ChatGPT and AI Overviews.
  • RAG is your entry point: Real-time search (Retrieval-Augmented Generation) pulls fresh, structured, and authoritative content from the web.
  • Two formats dominate: “Best of” lists and “X vs Y” comparison pages are the most cited AI sources.
  • Blogs and directories rule: 70% of AI-cited content comes from blogs (51%) and directories (19%): not homepages or product pages.

Welcome to the era of “dark search”.

This research unpacks the AI black box – with data on how it chooses what to recommend, exactly where it finds its answers, and how to target its sources so your brand appears first. 

ChatGPT, Google, and the invisible funnel

If you thought ChatGPT’s launch (100 million monthly users in two months) was impressive – look at its growth in 2024.

ChatGPT growth
Source: Similarweb

Yet Google is still the dominant force in search, with 16.4 billion daily searches.

But the real story is how ChatGPT has changed search habits.

Google quickly reacted, releasing AI Overviews in May 2024: ChatGPT-style answers above traditional search results. Almost 20% of searches get an AI Overview.

how chatgpt gets information: AI overview growth
Source: SE Ranking

Today, 90% of the searches that get an AI-generated answer are informational. AI answers haven’t reduced the volume of informational queries: users actually ask more, because AI delivers instant, better answers.

how chatgpt gets information: AI overviews by search intent
Source: Semrush

But these queries no longer drive traffic to your site. AI Overview (AIO) queries send 8 million clicks per day through to the sources they use. Pre-AIO, this figure would have been around 100 million.

The gap: 92 million lost clicks per day, and widening.

how chatgpt gets information: AI impact on informational traffic
Source: Google/Semrush
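
The arithmetic behind that gap is easy to check. This is a small sketch that uses only the two figures quoted above (8 million clicks per day now, roughly 100 million pre-AIO); the variable names are ours and purely illustrative.

```python
# Daily clicks from AI Overview queries, as quoted in the text (approximate).
clicks_pre_aio = 100_000_000   # estimated clicks per day before AI Overviews
clicks_with_aio = 8_000_000    # clicks per day AI Overviews send to sources now

lost_clicks = clicks_pre_aio - clicks_with_aio
lost_share = lost_clicks / clicks_pre_aio

print(f"Lost clicks per day: {lost_clicks:,}")
print(f"Share of clicks lost: {lost_share:.0%}")
```

In other words, roughly nine in ten of the clicks these queries used to generate no longer reach a website.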

And in May 2025 Google flipped the switch on AI Mode, a full chat box inside the search results. No blue links, just a cited, ChatGPT-style answer that users rarely click past.

AI mode


AI already controls information online

So while Google is still the gatekeeper of clicks, it’s no longer the gatekeeper of influence.

AI (in all its forms) is now what influences decisions pre-purchase.

ChatGPT, Perplexity, AI Mode and Google’s AI Overviews surface information, make recommendations, and even name products. But most of the time, the user doesn’t click (and you don’t see their traffic in analytics). 

Users get their information (via AI), make their decision, and they purchase direct. 

That’s the search funnel in the era of dark search:

  • The 53% of queries that are informational are fulfilled by AI, and drive little to no traffic.
  • The other 47% of traffic remains, concentrated on navigational, commercial and transactional queries: demand that has been influenced by AI answers.

how chatgpt gets information: Impact of dark search on Google traffic
Source: Sparktoro

There’s no Google top 10 to crack, no keyword volume to chase. That’s why we call it dark search: influence without visibility, recommendations without rankings.

But if you get AI recommending your product, your all-important transactional traffic will remain.

Can you influence AI’s recommendations?

Yes. And we’ve done the research to prove exactly how.


Dark search: how AI chooses winners

70% of searches on ChatGPT can’t be classified in the “traditional” search buckets: informational, navigational, commercial, and transactional.

search intent on chatgpt
Source: Semrush

And that’s because people don’t search on ChatGPT like they do on Google. When talking with AI, users ask full questions. They mix commands with comparisons and include context.


👉 “Which are the best CRMs for pre-seed startups in 2025?”
👉 “Compare Notion and Trello for content planning”
👉 “Show me alternatives to Mailchimp with better support. I don’t need fancy designs”

These queries are:

  • Conversational
  • Mixed-intent
  • Often specific to use case or user role

To tackle this complex kind of query, AI normally pulls answers from two places: static retrieval or web search. One is hard to influence, the other much easier.

Static retrieval: AI’s trained memory

This is everything ChatGPT read before its last update. Books, websites, articles, academic journals… It’s used for questions where the answer doesn’t change over time.

Think:

👉 “When did the Roman Empire fall?”
👉 “How does influencer marketing differ for industrial brands?”

You can’t easily influence these answers. They’re baked into the model’s memory. Unless you control a massive portion of the public internet, you’re unlikely to change its view.

RAG: Real-Time Search

RAG stands for Retrieval-Augmented Generation. This is when ChatGPT actively looks online: hitting APIs, scraping structured content, checking documentation.

Think:

👉 “What’s the cheapest CRM for startups in 2025?”
👉 “Compare Monday.com vs ClickUp”
👉 “Best noise-cancelling headphones under €200”

These are your real opportunities. If your content is recent, structured, and authoritative, AI might pull directly from you. Being present in the pages it retrieves is the easiest way to get recommended by AI.
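
To make the retrieve-then-generate idea concrete, here is a deliberately tiny sketch. It is not how ChatGPT or any other product actually works: the `Page` class, the scoring rule, the one-year freshness cut-off and the example.com URLs are all our own assumptions, chosen only to show why recent, on-topic pages end up in the context an AI answers from.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    title: str
    text: str
    updated: date


def retrieve(query: str, pages: list[Page], today: date, k: int = 2) -> list[Page]:
    """Rank pages by keyword overlap with the query, weighted by recency."""
    terms = set(query.lower().split())

    def score(page: Page) -> float:
        words = set((page.title + " " + page.text).lower().split())
        overlap = len(terms & words) / max(len(terms), 1)
        age_days = (today - page.updated).days
        freshness = 1.0 if age_days <= 365 else 0.5  # toy freshness weight (assumption)
        return overlap * freshness

    return sorted(pages, key=score, reverse=True)[:k]


def build_prompt(query: str, sources: list[Page]) -> str:
    """Augment the question with the retrieved text before generation."""
    context = "\n".join(
        f"[{i + 1}] {p.title} ({p.url}): {p.text}" for i, p in enumerate(sources)
    )
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


# Hypothetical pages; the URLs are placeholders.
pages = [
    Page("https://example.com/crm-2025", "cheapest crm for startups 2025",
         "pricing table for startups", date(2025, 3, 1)),
    Page("https://example.com/crm-2019", "cheapest crm for startups",
         "old pricing for startups", date(2019, 1, 1)),
    Page("https://example.com/recipes", "pasta recipes", "dinner ideas", date(2025, 2, 1)),
]
query = "cheapest crm for startups"
top = retrieve(query, pages, today=date(2025, 6, 1))
prompt = build_prompt(query, top)
print(prompt)
```

Both CRM pages match the query equally well, but the 2025 page ranks first because of the freshness weight, and the unrelated page never reaches the prompt at all.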

The search funnel in 2026

This new user behaviour requires a new concept of the search funnel. It needs to:  

  • Take into account the 70% of queries that don’t fit into the traditional search buckets.
  • Understand the extent of real-time (RAG) information retrieval: the simplest way to influence results. 

So we analysed hundreds of AI queries and grouped them by intent, stage, and how likely they are to trigger a real-time (RAG) response. The pattern is clear:

| Bucket | Prompt Example | Funnel Stage | RAG? | Notes for Product Teams |
|---|---|---|---|---|
| Discover & Educate | “What does Shopify cost vs WooCommerce?” / “Show me the latest specs of the Sony A7 IV.” | Awareness / Early Consideration | Strong RAG | Surface up-to-date specs, prices, comparisons. Factual accuracy matters. |
| Shortlist & Decide | “Is the M3 MacBook worth extra €300 over M2?” / “Rank budget hotels near Shibuya with free breakfast.” | Late Consideration / Purchase | Partial RAG | Combine static context with fresh info. Highlight pros/cons, pricing, features. |
| Transact & Support | “Add two boxes of Nespresso Original to my cart.” / “Why is my Roomba showing error 26?” | Conversion & Post-purchase | Weak RAG | Requires API calls and retrieval from support docs. Speed is key. |
| Create & Promote | “Write a product-launch tweet thread.” / “Generate five upsell email subject lines.” | Marketing / Retention | No RAG | Mostly generative in nature. |
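
If you want to sort your own list of prompts into these four buckets, a rough first pass can be automated. The sketch below is a simple keyword heuristic of our own, not the method used in this research: the cue words in `BUCKET_RULES` are assumptions picked to fit the example prompts above, and real queries will need manual review.

```python
import re

# Hypothetical keyword cues per bucket, checked in order; the first match wins.
BUCKET_RULES = [
    ("Create & Promote", r"\b(write|generate|draft)\b"),
    ("Transact & Support", r"\b(add|cart|order|error|refund)\b"),
    ("Shortlist & Decide", r"\b(best|rank|worth|compare|alternatives)\b"),
    ("Discover & Educate", r"\b(what|how|show|latest|cost)\b"),
]

# RAG likelihood per bucket, copied from the table above.
RAG_LIKELIHOOD = {
    "Discover & Educate": "Strong RAG",
    "Shortlist & Decide": "Partial RAG",
    "Transact & Support": "Weak RAG",
    "Create & Promote": "No RAG",
}


def classify(prompt: str) -> str:
    """Return the first bucket whose cue words appear in the prompt."""
    text = prompt.lower()
    for bucket, pattern in BUCKET_RULES:
        if re.search(pattern, text):
            return bucket
    return "Unclassified"


for p in [
    "What does Shopify cost vs WooCommerce?",
    "Rank budget hotels near Shibuya with free breakfast.",
    "Why is my Roomba showing error 26?",
    "Write a product-launch tweet thread.",
]:
    bucket = classify(p)
    print(f"{bucket:20} {RAG_LIKELIHOOD.get(bucket, 'n/a'):12} {p}")
```

The order of the rules matters: generative and transactional cues are checked first so that a prompt such as “Write a comparison of the best CRMs” is treated as content creation rather than a shortlist query.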

The best opportunities to hack AI search lie in two stages:

🧠 Discover & Educate: where users explore categories
⚖️ Shortlist & Decide: where they choose between options

These are most likely to trigger a RAG response, and pull from the online sources where your content needs to show up.

Let’s break down exactly how these queries work, and what they tell AI to look for.

The real data on dark search

We partnered with Omnia, a new-generation AI search analytics tool, to analyse 1,000 searches on ChatGPT, Perplexity, and Google’s AI Overviews. We then analysed the 7,300 sources they pulled from to learn exactly what kind of sources these tools use.

The findings are stark:

👉 Blogs (51%) and directories (19%) overwhelmingly dominate sources used by AI.
👉 Mid-funnel (“Discover & Educate”) queries heavily favour blogs and news sites (almost 75% of sources).
👉 Late-funnel (“Shortlist & Decide”) queries are five times more likely to pull from directories.

Crucially, AI recommendations differ sharply from traditional Google rankings:

👉 There is only a 13% overlap between Google’s top 10 results and ChatGPT sources.
👉 AI-chosen sources have far lower keyword overlap (40%) compared to Google’s top-ranking pages (65–85%).

Classifying the sources AI uses

how chatgpt gets information: breakdown by page type

  • 51% of sources are blog content
  • 19% of sources are directories or “top” lists
  • 11% of sources are home pages of relevant companies
  • 7% are news websites like The Guardian
  • 3.5% of sources come from YouTube
  • And 2.3% come from Wikipedia

Translation: If your brand isn’t prominently featured on the right blogs or directories, AI won’t recommend you.

But that’s not the whole story. When we look at the breakdown of sources by query type, we see a difference between “Discover and Educate” queries (mid funnel, category education) and “Shortlist and Decide” (lower funnel, decision focussed).

For Discover and Educate queries, blog posts and news websites make up almost 75% of sources.

But when the user is looking to make a decision – closer to the point of purchase – directories are five times more prevalent.

how ai chooses sources: educate vs decide source types

So: the strategy should be to appear in the right blogs to educate on your category and include your solution as one of the leading examples. 

Then be on the right directories and “top X” lists for when your potential customer is deciding on a solution to solve their problem. 

So which are the “right” blogs and directories?

Good news – AI models tend to like the same kind of sources as Google or Bing. They surface pages with:

👉 Strong Domain Authority (DA).
👉 Robust backlink profiles.
👉 Clearly structured, authoritative, factual content.

You can see this in the data: over 60% of the sources chosen by AI have a domain rating (DR) of over 70. A DR of 70 seems to be the threshold for a source to be chosen by AI: below it, we didn’t see large variations in the probability that a site will be chosen as an AI source.

how chatgpt gets information: Are high DR domains more likely to be cited?
Source: Omnia/Ahrefs
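
You can run the same check on the sources AI cites in your own category. The sketch below assumes you have already looked up a domain rating for each cited source in your SEO tool of choice; the function name and the sample values are made up for illustration and are not data from this study.

```python
def share_above(ratings, threshold=70):
    """Fraction of domain ratings strictly above the threshold."""
    if not ratings:
        return 0.0
    return sum(r > threshold for r in ratings) / len(ratings)


# Made-up DR values for ten cited sources (illustrative only).
sample_dr = [92, 88, 71, 75, 64, 45, 81, 30, 77, 90]
print(f"{share_above(sample_dr):.0%} of sample sources have a DR above 70")
```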

AI ≠ Google

OK, so AI models just show the same results as Google?

It seems that isn’t exactly the case.

The fact that queries on AI models tend to be more conversational, and less keyword driven, affects the correlation between the query and the results.

👉 Pages in Google’s top 10 contain 65% – 85% of the keywords they rank for in their title tag
👉 AI sources show a 40% keyword overlap with the same top 10
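
To see what keyword overlap means in practice, here is one simple way to measure it: the share of a query’s keywords that appear in a page title. The exact method behind the figures above isn’t spelled out here, so treat the stop-word list and the function names as our own assumptions.

```python
import re

# A small, hypothetical stop-word list; extend it for real use.
STOPWORDS = {"the", "a", "an", "for", "in", "of", "to", "and", "with"}


def keywords(text):
    """Lower-case words and numbers in the text, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def title_keyword_overlap(query, title):
    """Share of the query's keywords that also appear in the title."""
    q = keywords(query)
    if not q:
        return 0.0
    return len(q & keywords(title)) / len(q)


query = "best crm for startups"
seo_title = "Best CRM for Startups in 2025"         # keyword-matched title
blog_title = "How early-stage founders pick a CRM"  # conversational title

print(f"{title_keyword_overlap(query, seo_title):.0%}")
print(f"{title_keyword_overlap(query, blog_title):.0%}")
```

The first title contains every keyword in the query, while the second shares only one of three, yet both pages could plausibly answer the question. That second kind of page is what a keyword-driven view tends to miss.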

And when we look at the overlap between the AI sources and Google’s top 10 results for the same query, we see some important variations.

  • AI Overviews: 15%
  • Perplexity: 75%
  • ChatGPT: 13%

how chatgpt gets information: overlap with top 10
Source: Omnia
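
If you want to measure this overlap for your own queries, the calculation is a set intersection. In this sketch we assume overlap means the share of AI-cited URLs that also appear in Google’s top 10 for the same query; the study may define it differently. The function names and example.com-style URLs are placeholders.

```python
from urllib.parse import urlparse


def normalise(url):
    """Reduce a URL to host + path so 'www.' and tracking parameters don't hide a match."""
    parts = urlparse(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def top10_overlap(ai_sources, google_top10):
    """Share of AI-cited URLs that also appear in Google's top 10 for the query."""
    cited = {normalise(u) for u in ai_sources}
    ranked = {normalise(u) for u in google_top10}
    return len(cited & ranked) / len(cited) if cited else 0.0


# Hypothetical data for one query.
ai_sources = [
    "https://www.example.com/best-crm/",
    "https://blog.example.org/crm-guide?utm_source=x",
    "https://directory.example.net/crm",
    "https://news.example.io/crm-review",
]
google_top10 = [
    "https://example.com/best-crm",
    "https://vendor.example.com/",
    "https://another.example.com/crm",
]
print(f"Overlap: {top10_overlap(ai_sources, google_top10):.0%}")
```

Only one of the four cited URLs appears in the sample top 10, so the overlap is 25%. Averaging this figure across many queries gives a per-engine number comparable to the ones listed above.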



So while SEO fundamentals remain important, appearing on page one of Google doesn’t guarantee you’ll be used as a source for an AI answer. Indeed, it’s more likely that you won’t be chosen.

The way that AI models choose their sources is more complex, and requires a dedicated strategy.

Dedicated analysis and monitoring of the real sources would be needed to optimise for their answers – and it’s an extra layer you should add on top of your existing SEO strategy.

The two most valuable AI source types

There are two clear kinds of content that get cited over and over again for some of the most interesting bottom-funnel queries. We can break our “Shortlist & Decide” bucket down into two parts: Shortlist and Compare.

“Shortlist” queries are when a user is looking to develop a list of the best options. You want your product in the conversation.

“Compare” queries are when the user has narrowed down the options to just a few. Here you want to show off against the competition.

By analysing these kinds of queries separately, we see some interesting trends in the sources chosen:

👉 For Shortlist (“Which X are the best?”): directories and top X lists dominate as sources.
👉 For Compare (“Compare X to Y”): head-to-head or “VS” content dominates.

how chatgpt gets information: Comparison by source type

Making the user’s shortlist: Lists and directories

This kind of “Shortlist” query asks things like: “What’s the best running shoe for tight calves?”

It’s a question a user asks AI when they recognise a problem and have decided a solution is needed, but are still open to the exact provider, product or tool they will purchase.

Directories and lists dominate sources for queries looking for the “best”:

AI source types for shortlist queries

Why? Let’s take one query example: “Best Online Advertising Agencies in Paris”. One of the most cited sources is Sortlist – but why?

Chosen URL: https://www.sortlist.com/i/advertising/paris-fr

| AI-Friendly Signal | How the Sortlist Page Delivers |
|---|---|
| Query-matching title | “The 100 Best Online Advertising Agencies – 2025 Reviews” maps almost word-for-word to common prompts like “best online ad agencies 2025.” |
| Explicit scope & recency | Year-stamp (“2025”) + live agency count (14k+) tell AI the list is current. |
| Structured layout | H2 sections for Top Featured Agencies and All Companies; each card follows the same field order: rating, reviews, services, location, budget. |
| Comparison-ready tables | Built-in filters (budget, ratings, location) act as dynamic comparison columns AI can parse. |
| Authoritative signals | Sortlist’s domain specialises in agency rankings → strong topical authority + high DR. |
| Internal links to deeper content | Each card links to a detailed agency profile (more structured data, reviews, case studies). |
| Regular updates | The agency count and review numbers update continuously – that’s critical for RAG freshness. |

Influencing the final decision: VS pages

This is the next step along the funnel. Once the user has been educated on the best options for their problem and they’ve narrowed it down to just a few, they will turn to AI to compare pros and cons. 

Here we see a clear difference: a preference for blog-style content.

AI source types for compare queries

And there’s one very clear kind of blog content that dominates. 

“VS” content. Written content with the title X vs. Y – comparing the two options.

In fact, 90% of those sources literally contain the word “VS” – an explicit, head-to-head comparison of one product with another.

compare sources containing "VS"
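
Checking this for your own category takes a few lines once you have the titles of the pages AI cites. This is a minimal sketch: the regular expression and the sample titles are our own, and it treats “vs”, “vs.” and “versus” as equivalent.

```python
import re

# Matches "vs", "vs." or "versus" as a whole word, in any letter case.
VS_PATTERN = re.compile(r"\b(vs|versus)\b", re.IGNORECASE)


def is_vs_title(title):
    """True if the title reads as an explicit head-to-head comparison."""
    return bool(VS_PATTERN.search(title))


# Hypothetical titles of cited sources.
titles = [
    "Ring vs Nest: the clash of the cameras",
    "Notion vs. Trello for content planning",
    "Monday.com VS ClickUp compared",
    "A beginner's guide to home security",
]
vs_share = sum(is_vs_title(t) for t in titles) / len(titles)
print(f"{vs_share:.0%} of sample titles are 'VS' pages")
```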

TechRadar is a top source for the query “Compare Ring vs Nest for home security”. And its perspective on the pros and cons of each is being used to build ChatGPT’s recommendation. Why?

Chosen URL: https://www.techradar.com/news/ring-vs-nest-its-the-clash-of-the-cameras-in-a-doorstep-duel

| AI-Friendly Signal | How the TechRadar Page Delivers |
|---|---|
| Exact-match title & URL | “Ring-vs-Nest” appears in the slug, H1, and metadata – perfect for prompts like “Ring vs Nest doorbell.” |
| Structured comparison table | A feature-by-feature grid (price, resolution, field of view, subscription cost) gives the model machine-readable data. |
| Clear verdict sections | Bold sub-headings: “Ring vs Nest: Key Similarities”, “Ring vs Nest: Key Differences” help AI make a granular decision based on more criteria. |
| Pros & cons bullets | Uniform bullets under each device make sentiment extraction easy. |
| Freshness stamp | ‘Updated March 2025’ – note on the latest firmware keeps RAG queries happy. |
| Domain authority & topical depth | TechRadar ranks, reviews, and benchmarks consumer tech daily: high DR, dense internal links. |
| Rich media with alt-text | Side-by-side product shots, each with descriptive alt-tags (resolution, model), add extra semantic cues. |

How to reverse engineer dark search for your business

By 2026, AI will answer most of your buyers’ questions before they ever hit your site. That means influence moves upstream. Into the models. Into the content they reference. Into the directories, blogs, and “vs” pages they trust.

Start by searching ChatGPT like your customers would. Ask for the best tools, agencies, or alternatives in your space. Compare yourself to competitors. Then study the sources it cites.

Now ask:
❓ Can I get listed there?
❓ Can I create something better?
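
Once you have copied the cited URLs from a handful of AI answers, a quick tally shows which domains keep coming back, and so where a listing or a better page would pay off most. This sketch assumes you collected the URLs by hand; the function name and the example.com-style URLs are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(cited_urls):
    """Count how often each domain appears among the cited URLs."""
    hosts = (urlparse(u).netloc.lower().removeprefix("www.") for u in cited_urls)
    return Counter(hosts)


# Hypothetical URLs collected from several AI answers in one category.
cited_urls = [
    "https://www.directory.example.com/best-agencies",
    "https://blog.example.org/top-10-agencies",
    "https://directory.example.com/agencies/paris",
    "https://news.example.net/agency-rankings",
]
for domain, count in cited_domains(cited_urls).most_common():
    print(f"{count}  {domain}")
```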

Many brands already are. PR agencies pitch their clients to appear in listicles and SEO providers are already building the content AIs love.

Tools like Omnia analyse thousands of AI answers from ChatGPT and Perplexity. They show how your brand appears, how influence shifts, and what content moves the needle.

If you’re serious about this, these tools save you hours of manual effort.
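
If you want a feel for what such tools measure before committing to one, the core metric is simple: the share of AI answers to your category prompts that mention your brand, tracked over time. The sketch below is our own simplification, not Omnia’s method; the brand names, week labels and answers are invented.

```python
def share_of_voice(answers, brand):
    """Fraction of AI answers that mention the brand (case-insensitive)."""
    if not answers:
        return 0.0
    mentions = sum(brand.lower() in a.lower() for a in answers)
    return mentions / len(answers)


# Hypothetical answers to the same four prompts, saved in two different weeks.
snapshots = {
    "2025-W01": [
        "Top picks: Acme CRM, Foo CRM",
        "Consider Foo CRM",
        "Bar CRM is popular",
        "Foo CRM or Bar CRM",
    ],
    "2025-W02": [
        "Top picks: Acme CRM, Foo CRM",
        "Acme CRM suits startups",
        "Bar CRM is popular",
        "Foo CRM or Acme CRM",
    ],
}
trend = {week: share_of_voice(a, "Acme CRM") for week, a in snapshots.items()}
for week, share in trend.items():
    print(f"{week}: {share:.0%}")
```

Doing this by hand means re-running and saving the same prompts every week, which is where the manual effort adds up.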

So what’s the play?

1. Be where AI looks.
Get featured in high-authority blogs, directories, and top-10 lists in your category. These are the sources AI trusts most, especially for education and comparison queries.

2. Create content AI can parse.
Focus on structured, factual, clearly labelled content. Think “X vs Y” pages. Feature tables. Pricing breakdowns. Pros and cons. Side-by-sides. This is the content AI loves and uses.

3. Monitor your visibility in the answers.
Use AI search analytics tools to track which sources AI is citing in your space. Watch your share of voice move over time. See what’s working and double down.

4. Feed clean product data.
ChatGPT pulls from schema-rich feeds (Product, Offer, Review). Expose yours via Merchant Centre or a public XML/JSON feed: no extra content writing required.
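
As an illustration of the fourth point, this sketch builds schema.org markup for a single product using the Product, Offer and Review types. The helper function, the product details and the example.com URL are invented, and which fields a given AI tool actually reads is an assumption on our part; check your feed or platform documentation for the exact format it expects.

```python
import json


def product_jsonld(name, url, price, currency, review_rating, review_author):
    """Build a schema.org Product with one Offer and one Review as a dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
        "review": [
            {
                "@type": "Review",
                "reviewRating": {"@type": "Rating", "ratingValue": review_rating},
                "author": {"@type": "Person", "name": review_author},
            }
        ],
    }


# Hypothetical product data.
data = product_jsonld(
    name="Acme CRM Starter",
    url="https://example.com/products/acme-crm-starter",
    price=49,
    currency="EUR",
    review_rating=5,
    review_author="Jane Doe",
)
print(json.dumps(data, indent=2))
```

The printed JSON can be embedded in a page inside a `<script type="application/ld+json">` tag or adapted for a product feed.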

To win at search in 2026, you won’t chase clicks. You’ll chase influence. And you’ll win by being in the AI source.
