You put money into your website. You published content. You optimized for SEO. Then a prospect asked ChatGPT who leads your category and your company wasn't in the answer.

That is not a content quality problem. It is not an SEO problem. It is a structural problem — and the research now proves it.

AI search engines show a systematic, measurable bias toward third-party earned media over brand-owned content. This is not a hypothesis. Multiple independent studies have now confirmed it at scale: when AI engines like ChatGPT, Perplexity, Gemini, and Claude synthesize answers, they preferentially pull from independent publications and earned coverage, not from your website. Your investment in owned content is largely invisible to the systems now mediating how your buyers research and compare vendors.

This article explains why that bias exists, what the data says, and what the only proven correction actually is.

Key takeaways

- AI search engines show a systematic, measured bias toward third-party earned media over brand-owned content; social content is nearly absent from AI answers.
- Earned distribution produced a 239% median lift in AI citations versus owned content alone in the Stacker/Scrunch study of 87 stories across 30 brands.
- Brand web mentions correlate three times more strongly with AI visibility than backlinks (0.664 vs. 0.218) in Ahrefs' study of 75,000 brands.
- Coverage breadth, consistent presence across the distinct source pools different AI engines draw from, is emerging as the new authority signal.
- The proven correction is earned media coverage in publications AI engines already trust, not more owned content.

What the research found: AI engines have a structural bias toward earned media

In September 2025, researchers from the University of Toronto published a comprehensive comparative analysis of AI search engines and traditional web search. They ran large-scale, controlled experiments across multiple verticals, languages, and query types to measure one thing directly: where do AI engines actually get their sources, and how does that differ from Google?

The findings were unambiguous. According to the study published at arXiv (Chen et al., September 2025), AI search engines "exhibit a systematic and overwhelming bias towards Earned media (third-party, authoritative sources) over Brand-owned and Social content, a stark contrast to Google's more balanced mix."

In specific verticals, the gap was stark:

- AI engines drew 57–92% of their citations from earned sources, depending on query type and vertical.
- Google's mix for the same queries was far more balanced, with an earned share of roughly 41–45%.

The spread narrows for transactional queries where brands naturally dominate, but for the informational and consideration queries that drive B2B research and vendor selection, earned media is what AI engines reach for first.

A second study published in January 2026 at arXiv (2601.16858) conducted a large-scale empirical analysis measuring source typology across AI engines. Claude concentrated most heavily on earned sources at 65%, followed by GPT-4o at 57% earned. Perplexity and Gemini were more balanced, but both showed earned media dominance on consideration queries where purchase decisions form. Social content was nearly absent from AI search results entirely.

The pattern is consistent and directional: AI engines were trained on the open web. The open web's most credible, high-authority content is in editorial publications, not brand sites. That training bias is now baked into citation behavior, and no amount of on-page optimization changes it.
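The source-typology measurement these studies describe amounts to classifying each cited domain and counting shares per engine. Below is a minimal sketch; the domain-to-category map is a hypothetical taxonomy invented for illustration, not the studies' actual classification list:

```python
from collections import Counter

# Hypothetical domain-to-category map, invented for illustration.
# A real audit would classify every domain an engine actually cites.
SOURCE_TYPE = {
    "forbes.com": "earned", "techcrunch.com": "earned", "reuters.com": "earned",
    "yourbrand.example": "owned", "vendorblog.example": "owned",
    "twitter.com": "social", "reddit.com": "social",
}

def source_mix(cited_domains):
    """Share of earned/owned/social/other sources in one engine's citations."""
    counts = Counter(SOURCE_TYPE.get(d, "other") for d in cited_domains)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# One engine's citations on a batch of consideration queries (made up):
mix = source_mix(["forbes.com", "techcrunch.com", "reuters.com",
                  "forbes.com", "yourbrand.example"])
# earned: 0.8, owned: 0.2 -- the earned-dominant pattern the studies report
```

Running this classification over each engine's citation logs yields per-engine earned shares directly comparable to the 65% (Claude) and 57% (GPT-4o) figures above.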

What coverage breadth data proves: distribution multiplies citations

The bias toward earned media raises a follow-on question: is all earned coverage equal, or does distribution across multiple publications compound the signal?

Stacker, the earned media distribution platform, released its largest GEO study to date in March 2026. The study, conducted in partnership with AI analytics platform Scrunch, analyzed 87 stories across 30 brands, queried 2,600+ prompts across 8 AI platforms over 30 days, and produced what GlobeNewswire (March 16, 2026) reported as a 239% median lift in AI citations when content moved through earned distribution channels versus brand-owned content alone.

The specific numbers from the study:

- A 239% median lift in AI citations for content moved through earned distribution channels versus brand-owned content alone.
- Cross-platform coverage breadth rose from 5.4% to 17.9% at the median.
- The sample: 87 stories across 30 brands, with 2,600+ prompts queried across 8 AI platforms over 30 days.

Noah Greenberg, CEO of Stacker, framed the finding plainly: "AI search isn't a single ranking position; it's a long tail played across platforms, prompt variations, and answer formats. Our data shows that coverage breadth is the new authority signal."

Coverage breadth is the phrase worth sitting with. AI engines do not have one index. Different engines pull from different sources on different query types. A brand that earned coverage in Forbes, TechCrunch, and Business Insider has a presence across the range of sources those engines pull from. A brand with strong owned content and minimal earned coverage has its presence concentrated in exactly the category of source AI engines de-prioritize.

Why AI engines are built this way (the mechanism, not just the data)

The bias is not accidental. It is structural and it follows from how large language models learn to attribute credibility.

AI engines were trained on the open web. The open web contains a small number of high-authority publication domains (Forbes, TechCrunch, Reuters, Financial Times, Harvard Business Review) and a very large number of brand sites, vendor blogs, and owned properties. During training, the models observed that authoritative third-party sources — the ones humans trusted, cited, linked to, and returned to — were a specific category: earned media in editorially independent publications.

Those training signals became citation signals. When an AI engine synthesizes an answer, it is not running a fresh relevance calculation from scratch. It is drawing on learned associations between source types and credibility. Independent publications with strong editorial standards = high credibility. Brand sites = self-advocacy. The inference engine applies that learned distinction at scale, every time someone asks a question about a vendor, category, or solution.

The Chen et al. paper specifically notes that "for popular entities, the model uses retrieved evidence primarily to reinforce pre-existing representations rather than to acquire new information." For niche brands and challenger companies, the model relies more heavily on what it can retrieve — which means earned media coverage in publications AI engines already trust. If that coverage does not exist, the brand is weak or absent in AI-generated answers.

Adding schema markup does not change this. Publishing more blog content does not change this. Hiring an SEO agency does not change this. The citation signal AI engines use comes from third-party editorial coverage — and that coverage has to be earned.

What Ahrefs found: brand mentions beat backlinks 3-to-1 for AI visibility

In a study of 75,000 brands published in May 2025, Ahrefs found that brand web mentions correlate three times more strongly with AI Overview visibility than backlinks — a correlation of 0.664 for brand mentions versus 0.218 for backlinks.

The top 25% of brands by web mentions earned ten times more AI Overview mentions than the next quartile. Brands in the bottom 50% for web mentions were essentially absent from AI-generated answers regardless of their traditional SEO performance.

Tim Soulo, CMO at Ahrefs, described the implication directly: "You just need to see where your competitors are mentioned, where you are mentioned, where your industry is mentioned. And you have to get mentions there — because then if the AI chatbot would do a search and find those pages and create their answer based on what they see on those pages, you will be mentioned."

The finding inverts the standard SEO logic. Traditional search engine optimization was about engineering specific pages for specific queries. AI citation optimization is about building a web of earned mentions across trusted third-party sources. The currency changed from links to mentions. The source type changed from any indexed page to editorially independent publications. The work required changed from content production to earned media placement.

The brands winning in AI search built earned authority first

The arXiv and Ahrefs data explains why the same brands appear consistently across AI-generated answers in competitive categories: they have earned coverage in the publications AI engines draw from. It is not a coincidence and it is not an algorithm exploit.

According to a Muck Rack analysis of over one million AI prompts, 85.5% of non-paid AI citations come from earned media sources. The Fullintel-University of Connecticut study presented at the Institute for Public Relations Research Conference in 2026 found that 47% of all AI citations in brand queries came from journalistic sources, with 89%+ of cited links coming from unpaid earned media.

These are not marginal findings. They describe a categorical rule: AI engines cite earned media because that is what AI engines were built to trust.

The brands winning in AI search understood this before the measurement studies arrived. They built earned coverage in the publications that AI engines index as authoritative sources. Their presence in AI-generated answers is downstream of that coverage — not downstream of their on-page optimization, their content volume, or their domain authority in the traditional SEO sense.

What the Gartner CMO survey reveals about the gap

In February 2026, Gartner published survey data from 402 senior marketing leaders that showed 65% of CMOs expect AI advances to dramatically change the marketing role in the next two years, yet only 32% believe significant skill changes are needed on their team.

That gap is the problem in data form. The CMOs who know the rules are changing are not updating their playbooks at the same pace. The implicit assumption is that the same investments that built brand authority in the traditional search era will transfer to the AI search era. The research says they do not.

Gartner's own analysts framed it directly: "CMOs must build the literacy to prioritize high-impact use cases, validate outputs and manage risk. Otherwise, AI becomes something happening around them, not led by them."

The brands that recognized the shift early are building what Machine Relations describes as Earned Authority — the first layer of the Machine Relations stack. Earned media placements in Tier 1 publications that AI engines already trust. That is the foundation. Without it, the layers above it (entity clarity, citation architecture, distribution across AI surfaces) have nothing authoritative to build on.

The owned content trap: why publishing more makes you invisible faster

There is a paradox in the data that marketers running content programs need to understand. Publishing more owned content does not improve AI citation rates. It may actually dilute them.

As more brands recognize that "content is how you show up in AI search," the volume of brand-owned content on the web is increasing. But AI engines were trained to treat brand-owned content as self-advocacy — and increasing the volume of self-advocacy does not change how that category is evaluated. It just adds more content to a category AI engines already discount.

The Chen et al. arXiv paper makes this explicit: social content is "almost absent from AI answers," and brand-owned content is consistently underweighted relative to earned sources across every vertical studied. Producing more of the underweighted content type is not a solution.

The correct correction is a category switch: from owned content that AI engines discount to earned coverage that AI engines trust. The distinction is not about writing better content. It is about where that content lives and who is vouching for it.

A piece of analysis published on a brand blog carries the authority of the brand that wrote it. The same analysis, covered by Forbes or placed as a contributed article in Harvard Business Review, carries the authority of the publication that independently decided to run it. AI engines read those signals differently. The placement matters more than the content.

What "coverage breadth" means for your AI visibility strategy

The Stacker research introduced "coverage breadth" as a metric — the percentage of relevant AI platforms where a brand surfaces consistently across prompt variations. The finding that earned distribution more than tripled this metric (from 5.4% to 17.9%) has a direct strategic implication.

AI visibility is not a single-engine problem. ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews draw from overlapping but distinct source pools. The Yext analysis of 17.2 million distinct AI citations across six platforms from January 2026 found that Gemini favors first-party sites while Claude cites user-generated content at 2-4x higher rates. No single optimization strategy works uniformly across all engines.

Coverage breadth solves this. A brand with earned coverage in multiple high-authority publications has placed its authority signal across the range of sources AI engines pull from. Each publication that covers the brand is a node in that network. When different engines query different source pools for the same brand category, the coverage network ensures the brand is present in enough of those pools to surface consistently.
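As a sketch, the metric can be computed from a grid of prompt runs per platform. The 50% "surfaces consistently" threshold and the `coverage_breadth` function name below are assumptions for illustration, not the study's published methodology:

```python
def coverage_breadth(results, threshold=0.5):
    """Fraction of AI platforms where the brand surfaces in at least
    `threshold` of the prompt variations tried on that platform.

    results maps platform name -> list of booleans (one per prompt run).
    """
    covered = sum(
        1 for answers in results.values()
        if answers and sum(answers) / len(answers) >= threshold
    )
    return covered / len(results)

# Made-up prompt runs for one brand across four engines:
runs = {
    "chatgpt":    [True, True, False, True],   # surfaces in 3/4 prompts
    "perplexity": [True, False, False, False], # 1/4 -> below threshold
    "gemini":     [True, True, True, True],    # 4/4
    "claude":     [False, False, False, True], # 1/4 -> below threshold
}
breadth = coverage_breadth(runs)  # 2 of 4 platforms pass -> 0.5
```

Tracked over time, the same grid shows whether new earned placements are widening the set of platforms where the brand reliably appears, which is what the 5.4%-to-17.9% movement in the Stacker data measures.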

This is why Moz's 2026 analysis of 40,000 queries found that 88% of Google AI Mode citations are not in the organic search top 10. The sources AI engines trust and the sources traditional SEO optimizes for are mostly different. Coverage breadth is built by earning placements in the publications AI engines independently trust — not by optimizing the pages AI engines were going to deprioritize anyway.

What every PR practitioner and CMO is now being told from inside their own industry

The earned media bias is no longer only visible in academic research. It is showing up in practitioner conversations and industry reports from the comms and PR side simultaneously.

The WorldCom PR Group, a consortium of 160 independent PR agencies operating globally, stated in their 2025 AI visibility analysis: "Research shows that up to 90% of citations driving brand visibility in LLMs come from earned media, positioning public relations at the center of this transformation."

Brian Olson, brand PR lead at Hormel Foods, put a specific timeline on the shift in January 2026: "By the end of 2026, appearing in LLM responses will stand shoulder-to-shoulder with impressions, which continue to lose relevance as a primary KPI."

Gab Ferree, founder of Off the Record, said it plainly at an Axios HQ webinar in February 2026, as reported by Stacker: "Media relations are becoming machine relations. It's on the comms professionals to learn the patterns of AI and then take action on them."

What is notable about these quotes is their source. These are not AI visibility tool companies with a product to sell. These are PR practitioners and comms leaders describing a shift they are seeing from inside their work. They are arriving at the same conclusion the research data reached: earned media in trusted publications is what AI engines cite, and that makes earned media the infrastructure of AI-era brand visibility.

The Machine Relations framework: why earned media is Layer 1

The research above describes a pattern that Machine Relations — the discipline coined by Jaxon Parrott, founder of AuthorityTech, in 2024 — names and systematizes.

Machine Relations is the framework that defines how brands earn citations, recommendations, and visibility inside AI-driven discovery systems. Its first layer is Earned Authority: trusted third-party coverage in publications that AI systems already recognize as credible. That layer is foundational because everything else in the stack depends on it.

Entity clarity (Layer 2) requires that AI systems can unambiguously identify and validate a brand — and the strongest signals for that identification come from earned coverage in publications AI engines already trust, not from owned properties. Citation architecture (Layer 3) structures content for extraction, but AI engines are more likely to extract from third-party editorial sources than from brand sites even with perfect structure. Distribution (Layer 4, where GEO and AEO operate) amplifies presence across AI surfaces — but it amplifies a signal that either exists or does not. If earned authority is absent, distribution pushes nothing.

The brands invisible in AI-generated answers are not failing at Layer 3 or Layer 4. They are failing at Layer 1. Their SEO investment is strong. Their content program is running. Their entity signals are clean. And none of it moves the citation needle because AI engines are not looking at any of those things when they decide what to include in an answer. They are looking at earned authority in independently trusted sources.

As Jaxon Parrott wrote in his Machine Relations breakdown on Medium: "PR's original mechanism — earned media, direct editorial relationships, third-party credibility from real publications — is the exact mechanism AI engines use to decide what to cite. The publications haven't changed. What changed is the reader."

What to do about it

The research points to a specific correction, not a broad set of tactics. The correction is earned media coverage in publications AI engines trust, distributed across enough of those publications to build coverage breadth across AI platforms.

That work requires relationships, not just content. A placement in Forbes or TechCrunch is not produced by submitting a blog post to a contact form. It requires a direct relationship with an editor or journalist who trusts the source and believes the story serves their readership. Those relationships take time to build — or they can be accessed through an agency that already has them.

What it does not require is a more sophisticated content strategy for your owned website. The data on that is settled: AI engines systematically discount brand-owned content relative to earned sources, and that discount is structural, not addressable through optimization.

The specific investment that changes citation rates is earning coverage in the publications that AI engines already treat as authoritative. According to the Ahrefs analysis of ChatGPT's most-cited pages, 65.3% of cited pages come from domains with a domain rating of 80 or higher.

The Princeton and Georgia Tech GEO study (Aggarwal et al., SIGKDD 2024) found that adding statistics to content improves AI visibility by 30-40%, and that citing credible external sources further increases citation probability — but these structural improvements are most effective when the content lives on domains AI engines already recognize as credible.

Those are major publications, not long-tail content sites. Getting into those publications requires relationships — and once in, the placement creates the earned authority signal that AI engines read as credibility worth citing.

Start your visibility audit →

FAQ

Why do AI search engines prefer earned media over brand content?

AI engines were trained on the open web, where independent editorial publications represented the highest-credibility sources humans consistently cited and returned to. That training created a structural preference for third-party earned media over brand-owned content, which AI engines classify as self-advocacy. According to a September 2025 arXiv study (Chen et al.) of AI search behavior across multiple verticals, AI engines cite earned sources at 57–92% depending on query type, compared to Google's balanced 41–45% earned share. The bias is not intentional product design — it is a learned pattern from the training corpus that reflects how editorial authority was distributed across the open web.

Does my website's SEO performance affect how AI engines cite my brand?

Not significantly. A 2026 Moz analysis of 40,000 queries found that 88% of Google AI Mode citations are not in the organic search top 10. An Ahrefs analysis found only 12% overlap between AI Mode citations and top organic results. Traditional SEO performance and AI citation performance are largely independent. The Ahrefs 75,000-brand study found that brand web mentions (a proxy for earned media presence) correlate three times more strongly with AI visibility than backlinks (0.664 vs. 0.218). AI citation is driven by earned authority in trusted publications, not by technical SEO optimization.

Who coined Machine Relations?

Jaxon Parrott, founder of AuthorityTech, coined the term Machine Relations in 2024 to name the discipline of earning AI citations and recommendations for brands. He published the origin story and the five-layer Machine Relations stack at machinerelations.ai. AuthorityTech is the first AI-native Machine Relations agency and has secured over 10,000 AI-cited articles for clients including 27 unicorn startups.

How many AI citations come from earned media?

Multiple independent studies have now quantified this. A Muck Rack analysis of over one million AI prompts found that 85.5% of non-paid AI citations come from earned media sources. The Fullintel-University of Connecticut academic study found that 89%+ of AI-cited links were earned media. The WorldCom PR Group, representing 160 PR agencies globally, cited research showing up to 90% of AI brand citations come from earned media. The Stacker/Scrunch study of 87 stories across 30 brands found a 239% median lift in AI citations from earned distribution versus owned content alone.

What is coverage breadth and why does it matter for AI search?

Coverage breadth, introduced by Stacker's March 2026 GEO research, measures the percentage of relevant AI platforms where a brand surfaces consistently across prompt variations and query types. Different AI engines (ChatGPT, Perplexity, Gemini, Claude, Google AI Overviews) pull from overlapping but distinct source pools. A brand with earned coverage in multiple high-authority publications has placed its authority signal across more of those pools, making it more likely to surface across platforms. Stacker's research found that earned distribution increased cross-platform AI coverage from 5.4% to 17.9% at the median — more than tripling coverage breadth.

What is the difference between earned media and owned content for AI visibility?

Earned media is coverage in third-party editorial publications that the brand did not pay for and did not control editorially. Owned content is anything on a brand's own website, blog, or social channels. AI engines were trained to treat these two categories differently. Third-party editorial coverage carries the implicit vouching of an independent editorial process. Owned content carries the brand's own assertion about itself. Multiple studies confirm AI engines cite the former at rates 2-5x higher than the latter in informational and consideration queries. The difference is not content quality — it is the source category.
