How ChatGPT Chooses Which Sources to Cite

Understand the signals AI models use to select and cite sources. Learn about structure, authority, data density, and freshness -- the factors that determine whether your content gets cited.

Published 2026-02-17 | Tocho Team

AI Citation Is Not Random

When ChatGPT, Claude, Perplexity, or Google AI Overviews answer a question and cite a source, that citation is not random. These systems don't flip a coin between ten relevant pages. They evaluate content against specific criteria and choose the source that best satisfies those criteria.

Understanding what those criteria are gives you a direct path to becoming the cited source instead of the ignored one.

The Four Pillars of AI Source Selection

Based on Tocho's internal analysis of more than 10,000 AI-generated citations across ChatGPT, Claude, and Perplexity, four primary factors consistently determine which sources get cited.

1. Structural Clarity

AI models process content structurally. A page with clear heading hierarchy (H1 > H2 > H3), logical section breaks, and organised information is dramatically easier for a model to parse and extract from.

What AI looks for:

A single clear H1 that establishes the page's topic
H2 sections that each cover a distinct subtopic
Lists and tables that organise comparable data
Short paragraphs (3-5 sentences) rather than walls of text
FAQ sections that directly answer common questions
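The heading checks in this list can be automated. The following is a minimal sketch, assuming the page is available as an HTML string; the names `HeadingCollector` and `check_headings` are hypothetical helpers written for this article, not part of any AI vendor's tooling, and the rules only approximate "clear hierarchy" (exactly one H1, no skipped levels).

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; match h1..h6 only.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def check_headings(html):
    """Return a list of human-readable problems with the heading outline."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")

    # A heading may go at most one level deeper than the one before it.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"heading level jumps from h{prev} to h{cur}")
    return problems
```

An empty list means the outline passes these two rules; it says nothing about whether each H2 actually covers a distinct subtopic, which still needs a human read.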

The data: According to the GEO research (Aggarwal et al., 2023), structural improvements alone can increase AI citation rates by up to 40%. Pages with clear heading hierarchies are cited 2.1x more often than unstructured pages covering the same topic.

2. Authority Signals

AI models have learned to evaluate credibility. Content that demonstrates expertise through specific citations, data points, and named sources is significantly more likely to be cited than content that makes vague claims.

What AI looks for:

Specific statistics with sources ("According to a 2025 report by the Conference Board of Canada, 64% of Canadian firms...")
Named authors or organisations
External links to credible references
Academic citations or research references (e.g., studies from the University of Toronto, McGill, or UBC)
Industry-specific expertise signals

The data: The Aggarwal et al. study found that adding citations and statistics produced the largest improvement in AI visibility: an increase of up to 115%. This is the single most impactful optimisation you can make.

What does NOT work:

"Experts say..." without naming the experts
"Studies show..." without citing the studies
"It's well known that..." -- no, it isn't, cite it
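These unattributed phrases are easy to search for before publishing. Here is a rough sketch, assuming plain-text input and a naive sentence splitter; `VAGUE_PATTERNS` and `find_vague_claims` are hypothetical names, and the phrase list covers only the three examples above.

```python
import re

# The three vague attributions named above; extend for your own style guide.
VAGUE_PATTERNS = [
    r"\bexperts say\b",
    r"\bstudies show\b",
    r"\bit'?s well known that\b",
]
VAGUE_RE = re.compile("|".join(VAGUE_PATTERNS), re.IGNORECASE)


def find_vague_claims(text):
    """Return (phrase, sentence) pairs for sentences with a vague attribution."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        match = VAGUE_RE.search(sentence)
        if match:
            hits.append((match.group(0).lower(), sentence))
    return hits
```

Each hit is a sentence to rewrite with a named source, or to delete if no source exists.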

3. Extractability

The most critical factor may be extractability -- whether your content contains self-contained statements that an AI can confidently pull out and present as part of an answer.

What makes content extractable:

Direct answer patterns: Sentences that begin with "X is..." or "The key factors are..." or "There are three main approaches..."
Quotable density: Multiple clear, factual statements per section rather than one long flowing argument
Definition patterns: "GEO, or Generative Engine Optimisation, is the practice of..."
Structured data: Lists of features, comparison tables, step-by-step processes

What makes content NOT extractable:

Long, flowing paragraphs that never arrive at a clear statement
Sentences that require 3 paragraphs of context to understand
Marketing language that prioritises persuasion over information
Content that talks around a topic without directly addressing it
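One way to get a feel for quotable density is to count sentences that open with a direct-answer or definition pattern. The sketch below is a crude heuristic under stated assumptions: the three regular expressions are illustrative guesses modelled on the example phrasings above, not rules any AI system is known to apply, and `extractable_sentences` is a hypothetical helper name.

```python
import re

# Illustrative patterns only, modelled on the example phrasings above.
PATTERNS = {
    "enumeration": re.compile(r"^there (?:is|are) \w+", re.IGNORECASE),
    "key_factors": re.compile(r"^the (?:key|main) \w+ (?:is|are)\b", re.IGNORECASE),
    "definition": re.compile(
        r"^\w[\w-]*(?:, or [^,]+,)? (?:is|are) (?:a|an|the)\b", re.IGNORECASE
    ),
}


def extractable_sentences(text):
    """Return (label, sentence) for each sentence opening with a known pattern."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        for label, pattern in PATTERNS.items():
            if pattern.search(sentence):
                results.append((label, sentence))
                break  # first matching label wins
    return results
```

Dividing the number of hits by the number of sentences in a section gives a rough density figure to compare drafts against each other; the absolute value means little on its own.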

4. Freshness

AI assistants have a strong recency bias. When multiple sources cover the same topic, the one with clear freshness signals wins.

What AI looks for:

Publication dates (visible on the page)
"Last updated: [date]" notices
Current-year references in the content
Temporal language: "As of 2026," "In Q1 2026," "The latest data shows..."
Updated statistics (2025-2026 data preferred over 2022 or older)

What hurts you:

No publication date anywhere on the page
Statistics from 3+ years ago with no update
Timeless language that gives no indication when the content was written
"Best practices for 2022" in the title of a page last updated in 2022
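The freshness signals above can be audited with a short script. This is a minimal sketch, assuming the page text carries a "Last updated: YYYY-MM-DD" notice in ISO format (an assumption; real pages use many date formats) and treating any year three or more years behind the current date as stale, in line with the "3+ years" rule of thumb. `freshness_report` is a hypothetical helper name.

```python
import re
from datetime import date

YEAR_RE = re.compile(r"\b(20\d{2})\b")
# Assumes an ISO date after the notice; other formats are not handled.
UPDATED_RE = re.compile(r"last updated:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def freshness_report(text, today, max_age_years=3):
    """Summarise visible date signals in a page's text."""
    updated = UPDATED_RE.search(text)
    last_updated = date.fromisoformat(updated.group(1)) if updated else None

    # Every four-digit 20xx year mentioned anywhere in the text.
    years = sorted({int(y) for y in YEAR_RE.findall(text)})
    stale_years = [y for y in years if today.year - y >= max_age_years]

    return {
        "has_date": last_updated is not None,
        "last_updated": last_updated,
        "newest_year": years[-1] if years else None,
        "stale_years": stale_years,
    }
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the check reproducible when it runs in a build pipeline.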

How Different AI Systems Differ

While the four pillars apply broadly, different AI systems have different priorities:

ChatGPT (OpenAI)

ChatGPT tends to synthesise information from multiple sources rather than quoting directly. It values:

Comprehensive coverage of a topic
Clear factual statements it can reformulate
Authoritative sources it can reference by name
Recent information with clear dates

Claude (Anthropic)

Claude places heavy emphasis on:

Nuanced, balanced coverage (not promotional content)
Clear citations and evidence
Well-structured arguments with logical flow
Factual accuracy and specificity

Perplexity

Perplexity is the most citation-heavy AI assistant, providing inline source links for nearly every claim. It values:

Specific, quotable statements
Clear data points it can attribute
Content that directly answers the query
Multiple corroborating sources

Google AI Overviews

Google AI Overviews draw heavily from content that already performs well in traditional search, but add:

Preference for structured data (JSON-LD)
Strong emphasis on E-E-A-T signals
Content freshness as a ranking factor
Semantic HTML markup

What This Means for Canadian Content Creators

For Canadian businesses, there's an additional dimension worth considering. AI systems don't just analyse content quality in a vacuum -- they also weigh relevance to the user's context. A user in Toronto asking about tax planning, real estate trends, or technology hiring will receive different AI-curated answers than a user in San Francisco. This means Canadian content that speaks to the Canadian landscape -- referencing the CRA instead of the IRS, citing StatsCan data, mentioning provincial regulations, or discussing the Canadian market specifically -- has a genuine advantage for Canadian queries.

Canadian organisations like the University of Waterloo's AI Institute, the Vector Institute in Toronto, and Mila (the Quebec AI Institute) in Montreal are producing world-class research. Citing these institutions where relevant adds both authority and geographic specificity that AI models can use when matching content to Canadian users.

Language is a further opportunity in Canada's bilingual market. On the English side, content written in Canadian English (distinct from American English in spelling, terminology, and cultural references) signals to AI systems that it is tailored for a Canadian audience.

Practical Checklist: Getting Cited

Here's a concrete checklist to increase your chances of being cited by AI assistants:

Structure every page with clear headings. H1 for the topic, H2 for subtopics, H3 for details within subtopics.
Include at least 3 quotable statements per page. Self-contained sentences that directly answer a likely question.
Add specific data points. Replace "many companies" with "78% of Canadian enterprises." Replace "growing rapidly" with "growing at 23% year-over-year since 2024."
Cite named sources. Replace "a recent study" with "a 2025 study by researchers at the University of Toronto" or "a 2025 report from the Conference Board of Canada."
Add a visible date. Publication date and "last updated" timestamp.
Include an FAQ section. Direct question-and-answer format is the most extractable content format.
Use JSON-LD structured data. Article schema for blog posts, FAQPage schema for FAQ sections.
Write for clarity, not persuasion. Remove marketing superlatives. Replace "revolutionary" with specific benefits. AI prefers information over promotion.
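For the JSON-LD item in the checklist, the markup can be generated rather than written by hand. The sketch below builds Article and FAQPage objects using schema.org property names (`headline`, `datePublished`, `dateModified`, `mainEntity`, `acceptedAnswer`); the helper names are hypothetical, and typing the author as an `Organization` is an assumption that suits a team byline but not an individual writer.

```python
import json


def article_jsonld(headline, author_name, date_published, date_modified=None):
    """Build a schema.org Article object as a plain dict."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Assumption: a team byline is modelled as an Organization.
        "author": {"@type": "Organization", "name": author_name},
        "datePublished": date_published,
    }
    if date_modified:
        data["dateModified"] = date_modified
    return data


def faq_jsonld(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def to_script_tag(data):
    """Serialise to a script tag; escape '</' so text cannot close the tag early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```

For example, `to_script_tag(article_jsonld("How ChatGPT Chooses Which Sources to Cite", "Tocho Team", "2026-02-17"))` yields a tag that can be placed in the page's head. Keeping `dateModified` optional means it only appears when there is a real update date to report.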

The Bottom Line

AI citation is not a black box. The signals that determine whether your content gets cited are knowable, measurable, and actionable.

The content that gets cited is the content that is well-structured, authoritative, extractable, and fresh. These are not subjective qualities -- they can be measured and improved systematically.

The question is not whether you should optimise for AI citations. The question is whether you'll do it before your competitors.

References:

Aggarwal, P., Murahari, V., et al. (2023). "GEO: Generative Engine Optimization." arXiv:2311.09735.
Tocho internal analysis of 10,000+ AI citation patterns (2026).
