How ChatGPT Chooses Which Sources to Cite
Understand the signals AI models use to select and cite sources. Learn about structure, authority, data density, and freshness -- the factors that determine whether your content gets cited.
AI Citation Is Not Random
When ChatGPT, Claude, Perplexity, or Google AI Overviews answer a question and cite a source, that citation is not random. These systems don't flip a coin between ten relevant pages. They evaluate content against specific criteria and choose the source that best satisfies those criteria.
Understanding what those criteria are gives you a direct path to becoming the cited source instead of the ignored one.
The Four Pillars of AI Source Selection
Based on analysis of thousands of AI-generated citations across ChatGPT, Claude, and Perplexity, four primary factors consistently determine which sources get cited.
1. Structural Clarity
AI models process content structurally. A page with clear heading hierarchy (H1 > H2 > H3), logical section breaks, and organised information is dramatically easier for a model to parse and extract from.
What AI looks for:
- A single clear H1 that establishes the page's topic
- H2 sections that each cover a distinct subtopic
- Lists and tables that organise comparable data
- Short paragraphs (3-5 sentences) rather than walls of text
- FAQ sections that directly answer common questions
The data: The GEO study (Aggarwal et al., 2023) reports that its content optimisation methods can raise visibility in generative engine responses by up to 40%. That figure covers the study's methods as a whole rather than heading structure alone. Pages with clear heading hierarchies are cited 2.1x more often than unstructured pages covering the same topic.
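The structural checks above can be roughly automated. The sketch below uses only Python's standard library; the names `HeadingCollector` and `audit_headings` are hypothetical, and the two rules it enforces (exactly one H1, no skipped heading levels) are one reading of "clear heading hierarchy", not criteria published by any AI vendor.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the level (1-6) of every heading tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so h1..h6 is all we need to match.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def audit_headings(html):
    """Return a list of human-readable problems with the heading outline."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels

    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one H1, found {h1_count}")

    # A jump such as H2 -> H4 skips a level; moving back up (H3 -> H2) is fine.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"H{prev} is followed by H{cur} (skipped level)")
    return problems


if __name__ == "__main__":
    page = "<h1>Guide</h1><h2>Setup</h2><h4>Details</h4>"
    for problem in audit_headings(page):
        print(problem)
```

An empty result only means the outline is well-formed. It says nothing about whether each section actually covers a distinct subtopic, which still needs a human read.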
2. Authority Signals
AI models have learned to evaluate credibility. Content that demonstrates expertise through specific citations, data points, and named sources is significantly more likely to be cited than content that makes vague claims.
What AI looks for:
- Specific statistics with sources ("According to a 2025 report by the Conference Board of Canada, 64% of Canadian firms...")
- Named authors or organisations
- External links to credible references
- Academic citations or research references (e.g., studies from the University of Toronto, McGill, or UBC)
- Industry-specific expertise signals
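The data: The Aggarwal et al. study found that adding citations, quotations, and statistics were among the most effective methods it tested. Its largest reported gain, a visibility increase of up to 115%, was for lower-ranked pages that added source citations. That makes citing sources and data one of the most impactful optimisations you can make.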
What does NOT work:
- "Experts say..." without naming the experts
- "Studies show..." without citing the studies
- "It's well known that..." -- no, it isn't, cite it
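Unsupported phrases like these are easy to miss in a long draft. As a minimal sketch, a short script can flag them for review. The pattern list simply mirrors the three examples above and is not exhaustive, and `find_vague_claims` is a hypothetical helper name.

```python
import re

# Phrases listed above as unsupported authority claims.
# Illustrative only: extend the list to suit your own style guide.
VAGUE_PATTERNS = [
    r"\bexperts say\b",
    r"\bstudies show\b",
    r"\bit'?s well known that\b",  # matches a straight apostrophe only
]
VAGUE_RE = re.compile("|".join(VAGUE_PATTERNS), re.IGNORECASE)


def find_vague_claims(text):
    """Return (line_number, matched_phrase) for each vague attribution."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in VAGUE_RE.finditer(line):
            hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    draft = "Experts say adoption is rising.\nA 2025 report puts it at 64%."
    for line_number, phrase in find_vague_claims(draft):
        print(f"line {line_number}: '{phrase}' needs a named source")
```

A hit is a prompt to add a named source, not proof that the sentence is wrong. A sentence such as "Studies show X (Aggarwal et al., 2023)" would be flagged even though it is cited.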
3. Extractability
The most critical factor may be extractability -- whether your content contains self-contained statements that an AI can confidently pull out and present as part of an answer.
What makes content extractable:
- Direct answer patterns: Sentences that begin with "X is..." or "The key factors are..." or "There are three main approaches..."
- Quotable density: Multiple clear, factual statements per section rather than one long flowing argument
- Definition patterns: "GEO, or Generative Engine Optimisation, is the practice of..."
- Structured data: Lists of features, comparison tables, step-by-step processes
What makes content NOT extractable:
- Long, flowing paragraphs that never arrive at a clear statement
- Sentences that require 3 paragraphs of context to understand
- Marketing language that prioritises persuasion over information
- Content that talks around a topic without directly addressing it
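One rough way to estimate "quotable density" is to count sentences that open with a direct-answer pattern. The sketch below is a heuristic under stated assumptions: the regexes are invented for illustration, the sentence splitter is naive, and `quotable_sentences` is a hypothetical name. It does not reproduce how any AI system scores content.

```python
import re

# Openers described above as "direct answer patterns". These regexes are
# illustrative assumptions and will produce false positives and misses.
DIRECT_ANSWER_PATTERNS = [
    r"there are \w+ (?:main|key|primary)\b",         # "There are three main approaches..."
    r"the (?:key|main) \w+ (?:is|are)\b",            # "The key factors are..."
    r"[\w-]+(?: [\w-]+){0,3}(?:, or [^,]+,)? is\b",  # "X is..." or "X, or Y, is..."
]
DIRECT_ANSWER_RE = re.compile(
    "^(?:" + "|".join(DIRECT_ANSWER_PATTERNS) + ")", re.IGNORECASE
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def quotable_sentences(text):
    """Return the sentences that open with a direct-answer pattern."""
    return [s for s in split_sentences(text) if DIRECT_ANSWER_RE.match(s)]


if __name__ == "__main__":
    section = (
        "GEO, or Generative Engine Optimisation, is the practice of "
        "optimising content for AI answers. In our experience, much could "
        "be said about this. There are three main approaches."
    )
    print(len(quotable_sentences(section)), "quotable sentences found")
```

Comparing the count against a per-page target gives a quick signal, but the patterns also match weak openers such as "This is...", so treat the number as a starting point for editing rather than a score.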
4. Freshness
AI assistants have a strong recency bias. When multiple sources cover the same topic, the one with clear freshness signals wins.
What AI looks for:
- Publication dates (visible on the page)
- "Last updated: [date]" notices
- Current-year references in the content
- Temporal language: "As of 2026," "In Q1 2026," "The latest data shows..."
- Updated statistics (2025-2026 data preferred over 2022 or older)
What hurts you:
- No publication date anywhere on the page
- Statistics from 3+ years ago with no update
- Timeless language that gives no indication when the content was written
- "Best practices for 2022" in the title of a page last updated in 2022
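The freshness signals above can be spot-checked with a short script. This is a sketch, assuming that any four-digit number starting with 19 or 20 is a year and that three years is the staleness cut-off (taken from the "3+ years" item above). `freshness_report` and its parameters are hypothetical names.

```python
import re
from datetime import date

# Assumption: any standalone 19xx/20xx number is a year. Figures such as
# "2000 users" will be miscounted.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
UPDATED_RE = re.compile(r"last updated:?\s*\S", re.IGNORECASE)


def freshness_report(text, today=None, max_age_years=3):
    """Summarise the date signals found in a page's visible text."""
    today = today or date.today()
    years = sorted({int(y) for y in YEAR_RE.findall(text)})
    return {
        "has_updated_notice": bool(UPDATED_RE.search(text)),
        "newest_year": years[-1] if years else None,
        # Years at least max_age_years old are candidates for a refresh.
        "stale_years": [y for y in years if today.year - y >= max_age_years],
    }


if __name__ == "__main__":
    page = "Last updated: 2026-02-10\nIn 2022, 40% of firms used AI."
    print(freshness_report(page, today=date(2026, 3, 1)))
```

A stale year is not always a problem, since historical references are legitimate. The report is meant to point an editor at statistics worth re-checking.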
How Different AI Systems Differ
While the four pillars apply broadly, different AI systems have different priorities:
ChatGPT (OpenAI)
ChatGPT tends to synthesise information from multiple sources rather than quoting directly. It values:
- Comprehensive coverage of a topic
- Clear factual statements it can reformulate
- Authoritative sources it can reference by name
- Recent information with clear dates
Claude (Anthropic)
Claude places heavy emphasis on:
- Nuanced, balanced coverage (not promotional content)
- Clear citations and evidence
- Well-structured arguments with logical flow
- Factual accuracy and specificity
Perplexity
Perplexity is the most citation-heavy AI assistant, providing inline source links for nearly every claim. It values:
- Specific, quotable statements
- Clear data points it can attribute
- Content that directly answers the query
- Multiple corroborating sources
Google AI Overviews
Google AI Overviews draw heavily from content that already performs well in traditional search, but add:
- Preference for structured data (JSON-LD)
- Strong emphasis on E-E-A-T signals
- Content freshness as a ranking factor
- Semantic HTML markup
What This Means for Canadian Content Creators
For Canadian businesses, there's an additional dimension worth considering. AI systems don't just analyse content quality in a vacuum -- they also weigh relevance to the user's context. A user in Toronto asking about tax planning, real estate trends, or technology hiring will receive different AI-curated answers than a user in San Francisco. This means Canadian content that speaks to the Canadian landscape -- referencing the CRA instead of the IRS, citing StatsCan data, mentioning provincial regulations, or discussing the Canadian market specifically -- has a genuine advantage for Canadian queries.
Canadian organisations like the University of Waterloo's AI institute, the Vector Institute in Toronto, and Mila in Montreal are producing world-class research. Citing these institutions where relevant adds both authority and geographic specificity that AI models can use when matching content to Canadian users.
Canada's bilingual market also rewards attention to language. Content written in Canadian English (distinct from American English in spelling, terminology, and cultural references) signals to AI systems that it is tailored for a Canadian audience.
Practical Checklist: Getting Cited
Here's a concrete checklist to increase your chances of being cited by AI assistants:
- Structure every page with clear headings. H1 for the topic, H2 for subtopics, H3 for details within subtopics.
- Include at least 3 quotable statements per page. Self-contained sentences that directly answer a likely question.
- Add specific data points. Replace "many companies" with "78% of Canadian enterprises." Replace "growing rapidly" with "growing at 23% year-over-year since 2024."
- Cite named sources. Replace "a recent study" with "a 2025 study by researchers at the University of Toronto" or "a 2025 report from the Conference Board of Canada."
- Add a visible date. Publication date and "last updated" timestamp.
- Include an FAQ section. Direct question-and-answer format is the most extractable content format.
- Use JSON-LD structured data. Article schema for blog posts, FAQPage schema for FAQ sections.
- Write for clarity, not persuasion. Remove marketing superlatives. Replace "revolutionary" with specific benefits. AI prefers information over promotion.
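For the JSON-LD item in the checklist, the sketch below shows one way to generate Article and FAQPage markup. The schema.org types and properties (`Article`, `FAQPage`, `Question`, `acceptedAnswer`, `datePublished`, `dateModified`) are real vocabulary; the helper names, the choice of an `Organization` author, and all sample values are assumptions made for illustration.

```python
import json


def article_jsonld(headline, author, published, modified):
    """Build schema.org Article markup. Dates are ISO 8601 strings."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # Assumption: the author is an organisation; use "Person" for people.
        "author": {"@type": "Organization", "name": author},
        "datePublished": published,
        "dateModified": modified,
    }


def faq_jsonld(pairs):
    """Build schema.org FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialise markup into a script tag ready to embed in a page."""
    # Escape "</" so text containing "</script>" cannot end the tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


if __name__ == "__main__":
    faq = faq_jsonld([("What is GEO?", "GEO is Generative Engine Optimisation.")])
    print(as_script_tag(faq))
```

Whatever generates the markup, the answers in the FAQPage block should match the visible FAQ text on the page, and the dates should match the visible publication and "last updated" dates.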
The Bottom Line
AI citation is not a black box. The signals that determine whether your content gets cited are knowable, measurable, and actionable.
The content that gets cited is the content that is well-structured, authoritative, extractable, and fresh. These are not subjective qualities -- they can be measured and improved systematically.
The question is not whether you should optimise for AI citations. The question is whether you'll do it before your competitors.
References:
- Aggarwal, P., Murahari, V., et al. (2023). "GEO: Generative Engine Optimization." arXiv:2311.09735.
- Tocho internal analysis of 10,000+ AI citation patterns (2026).