How We Score Your AI Readiness: Full Methodology

Your AI Readiness Score is a composite metric (0–100) that measures how well your website is prepared to appear in AI-generated answers from ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews. Every point is computed by deterministic code — no AI guesswork, no black box.

1. How Your AI Readiness Score Is Calculated

Your overall score is a weighted sum of six categories (their weights total 95%), plus a small E-E-A-T bonus of up to 5 points:

Category | Weight | What It Measures
AI Crawler Access | 20% | Can AI platforms actually reach your content?
Structured Data & Schema | 10% | Does your site speak the machine-readable language AI systems understand?
Content AI-Citability | 25% | Is your content structured so AI can quote it accurately?
Technical SEO Foundations | 15% | Are the basics in place: meta tags, SSR, security, semantic HTML?
LLM Discoverability | 15% | Can large language models find, navigate, and understand your site?
Brand & Authority Signals | 10% | Does the web confirm you are a real, trustworthy entity?

Content citability carries the highest weight because AI engines need quotable passages. Crawler access is the prerequisite — if bots cannot crawl your site, nothing else matters. Brand authority plays a supporting role since AI citation favors passage quality over domain reputation.

2. The 15 AI Crawlers We Check

We parse your robots.txt file (per RFC 9309) and check access rules for each of these crawlers individually:

Crawler | Platform | What It Does | Why Blocking Hurts
GPTBot | ChatGPT / OpenAI | Indexes content for ChatGPT responses | Blocks you from the most-used AI assistant
OAI-SearchBot | OpenAI Search | Powers OpenAI's search feature | Removes you from OpenAI search results
ChatGPT-User | ChatGPT browsing | Real-time page fetching | Prevents live citation of your content
ClaudeBot | Anthropic Claude | Indexes content for Claude | Excludes you from Claude's knowledge base
anthropic-ai | Anthropic training | Training data collection | Prevents inclusion in future Claude models
PerplexityBot | Perplexity AI | AI search engine indexing | Removes you from a fast-growing AI search tool
Google-Extended | Gemini / Google AI | Training for Gemini and AI Overviews | Blocks Google's AI features (not traditional search)
GoogleOther | Google AI | Secondary Google AI crawler | Limits Google's AI-powered features
Bytespider | TikTok / ByteDance AI | ByteDance AI data collection | Blocks visibility in ByteDance's AI ecosystem
Applebot-Extended | Apple Intelligence | Apple's on-device AI features | Excludes you from Siri and Apple Intelligence
CCBot | Common Crawl | Open dataset used by many LLMs | Indirectly blocks training for multiple AI systems
cohere-ai | Cohere AI | Enterprise AI / RAG systems | Limits visibility in enterprise AI tools
Meta-ExternalAgent | Meta AI | Meta's AI across FB, Instagram, WhatsApp | Blocks visibility across Meta's 3B+ users
Amazonbot | Alexa / Amazon AI | Alexa answers and Amazon AI | Removes you from voice and Amazon AI search
FacebookBot | Meta / Facebook | Content indexing for Meta platforms | Limits content appearance on Meta platforms

Each crawler receives one of five statuses: ALLOWED (explicitly permitted), BLOCKED (explicitly disallowed), PARTIALLY_BLOCKED (some paths restricted), BLOCKED_BY_WILDCARD (caught by a blanket User-agent: * / Disallow: / rule), or NOT_MENTIONED (no specific rule, defaults to allowed).
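As an illustration of how the five statuses could be derived, here is a minimal sketch in Python. It is not the analyzer's actual code: the function names parse_groups and classify_crawler are hypothetical, and the parser deliberately ignores RFC 9309 details such as longest-match precedence and wildcard paths.

```python
def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercase user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    seen_rule = False  # a User-agent line that follows rules starts a new group
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:
                current_agents, seen_rule = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def classify_crawler(robots_txt: str, crawler: str) -> str:
    """Return one of the five statuses for a single crawler (simplified)."""
    groups = parse_groups(robots_txt)
    rules = groups.get(crawler.lower())
    if rules is None:
        # No group names this crawler, so only the wildcard group applies.
        if ("disallow", "/") in groups.get("*", []):
            return "BLOCKED_BY_WILDCARD"
        return "NOT_MENTIONED"
    # An empty Disallow value restricts nothing, so it is filtered out.
    disallowed = [path for kind, path in rules if kind == "disallow" and path]
    if "/" in disallowed:
        return "BLOCKED"
    if disallowed:
        return "PARTIALLY_BLOCKED"
    return "ALLOWED"
```

Calling classify_crawler(text, "GPTBot") once per crawler in the table yields the per-crawler status list. A production check would also need to weigh Allow rules against Disallow rules on the same path.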

What You Should Do

Allow at minimum GPTBot, ClaudeBot, PerplexityBot, and Google-Extended; these cover the four most influential AI search platforms. If you have a blanket Disallow: / under User-agent: *, add a separate User-agent group with Allow: / for each crawler you want to admit, because a crawler follows only the most specific group that names it.
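One way to sanity-check such a configuration before deploying it is Python's standard urllib.robotparser module. The robots.txt content and the example.com URL below are placeholders, and the standard-library parser applies its own matching rules, so treat the result as an approximation of how each real crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything by default,
# then admit the four recommended crawlers in their own group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider"):
    # Bytespider is not named, so it falls back to the wildcard group.
    print(bot, parser.can_fetch(bot, "https://example.com/pricing"))
```

The four named crawlers should report True and Bytespider False, since it is caught by the wildcard rule.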

3. Content Citability Scoring

This is the most distinctive part of our analysis. We break your page into content blocks (grouped by heading), then score each block on five metrics that predict whether an AI system will select that passage as a citation.

The Five Metrics

Answer Block Quality (max 30 points) — Does the passage directly answer a question? We detect definition patterns ("X is a..."), early placement of key facts in the first 60 words, question-answer heading pairs, clear sentence structure (5–25 words per sentence), and attribution signals ("according to," "research shows").

Good: "Generative Engine Optimization (GEO) is the practice of structuring website content so AI search engines can accurately cite it in their responses. According to research from Princeton and Georgia Tech, sites that implement GEO see up to 40% more visibility in AI-generated answers."
Bad: "We have been doing this for a long time and our approach is really comprehensive and covers everything you need."

Self-Containment (max 25 points) — Can the passage stand alone without context from surrounding paragraphs? We measure passage length (134–167 words is optimal), pronoun density (lower is better — AI strips context), and proper noun density (named entities help AI attribute correctly).

Good: "HubSpot's 2024 State of Marketing Report found that 64% of marketers already use AI tools for content creation. The study surveyed 1,460 B2B and B2C marketers across North America, Europe, and Asia-Pacific."
Bad: "They found that most of them already use it. This is higher than the previous year."

Structural Readability (max 20 points) — Is the passage easy for both humans and machines to parse? We check average sentence length (10–20 words is ideal), transition markers ("first," "additionally," "moreover"), numbered lists, and paragraph breaks.

Statistical Density (max 15 points) — Does the passage contain concrete data? We count percentages, dollar amounts, named quantities ("1,460 marketers"), year references, and citations to recognized sources (Gartner, Forrester, Google, etc.).

Uniqueness Signals (max 10 points) — Does the passage contain original insight? We detect first-party research language ("our research found," "we analyzed"), case study references, and specific tool/methodology mentions.

The Optimal Passage Length: 134–167 Words

Passages in the 134–167 word range are cited most often by AI systems. This length contains a complete answer with evidence, but avoids truncation. Under 80 words rarely has enough substance; over 250 words gets split or summarized, losing attribution.
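The length thresholds above can be expressed as a simple classifier. This is a sketch only: the function name and the band labels are hypothetical, and words are counted by whitespace splitting, which may differ from the analyzer's tokenization.

```python
def length_band(passage: str) -> str:
    """Classify a passage by the word-count thresholds described above."""
    words = len(passage.split())  # assumption: whitespace-delimited words
    if words < 80:
        return "too_short"   # rarely has enough substance
    if 134 <= words <= 167:
        return "optimal"     # complete answer with evidence
    if words > 250:
        return "too_long"    # likely to be split or summarized
    return "acceptable"      # everything in between
```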

Grade Distribution

Grade | Score | Meaning
A | 80–100 | Highly Citable: AI systems will prefer this passage
B | 65–79 | Good Citability: solid, with room for improvement
C | 50–64 | Moderate Citability: needs more specificity or structure
D | 35–49 | Low Citability: vague, pronoun-heavy, or lacks data
F | 0–34 | Poor Citability: unlikely to be cited by AI
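The grade bands translate directly into a lookup. The sketch below assumes each band's lower bound is inclusive, so a fractional score such as 79.5 falls into B; the function name is hypothetical.

```python
def citability_grade(score: float) -> str:
    """Map a 0-100 passage score to its letter grade."""
    if score >= 80:
        return "A"
    if score >= 65:
        return "B"
    if score >= 50:
        return "C"
    if score >= 35:
        return "D"
    return "F"
```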

Your overall citability score is the average across all passages, with bonuses for having multiple A/B passages and for maintaining optimal passage lengths.

4. Schema Validation

We parse all <script type="application/ld+json"> blocks on your page and check for the schema types that matter most for AI entity recognition, following the schema.org vocabulary.

Organization — The foundation of entity identity. Must include name, url, logo, and sameAs (links to social profiles, Wikipedia, Wikidata). Missing sameAs is the most common gap — it is how AI connects your site to your broader online presence.

Article / BlogPosting / NewsArticle — Signals authored content with a publication date. Must include headline, datePublished, and author. The optional speakable property (per Google Search Central) marks sections suitable for AI reading.

FAQPage — Directly maps question-answer pairs for AI extraction. One of the highest-impact schema types for AI visibility.

BreadcrumbList — Helps AI understand site hierarchy and generate accurate source attributions.

knowsAbout — An underused Organization/Person property that declares your expertise areas. AI systems use this for topical authority.

WebSite — Enables sitelinks search box and helps AI treat your site as a unified entity.
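To make the Organization check concrete, here is a minimal sketch that extracts JSON-LD blocks with Python's standard html.parser and reports which required fields are missing. The class and function names are hypothetical, and the sketch assumes a top-level object or list with a string @type; it does not handle @graph containers or list-valued @type.

```python
import json
from html.parser import HTMLParser
from typing import Optional

REQUIRED_ORG_FIELDS = ("name", "url", "logo", "sameAs")


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every JSON-LD script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def missing_org_fields(html: str) -> Optional[list[str]]:
    """Return missing Organization fields, or None if no Organization exists."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    for raw in extractor.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block; a real validator would report it
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Organization":
                return [f for f in REQUIRED_ORG_FIELDS if f not in item]
    return None
```

An empty list means the Organization block carries all four required fields.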

5. Technical Appendix

This section documents the exact scoring formulas implemented in analyzer.py. All scores are deterministic — running the same page twice will produce identical results.

Overall Score Formula

overall = (crawler_score * 0.20) + (schema_score * 0.10) + (citability_score * 0.25)
        + (tech_score * 0.15) + (llm_score * 0.15) + (brand_score * 0.10)
        + min(experience_signals * 1.5 + expertise_signals * 1.0, 5)

Result is clamped to [0, 100].
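The formula can be written as a small function. This is a sketch rather than the contents of analyzer.py: the function name and the dictionary keys are hypothetical, each category score is assumed to be on a 0–100 scale, and no rounding is applied.

```python
# Category weights from the formula above; they sum to 0.95.
WEIGHTS = {
    "crawler": 0.20,
    "schema": 0.10,
    "citability": 0.25,
    "tech": 0.15,
    "llm": 0.15,
    "brand": 0.10,
}


def overall_score(scores: dict[str, float],
                  experience_signals: int,
                  expertise_signals: int) -> float:
    """Weighted category sum plus the capped E-E-A-T bonus, clamped to [0, 100]."""
    weighted = sum(scores[name] * weight for name, weight in WEIGHTS.items())
    bonus = min(experience_signals * 1.5 + expertise_signals * 1.0, 5)
    return max(0.0, min(100.0, weighted + bonus))
```

Because the weights sum to 0.95, a site needs the full 5-point bonus on top of perfect category scores to reach 100.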

Category 1: AI Crawler Access (weight 20%)

Category 2: Structured Data & Schema (weight 10%)

Points are additive: Organization (+15), sameAs links 5+ (+15) / 2+ (+8) / 1+ (+4), Article (+10), FAQ (+8), Product (+8), BreadcrumbList (+5), WebSite (+5), speakable (+5), any JSON-LD present (+5). If no JSON-LD exists at all, score is forced to 0. Penalties: -5 per schema issue (up to -10 for 5+ issues), -5 if Organization exists but lacks knowsAbout.
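A sketch of the additive rule follows. Several details are assumptions rather than documented behavior: the issue penalty is read as -5 for one to four issues and -10 for five or more, "FAQ" is taken to mean the FAQPage type, only the literal type names listed here earn points, and the result is floored at 0. The function and parameter names are hypothetical.

```python
# Points per schema type present on the page.
TYPE_POINTS = {
    "Organization": 15,
    "Article": 10,
    "FAQPage": 8,   # assumption: "FAQ" in the text refers to FAQPage
    "Product": 8,
    "BreadcrumbList": 5,
    "WebSite": 5,
}


def schema_score(types: set[str], same_as_count: int, has_speakable: bool,
                 has_knows_about: bool, issue_count: int) -> int:
    """Additive schema score; `types` holds every @type found in JSON-LD."""
    if not types:
        return 0  # no JSON-LD at all forces the score to 0
    score = 5  # any JSON-LD present
    score += sum(pts for name, pts in TYPE_POINTS.items() if name in types)
    if same_as_count >= 5:
        score += 15
    elif same_as_count >= 2:
        score += 8
    elif same_as_count >= 1:
        score += 4
    if has_speakable:
        score += 5
    # Assumed reading of the penalty: -5 for any issues, -10 for 5 or more.
    if issue_count >= 5:
        score -= 10
    elif issue_count >= 1:
        score -= 5
    if "Organization" in types and not has_knows_about:
        score -= 5
    return max(score, 0)  # assumption: the score never goes negative
```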

Category 3: Content AI-Citability (weight 25%)

Base = average passage score across all content blocks. Bonuses: 3+ optimal-length passages (+10) or 1+ (+5); 3+ grade A/B passages (+10) or 1+ (+5); 3+ question headings (+5) or 1+ (+2). Penalty: word count under 300 (-20) or under 500 (-10).
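The aggregation can be sketched as follows. The function names are hypothetical, A/B passages are taken to be those scoring 65 or higher (per the grade table), and clamping the final value to [0, 100] is an assumption.

```python
def tiered(count: int, high: int, low: int) -> int:
    """Return `high` for 3 or more, `low` for 1 or more, otherwise 0."""
    if count >= 3:
        return high
    if count >= 1:
        return low
    return 0


def citability_score(passage_scores: list[float], optimal_length_count: int,
                     question_heading_count: int, word_count: int) -> float:
    """Average passage score plus bonuses, minus the thin-content penalty."""
    if not passage_scores:
        return 0.0
    base = sum(passage_scores) / len(passage_scores)
    ab_count = sum(1 for s in passage_scores if s >= 65)  # grade A or B
    bonus = (tiered(optimal_length_count, 10, 5)
             + tiered(ab_count, 10, 5)
             + tiered(question_heading_count, 5, 2))
    penalty = 20 if word_count < 300 else 10 if word_count < 500 else 0
    return max(0.0, min(100.0, base + bonus - penalty))  # clamp is assumed
```

For example, passages scoring 82, 70, and 40 on a 450-word page with one optimal-length passage and three question headings give a base of 64, bonuses of 15, and a penalty of 10, for a result of 69.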

Passage scoring (0–100 per block):

Metric | Max | Key Signals
Answer Block Quality | 30 | Definition patterns (+15), facts in first 60 words (+15), question heading (+10), clear sentences (+10), attribution (+10)
Self-Containment | 25 | 134–167 words (+10), pronoun ratio < 2% (+8), 3+ proper nouns (+7)
Structural Readability | 20 | Avg sentence 10–20 words (+8), transitions (+4), numbered lists (+4), line breaks (+4)
Statistical Density | 15 | Percentages (+3 each, max 6), dollar amounts (+3 each, max 5), named quantities (+2 each, max 4), year refs (+2), source names (+2)
Uniqueness Signals | 10 | First-party research (+5), case studies (+3), named tools (+2)
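To show how one row of the table might be computed, here is a sketch of the Self-Containment metric. The pronoun list, the word tokenization, and the proper-noun heuristic (capitalized tokens that do not open a sentence) are all assumptions made for illustration; the analyzer's own detection rules are not documented here.

```python
import re

# Assumed pronoun list; "that" is excluded because it is often a conjunction.
PRONOUNS = {"he", "she", "it", "they", "them", "him", "her", "his", "its",
            "their", "this", "these", "those", "we", "us", "our"}


def self_containment(passage: str) -> int:
    """Score a passage on the Self-Containment metric (max 25)."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", passage)
    if not words:
        return 0
    score = 0
    if 134 <= len(passage.split()) <= 167:
        score += 10  # optimal passage length
    pronoun_ratio = sum(w.lower() in PRONOUNS for w in words) / len(words)
    if pronoun_ratio < 0.02:
        score += 8
    # Heuristic: count capitalized tokens, skipping each sentence's first word.
    proper_nouns = 0
    for sentence in re.split(r"(?<=[.!?])\s+", passage.strip()):
        tokens = sentence.split()
        proper_nouns += sum(1 for tok in tokens[1:] if tok[0].isupper())
    if proper_nouns >= 3:
        score += 7
    return score
```

Under these assumptions, the HubSpot example from the Self-Containment section scores 15 (no pronouns, many named entities, but too short for the length bonus), while the pronoun-heavy counterexample scores 0.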

Category 4: Technical SEO Foundations (weight 15%)

Points: title (+8), meta description (+8), canonical (+5), viewport (+5), complete OG tags (+8), sitemap with lastmod (+8), semantic HTML (+5), <main> tag (+3), single H1 (+5), valid hierarchy (+3), image alt ratio (up to +5), SSR detected (+10), security headers (up to +8), lang (+2). Security breakdown, capped at 8: HTTPS (+4), HSTS (+2), CSP/X-Content-Type-Options/X-Frame-Options/Referrer-Policy (+1 each).
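The security sub-score is the one component here whose parts sum to more than its cap (4 + 2 + 4 = 10 against a maximum of 8), so a short sketch helps. The function name is hypothetical, and "X-Content-Type" is assumed to refer to the X-Content-Type-Options response header.

```python
# Headers worth +1 each; names are compared case-insensitively.
ONE_POINT_HEADERS = (
    "content-security-policy",
    "x-content-type-options",  # assumed meaning of "X-Content-Type"
    "x-frame-options",
    "referrer-policy",
)


def security_score(uses_https: bool, headers: dict[str, str]) -> int:
    """Security sub-score from the response headers, capped at 8."""
    present = {name.lower() for name in headers}
    score = 4 if uses_https else 0
    if "strict-transport-security" in present:  # HSTS
        score += 2
    score += sum(1 for name in ONE_POINT_HEADERS if name in present)
    return min(score, 8)
```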

Category 5: LLM Discoverability (weight 15%)

Points: llms.txt exists (+20), valid format (+15), llms-full.txt (+5), FAQ schema (+10), 5+ question headings (+10) / 2+ (+5), 3+ high answer-quality passages (+10) / 1+ (+5), Organization + meta description (+10), sitemap (+5), 20+ internal links (+5) / 10+ (+3), 1500+ words (+5) / 800+ (+3).

Category 6: Brand & Authority Signals (weight 10%)

Social platforms (capped at 40 total): YouTube (+15), Reddit (+12), LinkedIn (+10), Twitter/X (+8), GitHub (+8), Facebook (+5), Instagram (+5), TikTok (+3). YouTube carries the highest weight per the Ahrefs brand authority study (r = 0.737). Additional: Wikipedia (+15), Wikidata (+10), About/Contact pages (+5 each), author info (+5), sameAs 5+ (+10) / 2+ (+5), testimonials (+5), privacy policy (+3), terms (+2).
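The social-platform cap can be sketched as below. The function name and the lowercase platform keys are hypothetical; detection of each platform (for example, from sameAs links) is assumed to happen elsewhere.

```python
# Points per detected platform, from the list above.
SOCIAL_POINTS = {
    "youtube": 15,
    "reddit": 12,
    "linkedin": 10,
    "twitter": 8,   # Twitter/X
    "github": 8,
    "facebook": 5,
    "instagram": 5,
    "tiktok": 3,
}


def social_score(platforms: list[str]) -> int:
    """Sum points for each distinct platform, capped at 40."""
    distinct = {p.lower() for p in platforms}  # avoid double counting
    total = sum(SOCIAL_POINTS.get(p, 0) for p in distinct)
    return min(total, 40)
```

With the cap in place, YouTube, Reddit, LinkedIn, and GitHub together (45 raw points) already earn the maximum of 40.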

Edge Cases

References