How We Score Your AI Readiness: Full Methodology
Your AI Readiness Score is a composite metric (0–100) that measures how well your website is prepared to appear in AI-generated answers from ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews. Every point is computed by deterministic code — no AI guesswork, no black box.
1. How Your AI Readiness Score Is Calculated
Your overall score is a weighted sum of six categories (the weights total 95%), plus a small E-E-A-T bonus (up to 5 points):
| Category | Weight | What It Measures |
|---|---|---|
| AI Crawler Access | 20% | Can AI platforms actually reach your content? |
| Structured Data & Schema | 10% | Does your site speak the machine-readable language AI systems understand? |
| Content AI-Citability | 25% | Is your content structured so AI can quote it accurately? |
| Technical SEO Foundations | 15% | Are the basics in place — meta tags, SSR, security, semantic HTML? |
| LLM Discoverability | 15% | Can large language models find, navigate, and understand your site? |
| Brand & Authority Signals | 10% | Does the web confirm you are a real, trustworthy entity? |
Content citability carries the highest weight because AI engines need quotable passages. Crawler access is the prerequisite — if bots cannot crawl your site, nothing else matters. Brand authority plays a supporting role since AI citation favors passage quality over domain reputation.
2. The 15 AI Crawlers We Check
We parse your robots.txt file (per RFC 9309) and check access rules for each of these crawlers individually:
| Crawler | Platform | What It Does | Why Blocking Hurts |
|---|---|---|---|
| GPTBot | ChatGPT / OpenAI | Indexes content for ChatGPT responses | Blocks you from the most-used AI assistant |
| OAI-SearchBot | OpenAI Search | Powers OpenAI's search feature | Removes you from OpenAI search results |
| ChatGPT-User | ChatGPT browsing | Real-time page fetching | Prevents live citation of your content |
| ClaudeBot | Anthropic Claude | Indexes content for Claude | Excludes you from Claude's knowledge base |
| anthropic-ai | Anthropic training | Training data collection | Prevents inclusion in future Claude models |
| PerplexityBot | Perplexity AI | AI search engine indexing | Removes you from a fast-growing AI search tool |
| Google-Extended | Gemini / Google AI | Controls use of your content for Gemini training and grounding (a robots.txt token, not a separate crawler) | Opts you out of Gemini (does not affect traditional Google Search) |
| GoogleOther | Google AI | Secondary Google AI crawler | Limits Google's AI-powered features |
| Bytespider | TikTok / ByteDance AI | ByteDance AI data collection | Blocks visibility in ByteDance's AI ecosystem |
| Applebot-Extended | Apple Intelligence | Apple's on-device AI features | Excludes you from Siri and Apple Intelligence |
| CCBot | Common Crawl | Open dataset used by many LLMs | Indirectly blocks training for multiple AI systems |
| cohere-ai | Cohere AI | Enterprise AI / RAG systems | Limits visibility in enterprise AI tools |
| Meta-ExternalAgent | Meta AI | Meta's AI across FB, Instagram, WhatsApp | Blocks visibility across Meta's 3B+ users |
| Amazonbot | Alexa / Amazon AI | Alexa answers and Amazon AI | Removes you from voice and Amazon AI search |
| FacebookBot | Meta / Facebook | Content indexing for Meta platforms | Limits content appearance on Meta platforms |
Each crawler receives one of five statuses: ALLOWED (explicitly permitted), BLOCKED (explicitly disallowed), PARTIALLY_BLOCKED (some paths restricted), BLOCKED_BY_WILDCARD (caught by a blanket User-agent: * / Disallow: / rule), or NOT_MENTIONED (no specific rule, defaults to allowed).
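The five statuses above can be illustrated with a minimal Python sketch. It is not the analyzer's actual code: the names `Status`, `parse_groups`, and `classify` are hypothetical, and the parsing is deliberately simplified (exact user-agent token matching, no longest-match precedence between Allow and Disallow as RFC 9309 requires).

```python
from enum import Enum


class Status(Enum):
    ALLOWED = "ALLOWED"
    BLOCKED = "BLOCKED"
    PARTIALLY_BLOCKED = "PARTIALLY_BLOCKED"
    BLOCKED_BY_WILDCARD = "BLOCKED_BY_WILDCARD"
    NOT_MENTIONED = "NOT_MENTIONED"


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (directive, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            in_rules = True
            for agent in current_agents:
                groups[agent].append((key, value))
    return groups


def classify(robots_txt: str, crawler: str) -> Status:
    """Assign one of the five statuses to a crawler (simplified rules)."""
    groups = parse_groups(robots_txt)
    own = groups.get(crawler.lower())
    if own is None:
        if ("disallow", "/") in groups.get("*", []):
            return Status.BLOCKED_BY_WILDCARD
        return Status.NOT_MENTIONED
    disallowed = [path for directive, path in own if directive == "disallow" and path]
    if "/" in disallowed:
        return Status.BLOCKED
    if disallowed:
        return Status.PARTIALLY_BLOCKED
    return Status.ALLOWED
```

For example, with a blanket wildcard block plus a group that allows GPTBot, `classify(text, "GPTBot")` yields `Status.ALLOWED`, while a crawler with no group of its own yields `Status.BLOCKED_BY_WILDCARD`.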
What You Should Do
Allow at minimum GPTBot, ClaudeBot, PerplexityBot, and Google-Extended — these cover the four most influential AI search platforms. If you have a blanket Disallow: / under User-agent: *, add explicit Allow: / rules for the crawlers you want to reach.
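One way to sanity-check such a configuration is with Python's standard `urllib.robotparser`. The robots.txt content, the example URL, and the helper name `allowed_agents` below are illustrative assumptions, and the standard-library parser does not implement every detail of RFC 9309, so treat the result as a quick check, not a definitive verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blanket block, with an explicit group for key AI crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str],
                   url: str = "https://example.com/pricing") -> dict[str, bool]:
    """Report, per user agent, whether the given URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, ok in allowed_agents(ROBOTS_TXT, ["GPTBot", "ClaudeBot", "Bytespider"]).items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Here the four named crawlers match their own group and are allowed, while any other agent falls back to the wildcard group and is blocked.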
3. Content Citability Scoring
This is the most distinctive part of our analysis. We break your page into content blocks (grouped by heading), then score each block on five metrics that predict whether an AI system will select that passage as a citation.
The Five Metrics
Answer Block Quality (max 30 points) — Does the passage directly answer a question? We detect definition patterns ("X is a..."), early placement of key facts in the first 60 words, question-answer heading pairs, clear sentence structure (5–25 words per sentence), and attribution signals ("according to," "research shows").
Self-Containment (max 25 points) — Can the passage stand alone without context from surrounding paragraphs? We measure passage length (134–167 words is optimal), pronoun density (lower is better — AI strips context), and proper noun density (named entities help AI attribute correctly).
Structural Readability (max 20 points) — Is the passage easy for both humans and machines to parse? We check average sentence length (10–20 words is ideal), transition markers ("first," "additionally," "moreover"), numbered lists, and paragraph breaks.
Statistical Density (max 15 points) — Does the passage contain concrete data? We count percentages, dollar amounts, named quantities ("1,460 marketers"), year references, and citations to recognized sources (Gartner, Forrester, Google, etc.).
Uniqueness Signals (max 10 points) — Does the passage contain original insight? We detect first-party research language ("our research found," "we analyzed"), case study references, and specific tool/methodology mentions.
The Optimal Passage Length: 134–167 Words
Passages in the 134–167 word range are cited most often by AI systems. A passage of this length can hold a complete answer with supporting evidence while staying short enough to avoid truncation. Passages under 80 words rarely have enough substance; passages over 250 words tend to be split or summarized, losing attribution.
Grade Distribution
| Grade | Score | Meaning |
|---|---|---|
| A | 80–100 | Highly Citable — AI systems will prefer this passage |
| B | 65–79 | Good Citability — solid, with room for improvement |
| C | 50–64 | Moderate Citability — needs more specificity or structure |
| D | 35–49 | Low Citability — vague, pronoun-heavy, or lacks data |
| F | 0–34 | Poor Citability — unlikely to be cited by AI |
Your overall citability score is the average across all passages, with bonuses for having multiple A/B passages and for maintaining optimal passage lengths.
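The grade bands in the table above map to a simple threshold lookup. This is a sketch with a hypothetical function name; how fractional scores between bands (for example 79.5) are handled is an assumption, here resolved by comparing against each band's lower bound.

```python
def grade(score: float) -> str:
    """Map a 0-100 passage score to its letter grade using the band floors."""
    for letter, floor in (("A", 80), ("B", 65), ("C", 50), ("D", 35)):
        if score >= floor:
            return letter
    return "F"
```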
4. Schema Validation
We parse all <script type="application/ld+json"> blocks on your page and check for the schema types that matter most for AI entity recognition, following the schema.org vocabulary.
Organization — The foundation of entity identity. Must include name, url, logo, and sameAs (links to social profiles, Wikipedia, Wikidata). Missing sameAs is the most common gap — it is how AI connects your site to your broader online presence.
Article / BlogPosting / NewsArticle — Signals authored content with a publication date. Must include headline, datePublished, and author. The optional speakable property (per Google Search Central) marks sections suitable for AI reading.
FAQPage — Directly maps question-answer pairs for AI extraction. One of the highest-impact schema types for AI visibility.
BreadcrumbList — Helps AI understand site hierarchy and generate accurate source attributions.
knowsAbout — An underused Organization/Person property that declares your expertise areas. AI systems use this for topical authority.
WebSite — Enables sitelinks search box and helps AI treat your site as a unified entity.
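The required-property checks described above can be sketched as a lookup against each node's `@type`. The `REQUIRED` table only restates the properties named in this section, and `missing_properties` is a hypothetical helper, not the analyzer's actual function.

```python
# Required properties per type, as listed in this section.
REQUIRED = {
    "Organization": ("name", "url", "logo", "sameAs"),
    "Article": ("headline", "datePublished", "author"),
    "BlogPosting": ("headline", "datePublished", "author"),
    "NewsArticle": ("headline", "datePublished", "author"),
}


def missing_properties(node: dict) -> list[str]:
    """Return required properties that are absent or empty on a JSON-LD node."""
    types = node.get("@type", [])
    if isinstance(types, str):  # @type may be a single string or a list
        types = [types]
    missing: list[str] = []
    for type_name in types:
        for prop in REQUIRED.get(type_name, ()):
            if not node.get(prop) and prop not in missing:
                missing.append(prop)
    return missing
```

An Organization node with `name`, `url`, and `logo` but no `sameAs` would return `["sameAs"]`, the gap this section calls the most common.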
5. Technical Appendix
This section documents the exact scoring formulas implemented in analyzer.py. All scores are deterministic — running the same page twice will produce identical results.
Overall Score Formula
    overall = (crawler_score * 0.20) + (schema_score * 0.10) + (citability_score * 0.25)
            + (tech_score * 0.15) + (llm_score * 0.15) + (brand_score * 0.10)
            + min(experience_signals * 1.5 + expertise_signals * 1.0, 5)
Result is clamped to [0, 100].
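As a runnable illustration of the formula above, here is a minimal Python sketch. The weights and the bonus term come directly from this section; the function name, the dictionary keys, and the sample inputs are assumptions made for the example.

```python
WEIGHTS = {
    "crawler": 0.20,
    "schema": 0.10,
    "citability": 0.25,
    "tech": 0.15,
    "llm": 0.15,
    "brand": 0.10,
}


def overall_score(categories: dict[str, float],
                  experience_signals: int,
                  expertise_signals: int) -> float:
    """Weighted sum of category scores plus the E-E-A-T bonus, clamped to [0, 100]."""
    weighted = sum(categories[name] * weight for name, weight in WEIGHTS.items())
    bonus = min(experience_signals * 1.5 + expertise_signals * 1.0, 5)
    return max(0.0, min(100.0, weighted + bonus))


if __name__ == "__main__":
    sample = {"crawler": 80, "schema": 50, "citability": 70,
              "tech": 90, "llm": 60, "brand": 40}
    print(round(overall_score(sample, experience_signals=1, expertise_signals=1), 1))
```

With the sample inputs, the weighted part is 16 + 5 + 17.5 + 13.5 + 9 + 4 = 65 and the bonus is 2.5, for a total of 67.5.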
Category 1: AI Crawler Access (weight 20%)
- No robots.txt exists: base score = 60 (accessible but not ideal).
- robots.txt exists: allowed_ratio = 1 - (blocked + partial * 0.5) / total_crawlers. Base = allowed_ratio * 70.
- Bonuses: robots.txt exists (+10), sitemap referenced in robots (+10), llms.txt exists (+10), llms.txt valid format (+5).
- Penalty: -10 for each blocked key crawler (GPTBot, ClaudeBot, PerplexityBot, Google-Extended).
- llms.txt validation follows the llmstxt.org spec: must have `# Title`, `> Description`, `## Section` headings, and `- [Link](url)` entries.
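The Category 1 rules can be sketched as follows. Two points are assumptions rather than documented behavior: that the llms.txt bonuses still apply when no robots.txt exists, and that the category score is clamped to [0, 100] like the overall score. The function and parameter names are hypothetical.

```python
def crawler_access_score(*, robots_exists: bool, total_crawlers: int,
                         blocked: int = 0, partial: int = 0,
                         blocked_key_crawlers: int = 0,
                         sitemap_in_robots: bool = False,
                         llms_txt_exists: bool = False,
                         llms_txt_valid: bool = False) -> float:
    """Category 1 score from crawler counts and file checks (sketch)."""
    if robots_exists:
        allowed_ratio = 1 - (blocked + partial * 0.5) / total_crawlers
        score = allowed_ratio * 70 + 10        # base plus "robots.txt exists" bonus
        if sitemap_in_robots:
            score += 10
        score -= 10 * blocked_key_crawlers     # GPTBot, ClaudeBot, PerplexityBot, Google-Extended
    else:
        score = 60.0                           # no robots.txt: accessible but not ideal
    if llms_txt_exists:                        # assumption: applies in both cases
        score += 10
    if llms_txt_valid:
        score += 5
    return max(0.0, min(100.0, score))         # assumption: clamped like the overall score
```

For instance, with 15 crawlers checked, 3 blocked (2 of them key crawlers) and 2 partially blocked, the ratio is 1 - 4/15, the base is about 51.3, and the result after the +10 bonus and -20 penalty is about 41.3.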
Category 2: Structured Data & Schema (weight 10%)
Points are additive: Organization (+15), sameAs links 5+ (+15) / 2+ (+8) / 1+ (+4), Article (+10), FAQ (+8), Product (+8), BreadcrumbList (+5), WebSite (+5), speakable (+5), any JSON-LD present (+5). If no JSON-LD exists at all, score is forced to 0. Penalties: -5 per schema issue (up to -10 for 5+ issues), -5 if Organization exists but lacks knowsAbout.
Category 3: Content AI-Citability (weight 25%)
Base = average passage score across all content blocks. Bonuses: 3+ optimal-length passages (+10) or 1+ (+5); 3+ grade A/B passages (+10) or 1+ (+5); 3+ question headings (+5) or 1+ (+2). Penalty: word count under 300 (-20) or under 500 (-10).
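A minimal sketch of this category-level aggregation follows. It treats a grade A/B passage as one scoring 65 or higher (per the grade table) and clamps the result to [0, 100], which is an assumption. The names `tier` and `citability_score` are hypothetical.

```python
def tier(count: int, three_plus: int, one_plus: int) -> int:
    """Tiered bonus: full points at 3 or more, partial points at 1 or more."""
    if count >= 3:
        return three_plus
    if count >= 1:
        return one_plus
    return 0


def citability_score(passage_scores: list[float], optimal_length_passages: int,
                     question_headings: int, word_count: int) -> float:
    """Average passage score plus bonuses, minus the thin-content penalty."""
    base = sum(passage_scores) / len(passage_scores) if passage_scores else 0.0
    good_passages = sum(1 for score in passage_scores if score >= 65)  # grade A or B
    score = (base
             + tier(optimal_length_passages, 10, 5)
             + tier(good_passages, 10, 5)
             + tier(question_headings, 5, 2))
    if word_count < 300:
        score -= 20
    elif word_count < 500:
        score -= 10
    return max(0.0, min(100.0, score))  # assumption: clamped to the 0-100 range
```

For passages scoring 82, 70, and 40, with one optimal-length passage, three question headings, and 450 words on the page, the result is 64 + 5 + 5 + 5 - 10 = 69.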
Passage scoring (0–100 per block):
| Metric | Max | Key Signals |
|---|---|---|
| Answer Block Quality | 30 | Definition patterns (+15), facts in first 60 words (+15), question heading (+10), clear sentences (+10), attribution (+10) |
| Self-Containment | 25 | 134–167 words (+10), pronoun ratio < 2% (+8), 3+ proper nouns (+7) |
| Structural Readability | 20 | Avg sentence 10–20 words (+8), transitions (+4), numbered lists (+4), line breaks (+4) |
| Statistical Density | 15 | Percentages (+3 each, max 6), dollar amounts (+3 each, max 5), named quantities (+2 each, max 4), year refs (+2), source names (+2) |
| Uniqueness Signals | 10 | First-party research (+5), case studies (+3), named tools (+2) |
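To make one row of the table concrete, here is a sketch of the Statistical Density metric. The point values and caps come from the table; the regular expressions and the short `SOURCES` tuple are assumptions standing in for whatever patterns the analyzer actually uses.

```python
import re

# Source names mentioned in this document; the real list is longer ("etc.").
SOURCES = ("Gartner", "Forrester", "Google")


def statistical_density(text: str) -> int:
    """Score concrete data in a passage, capped at 15 points."""
    percentages = len(re.findall(r"\d+(?:\.\d+)?%", text))
    dollars = len(re.findall(r"\$\d[\d,]*(?:\.\d+)?", text))
    # A number followed by a lowercase noun, e.g. "1,460 marketers";
    # the lookbehind skips numbers that are part of a dollar amount.
    quantities = len(re.findall(r"(?<![$\d,.])\b\d[\d,]*\s+[a-z]{3,}\b", text))
    has_year = bool(re.search(r"\b(?:19|20)\d{2}\b", text))
    has_source = any(name in text for name in SOURCES)
    score = (min(percentages * 3, 6)
             + min(dollars * 3, 5)
             + min(quantities * 2, 4)
             + (2 if has_year else 0)
             + (2 if has_source else 0))
    return min(score, 15)
```

The sentence "According to Gartner, 45% of 1,460 marketers raised budgets by $1,200 in 2024." scores 3 + 3 + 2 + 2 + 2 = 12 under these patterns.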
Category 4: Technical SEO Foundations (weight 15%)
Points: title (+8), meta description (+8), canonical (+5), viewport (+5), complete OG tags (+8), sitemap with lastmod (+8), semantic HTML (+5), <main> tag (+3), single H1 (+5), valid hierarchy (+3), image alt ratio (up to +5), SSR detected (+10), security headers (up to +8), lang (+2). Security: HTTPS (+4), HSTS (+2), CSP/X-Content-Type-Options/X-Frame-Options/Referrer-Policy (+1 each).
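The security sub-score can be sketched as below. The individual items sum to 10, so the sketch applies the stated cap of 8. The function name and the choice to detect headers by name only, without inspecting their values, are assumptions.

```python
ONE_POINT_HEADERS = (
    "content-security-policy",
    "x-content-type-options",
    "x-frame-options",
    "referrer-policy",
)


def security_score(url: str, headers: dict[str, str]) -> int:
    """Security sub-score from the URL scheme and response header names."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    score = 4 if url.lower().startswith("https://") else 0
    if "strict-transport-security" in present:    # HSTS
        score += 2
    score += sum(1 for name in ONE_POINT_HEADERS if name in present)
    return min(score, 8)
```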
Category 5: LLM Discoverability (weight 15%)
Points: llms.txt exists (+20), valid format (+15), llms-full.txt (+5), FAQ schema (+10), 5+ question headings (+10) / 2+ (+5), 3+ high answer-quality passages (+10) / 1+ (+5), Organization + meta description (+10), sitemap (+5), 20+ internal links (+5) / 10+ (+3), 1500+ words (+5) / 800+ (+3).
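The "valid format" check for llms.txt can be approximated with four line-level patterns, one per element required by the llmstxt.org structure. This is a simplified sketch with hypothetical function names; it does not check ordering or any other rule the spec may impose.

```python
import re


def check_llms_txt(text: str) -> dict[str, bool]:
    """Report which of the four required llms.txt elements are present."""
    lines = [line.rstrip() for line in text.splitlines()]
    return {
        "title": any(re.match(r"# \S", line) for line in lines),
        "description": any(re.match(r"> \S", line) for line in lines),
        "section": any(re.match(r"## \S", line) for line in lines),
        "link": any(re.match(r"- \[[^\]]+\]\([^)\s]+\)", line) for line in lines),
    }


def is_valid_llms_txt(text: str) -> bool:
    """True only when all four elements are found."""
    return all(check_llms_txt(text).values())
```

Reporting each element separately makes it easier to tell a site owner which part of the file is missing, instead of returning a single pass/fail flag.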
Category 6: Brand & Authority Signals (weight 10%)
Social platforms (capped at 40 total): YouTube (+15), Reddit (+12), LinkedIn (+10), Twitter/X (+8), GitHub (+8), Facebook (+5), Instagram (+5), TikTok (+3). YouTube carries the highest weight per the Ahrefs brand authority study (r = 0.737). Additional: Wikipedia (+15), Wikidata (+10), About/Contact pages (+5 each), author info (+5), sameAs 5+ (+10) / 2+ (+5), testimonials (+5), privacy policy (+3), terms (+2).
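The social-platform portion of this category reduces to a capped sum. In the sketch below, the point values are taken from this section, while the lowercase platform keys (with "twitter" standing for Twitter/X) and the function name are assumptions.

```python
SOCIAL_POINTS = {
    "youtube": 15, "reddit": 12, "linkedin": 10, "twitter": 8,
    "github": 8, "facebook": 5, "instagram": 5, "tiktok": 3,
}


def social_score(platforms: list[str]) -> int:
    """Sum points for each distinct detected platform, capped at 40."""
    detected = {name.lower() for name in platforms}  # de-duplicate, ignore case
    return min(sum(SOCIAL_POINTS.get(name, 0) for name in detected), 40)
```

A brand present on YouTube, Reddit, LinkedIn, and GitHub totals 45 raw points and is capped at 40.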
Edge Cases
- SPA / client-side rendering: Near-empty `#app`, `#root`, `__next`, or `__nuxt` containers (< 50 chars) flag SSR failure. AI crawlers generally do not execute JavaScript.
- Wildcard robots.txt blocks: `User-agent: * / Disallow: /` blocks all crawlers including AI bots, flagged separately from per-crawler blocks.
- No robots.txt: Scored as permissive (60) per RFC 9309 Section 2.3, where absence means all crawlers are allowed.
- Schema in @graph: CMS platforms (WordPress/Yoast) nest schema in `@graph` arrays. We parse these recursively.
- Minimal content pages: Under 300 words receives a -20 citability penalty.
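The recursive handling of @graph can be sketched as a small generator. This is an illustration with a hypothetical function name; it descends only into top-level lists and @graph arrays, not into nested properties such as author.

```python
import json
from typing import Any, Iterator


def iter_nodes(data: Any) -> Iterator[dict]:
    """Yield every typed JSON-LD node, descending into lists and @graph arrays."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        if "@type" in data:
            yield data
        yield from iter_nodes(data.get("@graph", []))


if __name__ == "__main__":
    # Hypothetical payload shaped like the output of a WordPress SEO plugin.
    raw = '{"@context": "https://schema.org", "@graph": [{"@type": "Organization", "name": "Acme"}, {"@type": "WebSite", "url": "https://example.com"}]}'
    print([node["@type"] for node in iter_nodes(json.loads(raw))])
```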
References
- Google Search Central: Structured Data — Schema markup guidelines and supported types.
- web.dev: Technical SEO — Semantic HTML and meta tag best practices.
- schema.org — Full vocabulary reference for Organization, Article, FAQPage, speakable, knowsAbout.
- llmstxt.org — Specification for the llms.txt standard.
- RFC 9309: Robots Exclusion Protocol — The authoritative standard for robots.txt parsing.
- GEO: Generative Engine Optimization — Princeton/Georgia Tech/IIT Delhi research on optimizing content for AI search engines.