A new file is quietly appearing at the root of developer documentation sites, enterprise knowledge bases, and AI-forward marketing sites: `/llms.txt`. It is a proposed standard for telling large language models which parts of your site matter most, how to navigate them, and how to cite them accurately.
The proposal came from Jeremy Howard of Answer.AI in September 2024, modeled loosely on `robots.txt` but designed for a completely different audience — the retrieval pipelines behind ChatGPT, Claude, Perplexity, and Gemini. In the eighteen months since, adoption has grown from a handful of AI-native companies to hundreds of developer-tool firms, documentation platforms, and enterprise SaaS brands.
This guide explains what `llms.txt` is, how it differs from the other machine-readable standards you already know, whether you should add it now, and how to write one that actually drives citations.
What is llms.txt?
llms.txt is a proposed web standard published in September 2024 by Jeremy Howard of Answer.AI that provides large language models with a curated, markdown-formatted index of a site's most important content. It lives at the root URL `/llms.txt` and uses a specific format: H1 site name, blockquote summary, and sectioned lists of links with short descriptions. It is designed to help AI systems find and cite authoritative content efficiently.
The `llms.txt` file is a plain-text markdown document placed at the root of a domain at `https://yoursite.com/llms.txt`. Its purpose is to give AI retrieval pipelines a hand-curated map of your highest-value content — the pages you actually want cited when someone asks ChatGPT or Perplexity a question about your brand, product, or domain expertise.
Unlike a sitemap, it is written for machines that read prose rather than for crawlers that follow links. Unlike robots.txt, it does not grant or deny access — it recommends priority. And unlike schema.org markup, it sits outside individual pages as a single domain-wide index.
The proposal sits at llmstxt.org, where Howard published the original specification and continues to refine it with community feedback. As of April 2026, the format has been adopted by Anthropic, Stripe, Zapier, Cloudflare, and a long list of developer-tool companies, even though no AI platform has officially committed to reading it as a first-class input.
Why a new standard was needed
A new standard was needed because existing web standards do not solve the specific problem of telling LLMs which content to cite. Robots.txt controls crawl permission, sitemap.xml provides a complete URL index for search engines, and schema.org adds per-page metadata. None of them provide a hand-curated, priority-ordered guide to the most citation-worthy content on a site, which is what LLM retrieval pipelines actually need.
LLM retrieval pipelines face a problem search engines never had to solve at the same scale: context windows. A retriever pulling content for a ChatGPT response cannot ingest an entire documentation site — it has a few thousand tokens to spend per query, and every wasted token on a navigation header or cookie banner is a token not spent on a usable answer.
Traditional sitemaps list every URL without ranking, annotation, or cleaned content. Schema.org adds per-page metadata but requires the AI to crawl each page individually. Robots.txt tells crawlers what they can access but says nothing about what matters.
The `llms.txt` proposal fills this gap with three properties: it is a single file, it is hand-curated for priority, and it is written in markdown that LLMs parse natively. That combination makes it dramatically cheaper for a retrieval pipeline to understand a site than reading a sitemap and fetching every page.
How llms.txt compares to existing standards
llms.txt differs from other machine-readable standards in audience and purpose: robots.txt sets crawler access rules for all bots, sitemap.xml provides a complete URL inventory for search indexing, schema.org adds structured JSON-LD metadata per page, and llms.txt provides a curated priority index specifically for LLM retrieval. The four standards complement rather than replace each other, and a mature AEO program uses all of them.
The four standards solve different problems. Understanding how they fit together is essential before investing in any of them.
| Standard | Location | Format | Audience | Purpose |
|---|---|---|---|---|
| robots.txt | /robots.txt | Plain text directives | All web crawlers | Allow/disallow crawl paths |
| sitemap.xml | /sitemap.xml | XML | Search engines | Complete URL index |
| schema.org | Inline JSON-LD | JSON-LD in HTML | Search + AI engines | Per-page structured metadata |
| llms.txt | /llms.txt | Markdown | LLM retrievers | Curated priority index |
For a deeper breakdown of how schema drives AI citations, our guide to schema for AI citations walks through the JSON-LD patterns that consistently earn citations and shows how that layer works alongside the others. Both schema and `llms.txt` are part of the same broader discipline of making a site machine-legible.
The llms.txt format specification
The llms.txt format specification requires a specific markdown structure: an H1 containing the site or project name, a blockquote summary describing the site's purpose, optional prose context, and H2 sections containing markdown bullet lists of links where each link has a short description after a colon. The file must be valid markdown and parseable by standard markdown libraries without custom extensions. Deviating from this structure reduces LLM parseability.
The structure is deliberately simple. Howard designed it so any LLM that already parses markdown — which is essentially all of them — can read it without custom tokenization.
Here is the canonical structure:
```markdown
# Project Name

> A one-to-three sentence summary describing what this site is and who it serves.

Optional paragraphs of additional context. These can explain the scope of the project, important conventions, or anything else that helps an LLM understand the content it is about to index.

## Docs

- Getting Started: Installation and first steps
- API Reference: Complete API documentation
- Guides: Step-by-step tutorials

## Examples

- Example Apps: Production-quality sample applications
- Code Snippets: Short, focused code examples

## Optional
```

Three structural rules matter. The H1 must be the site or project name, not a marketing tagline. The blockquote must be a genuine summary, not a pitch. And each link description should be a concise noun phrase explaining what the linked content contains.
The `## Optional` section carries a specific meaning in the spec: these are lower-priority resources that an LLM with limited context should deprioritize or skip. Anything not in `## Optional` is treated as recommended reading.
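To see why this structure is cheap to consume, here is a minimal sketch of how a retrieval pipeline might parse an llms.txt file. This is an illustration, not part of the spec: it accepts both the spec's `[name](url): description` link form and the plain `Name: description` form used in the simplified examples in this guide, and it applies the `## Optional` rule by separating recommended sections from optional ones.

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt file into its title, summary, and link sections."""
    title, summary, sections, current = None, [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:]
        elif line.startswith("> "):
            summary.append(line[2:])
        elif line.startswith("## "):
            current = line[3:]
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Spec form: "- [Name](url): description";
            # fall back to the plain "Name: description" form.
            m = re.match(r"-\s+\[([^\]]+)\]\(([^)]+)\):?\s*(.*)", line)
            if m:
                name, url, desc = m.groups()
            else:
                name, _, desc = line[2:].partition(": ")
                url = None
            sections[current].append(
                {"name": name, "url": url, "description": desc}
            )
    # Per the spec, everything outside "## Optional" is recommended reading.
    recommended = {k: v for k, v in sections.items() if k != "Optional"}
    return {"title": title, "summary": " ".join(summary),
            "sections": sections, "recommended": recommended}
```

A context-constrained retriever would read `recommended` first and fetch `Optional` links only if budget remains.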
/llms.txt vs /llms-full.txt
The /llms.txt file is a compact navigation index for LLMs, while /llms-full.txt is an expanded version containing the complete cleaned text content of the documents referenced in /llms.txt. The expanded version lets an LLM ingest the full body of a site's knowledge in a single request without having to crawl individual pages, which is useful for large documentation libraries. Sites with significant content usually publish both.
The proposal includes a companion file at `/llms-full.txt` that holds the actual content of the pages referenced in `/llms.txt`. Where `/llms.txt` is an index, `/llms-full.txt` is the complete text, stripped of navigation chrome and formatted as clean markdown.
The motivation is practical. A retrieval pipeline fetching `/llms.txt` gets a map. A pipeline fetching `/llms-full.txt` gets the entire map plus the contents of every destination, which means it can answer questions without making a second round of requests.
Companies publishing both files treat them as layered: `/llms.txt` is the catalog and `/llms-full.txt` is the bundled content. Large documentation sets often produce `/llms-full.txt` files of several hundred thousand tokens, which is still small enough for modern long-context models to ingest in a single request.
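A build step for the companion file can be as simple as concatenating cleaned markdown exports behind the index. The sketch below is illustrative, not a standard tool: the directory layout and file names are assumptions, and it presumes each referenced page has already been exported as a cleaned `.md` file with navigation chrome stripped (doc platforms like Mintlify and Fern automate this).

```python
from pathlib import Path

def build_llms_full(index_path, content_dir, out_path):
    """Bundle cleaned markdown pages into a single /llms-full.txt file.

    Assumes each page referenced in llms.txt has been exported as a
    cleaned .md file (no nav chrome) under content_dir.
    """
    index = Path(index_path).read_text(encoding="utf-8")
    parts = [index]  # lead with the index so the bundle is self-describing
    for page in sorted(Path(content_dir).glob("*.md")):
        body = page.read_text(encoding="utf-8").strip()
        # Separate documents clearly so a model can tell where each page starts.
        parts.append(f"\n\n---\n\n<!-- Source: {page.name} -->\n\n{body}")
    Path(out_path).write_text("".join(parts) + "\n", encoding="utf-8")
```

Run at build time, this keeps `/llms-full.txt` in lockstep with the docs it bundles.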
A real example: Anthropic's llms.txt
Here is a simplified representation of the structure that Anthropic and other AI-forward companies have published:
```markdown
# Anthropic

> Anthropic is an AI safety company. We build Claude, a family of large language models, and publish research on AI safety, alignment, and interpretability.

## Documentation

- Getting Started with Claude: Introduction to the Claude API
- API Reference: Complete API endpoints and parameters
- Prompt Engineering Guide: Best practices for prompting Claude
- Tool Use: How Claude uses external tools

## Models

- Claude Models Overview: Available models and their capabilities
- Pricing: Current pricing for Claude models

## Research

- Research Publications: Peer-reviewed papers and safety work
- Responsible Scaling Policy: Our framework for responsible AI development

## Optional

- Company News: Announcements and updates
- Careers: Open positions
```

Three patterns are worth noting. The blockquote summary is factual and succinct — it tells an LLM exactly what to expect. Each section groups related resources. And the `## Optional` bucket correctly houses promotional pages that should not displace documentation in a context-constrained retrieval.
How major AI platforms are treating llms.txt
As of April 2026, no major AI platform has officially committed to reading llms.txt as a first-class input, but community-built tools and wrappers increasingly use it, and retrieval pipelines from Anthropic, OpenAI, and Perplexity can be prompted to fetch it. The standard is in the adoption phase typical of early web standards, where sites publish first and platform support follows once adoption crosses a threshold. Sites that publish now are forward-compatible rather than currently-rewarded.
The honest answer is that llms.txt is not yet a universally consumed standard. Anthropic, OpenAI, and Perplexity have not published statements committing to read it automatically. Their current retrieval pipelines rely primarily on web search, schema.org, and direct page fetches.
What is happening in practice is more nuanced. Developer-facing AI tools — Cursor, Continue, Aider, and various RAG frameworks — do read llms.txt when it is present. Anthropic's Claude and OpenAI's ChatGPT can be prompted to fetch `/llms.txt` for a domain and will do so reliably when asked. Perplexity's retrieval has been observed citing llms.txt-indexed pages at higher rates than non-indexed pages for the same queries, though the company has not formally announced this behavior.
The trajectory looks like most early web standards. Sites adopt first. Platforms add opportunistic support once adoption crosses a threshold. Eventually a tipping point arrives and the standard becomes table stakes.
Adoption as of April 2026
Adoption has moved from the edges to the center of developer tooling. The companies that have published `/llms.txt` or `/llms-full.txt` files include:
- Anthropic
- Stripe
- Zapier
- Cloudflare
- Vercel
- Supabase
- Resend
- Clerk
- Prisma
- Turso
- Fly.io
- Mintlify (which auto-generates them for every customer doc site)
Outside dev tools, adoption is spreading among SaaS companies with complex product documentation, AI-native startups building their own retrieval infrastructure, and a growing minority of marketing-led brands that see llms.txt as a signal of AI-readiness to their customers.
When llms.txt helps (and when it does not)
llms.txt helps sites with complex documentation, deep content libraries, extensive API references, or multi-product knowledge bases where a curated priority index meaningfully reduces retrieval cost. It helps less on thin marketing sites, single-page landing pages, and sites with under 20 total pages, where there is no meaningful prioritization to perform. The test is whether a human editor would actually make different choices than a sitemap generator.
The format provides maximum value when there is genuine editorial work to do — when your site has so much content that choosing what to prioritize is a real decision. The more content you have, the higher the marginal value of a curated index. It provides the most value for:
- Developer documentation with API references, guides, tutorials, and examples
- Enterprise knowledge bases with product docs, integration guides, and troubleshooting
- Complex SaaS platforms with multiple products, each with its own doc tree
- Deep content libraries where a curated 20-item index is more useful than a 2,000-URL sitemap
- Sites with technical content that benefits from being cited accurately in AI coding assistants
By contrast, llms.txt provides little value for:
- Thin marketing sites with fewer than 20 pages of content
- Single-page landing pages where the entire site is already one document
- E-commerce product catalogs where schema.org Product markup already does the entity work
- News sites where freshness matters more than curation
- Sites without meaningful priority differences across their content
How to write a good llms.txt
A good llms.txt uses priority hierarchy to order sections from most to least important, writes concise factual descriptions for each link (one noun phrase explaining what the resource contains), sections content by logical category rather than by site navigation, keeps the full file under 5,000 words for parseability, and separates truly-optional resources into the ## Optional section. The best llms.txt files read like a thoughtful tour guide, not a mirror of the site menu.
The technical format is easy. The editorial discipline is harder. Here are the patterns that separate a useful llms.txt from a useless one.
Prioritize ruthlessly. The order of sections matters. An LLM with limited context will read top to bottom. Put your most citation-worthy content first — typically the canonical reference docs, then guides, then examples, then everything else.
Write descriptions as noun phrases. A description like "Complete reference for the API endpoints, parameters, and response shapes" is far more useful than "API docs" or "Learn more about our API." The LLM uses these descriptions to decide what to fetch — precision pays.
Section by topic, not by site menu. Resist the temptation to mirror your site's nav bar. An llms.txt section called "Documentation" that holds 30 unrelated links is less useful than three sections called "API Reference," "Guides," and "Integrations" that each hold 5-10 related links.
Keep it readable. The entire file should stay under 5,000 words. If you need more, produce a `/llms-full.txt` for the expanded content and keep `/llms.txt` as the lean index. An llms.txt that exceeds 10,000 words tends to break retrieval rather than help it.
Use ## Optional honestly. The `## Optional` section is for genuinely low-priority content — news, blog posts, careers pages, marketing collateral. Do not put important docs there because you think the file is getting too long. Cut less-important sections instead.
Link to canonical URLs only. Do not link to redirects, duplicates, or aliased paths. The LLM will follow the link, and every redirect costs tokens and trust.
Update it when content changes. The file is hand-curated, which means it decays without maintenance. Schedule a quarterly review to add new docs and remove deprecated ones.
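Several of these editorial rules are mechanically checkable. Here is a minimal lint sketch; the 5,000-word ceiling and the description requirement mirror this guide's recommendations, not the spec itself, and the function names are illustrative.

```python
def lint_llms_txt(text, max_words=5000):
    """Check an llms.txt draft against this guide's editorial rules.

    Returns a list of warning strings; an empty list means the draft passes.
    """
    warnings = []
    lines = text.splitlines()
    # Rule: the file must open with an H1 site/project name.
    if not lines or not lines[0].startswith("# "):
        warnings.append("file should begin with an H1 site/project name")
    # Rule: a blockquote summary must follow the H1.
    if not any(l.startswith("> ") for l in lines):
        warnings.append("missing blockquote summary after the H1")
    # Rule: keep the index lean; overflow belongs in /llms-full.txt.
    if len(text.split()) > max_words:
        warnings.append(
            f"file exceeds {max_words} words; move content to /llms-full.txt")
    # Rule: every link needs a short, factual description after a colon.
    for i, line in enumerate(lines, start=1):
        if line.startswith("- "):
            name, sep, desc = line[2:].partition(": ")
            if not sep or not desc.strip():
                warnings.append(
                    f"line {i}: link '{name.strip()}' has no description")
    return warnings
```

Running this in CI alongside the quarterly review keeps the file from silently decaying.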
Common mistakes to avoid
The common mistakes in llms.txt files are dumping an entire sitemap into the file without curation, omitting descriptions or copy-pasting the same description across multiple links, using marketing language instead of factual noun phrases, forgetting to maintain the file as content changes, and including broken or redirected URLs. Any one of these reduces the file from a useful curated index to a noisy copy of a sitemap the LLM could have generated itself.
Most failed llms.txt files share the same mistakes. Avoid these and you will be ahead of the majority of adopters.
The sitemap dump. The single most common failure is auto-generating llms.txt from a sitemap. It defeats the entire purpose of the standard. If your file has 500 links with identical descriptions, it is not an llms.txt — it is a sitemap in markdown clothing.
No descriptions. Links without descriptions force the LLM to fetch each page to understand what it contains. That is the problem llms.txt was designed to solve. Every link needs a short, factual description.
Marketing copy in descriptions. "The industry-leading platform for cutting-edge solutions" is not a description — it is a tagline. LLMs parse it as noise and deprioritize the file.
Broken or redirected URLs. If the LLM follows a link and hits a 404 or a redirect chain, trust in the file drops. Every URL in llms.txt should resolve to a 200 on first request.
No maintenance plan. Documentation changes constantly. An llms.txt from 2024 linking to pages that were deprecated in 2025 actively harms retrieval — the LLM wastes tokens on stale content.
Duplication with sitemap.xml. If your llms.txt is identical to sitemap.xml, delete the llms.txt. The standard exists specifically to be different from — more curated than — the sitemap.
Should you add llms.txt now?
Yes, most sites with meaningful content libraries should add llms.txt now because the effort is low, the format is simple, no platform penalizes sites for having it, and forward compatibility with LLM retrieval is increasingly table stakes for AI-discoverable brands. The business case is asymmetric: small downside if nothing happens, meaningful upside if adoption accelerates. Sites should skip it only if they have fewer than 20 pages of substantive content.
The decision comes down to cost and optionality. Writing a good llms.txt for a mid-sized documentation site takes 2-4 hours of focused editorial work. Hosting it requires nothing beyond placing the file at the root of your domain.
Against that cost, the potential upside is meaningful. If AI platforms move toward first-class llms.txt support over the next 12-18 months — which the trajectory suggests — sites that already have the file will be several months ahead of competitors who wait.
The downside is small. There is no SEO penalty for publishing an llms.txt. No platform currently treats its presence as a negative signal. At worst, the file sits unused and you have a thoughtful content index for your own team to reference.
For any site running a serious answer engine optimization program, llms.txt is a reasonable addition to the infrastructure alongside schema markup, atomic content architecture, and entity authority work.
How llms.txt fits into AEO strategy
llms.txt fits into AEO strategy as one of four complementary machine-readable layers: schema.org provides per-page structured metadata, atomic information architecture structures the content itself for citation, semantic HTML and proper headings make content parseable, and llms.txt provides the site-wide priority index. None of the layers replace the others; mature AEO programs implement all four because they solve different parts of the retrieval problem.
AEO has never been about one tactic. The discipline involves making your content machine-legible at multiple layers, and llms.txt slots cleanly into that stack.
At the content layer, atomic information architecture breaks your knowledge into self-contained, citation-ready blocks of 40-80 words. At the metadata layer, schema.org JSON-LD tells the AI what each block means. At the HTML layer, semantic tags and proper headings make the content parseable without guessing.
Llms.txt adds a fourth layer: the site-wide priority map. Where the other three tell an AI about a specific page, llms.txt tells the AI about the whole site — which pages matter, how they relate, and where to start.
A site running all four layers is substantially more citation-ready than one running any single layer alone. This is why our AEO infrastructure service implements all four as part of the standard build, and why we recommend AEO programs think in terms of machine-legibility as a stack rather than any one checklist item.
The comparison between AEO vs SEO comes into sharper focus once you see the stack — AEO is not a single technique, it is an entire parallel infrastructure.
Practical implementation guide
Here is the shortest viable path from zero to a published llms.txt.
Step 1: Inventory your best content. List the 15-30 pages on your site that you most want cited by AI systems. These are typically the canonical docs, the core guides, and the definitive explainers — not the blog posts or news pages.
Step 2: Group the list into 3-6 sections. Look for natural categories: API Reference, Guides, Integrations, Examples, Changelog. Each section should hold 3-10 links. If a section has only one link, fold it into a neighbor.
Step 3: Write the header. Start the file with an H1 of your site or product name, a blockquote summary of 1-3 sentences, and optionally a paragraph of additional context. Keep the summary factual.
Step 4: Write the link descriptions. For each link, write a single noun phrase explaining what the linked resource contains. Target 8-15 words per description. Prioritize precision over marketing appeal.
Step 5: Add the ## Optional section. Put genuinely low-priority resources here — news, blog, careers, marketing. If nothing qualifies as truly optional, leave the section out rather than padding it.
Step 6: Publish at /llms.txt. The file must live at the root of your domain. For most hosting setups, this means placing an `llms.txt` file in the public root. For Next.js sites, use the `/public/llms.txt` convention. For WordPress, use a root file or a plugin that serves it.
Step 7: Verify accessibility. Fetch `https://yoursite.com/llms.txt` with curl and confirm it returns a 200 with `Content-Type: text/markdown` or `text/plain`. Check that it renders as readable markdown, not as HTML.
Step 8: Optionally produce /llms-full.txt. If your site is large enough that the bundled content version provides value, generate `/llms-full.txt` containing the cleaned markdown text of every page referenced in `/llms.txt`. Tools like Mintlify and Fern auto-generate this; for custom sites, a build-time script is straightforward.
Step 9: Schedule maintenance. Add a quarterly calendar reminder to review the file, add new important pages, and remove deprecated ones. The file is only useful if it stays current.
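The verification in step 7 can be scripted. This is an illustrative check, not an official validator; the acceptance rules it applies (HTTP 200, `text/markdown` or `text/plain`, a markdown body starting with an H1) are the ones described above, and the function names are assumptions.

```python
from urllib.request import urlopen

def check_llms_response(status, content_type, body):
    """Apply the step-7 acceptance rules to a fetched /llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Strip any "; charset=..." suffix before comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("text/markdown", "text/plain"):
        problems.append(f"unexpected Content-Type: {content_type}")
    if body.lstrip().startswith("<"):
        problems.append("body looks like HTML, not markdown")
    if not body.lstrip().startswith("# "):
        problems.append("body does not start with an H1 site name")
    return problems

def verify(url):
    """Fetch /llms.txt from a live site and report any rule violations."""
    with urlopen(url) as resp:  # network call; run against your own domain
        body = resp.read().decode("utf-8", errors="replace")
        return check_llms_response(
            resp.status, resp.headers.get("Content-Type", ""), body)
```

Calling `verify("https://yoursite.com/llms.txt")` should return an empty list; anything else is a deployment bug worth fixing before an LLM finds it.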
What llms.txt is not
A final note on scope, because several mischaracterizations have spread in the AEO community.
Llms.txt is not a ranking signal. It does not make your pages rank higher in Google. It does not make AI systems cite you more often in isolation from the underlying content quality.
Llms.txt is not a replacement for schema. Per-page schema.org markup remains essential. Llms.txt works alongside schema, not instead of it.
Llms.txt is not enforced. No AI platform is required to read it. Compliance is voluntary and partial. This is why it belongs in a layered strategy rather than as a single tactic.
Llms.txt is not about SEO. It is about LLM retrieval. These overlap but are not the same discipline. A site can have excellent SEO and poor LLM readiness, and vice versa. Our deeper explainer on what is AEO covers the distinction in full.
The bottom line
Llms.txt is a low-cost, forward-compatible addition to any serious AEO infrastructure. Write it well — genuine curation, factual descriptions, honest prioritization — and it quietly improves how AI retrieval pipelines understand your site. Write it poorly or skip it entirely and the cost is small but the missed optionality grows as the standard matures.
The broader lesson is that machine-legibility is now a stack, not a checkbox. Sites that treat AEO as a layered discipline — content architecture, structured data, semantic HTML, and site-wide indexes like llms.txt — will carry a compounding advantage over sites that treat it as a one-time plugin install.
Next steps
If you are responsible for your company's AI visibility, the fastest way to see where you stand is to run a free AEO scan. The scanner checks for schema markup, atomic answer structure, semantic HTML, entity signals, and — as of 2026 — the presence and quality of llms.txt. You get an engineering-grade report in under two minutes with specific fixes ordered by impact.
For a done-for-you implementation of the full stack, our AEO infrastructure service is a $1,450 one-time build covering schema, atomic architecture, semantic HTML, and llms.txt generation, along with the entity authority work that makes the whole stack defensible. It is the shortest path from zero AI visibility to a citation-ready foundation.
Questions about whether llms.txt makes sense for your specific site or how it fits alongside your existing SEO program? Contact our team — we will give you a direct answer based on your content footprint, not a sales pitch.