A new file is quietly appearing at the root of developer documentation sites, enterprise knowledge bases, and AI-forward marketing sites: `/llms.txt`. It is a proposed standard for telling large language models which parts of your site matter most, how to navigate them, and how to cite them accurately.
The proposal came from Jeremy Howard of Answer.AI in September 2024, modeled loosely on `robots.txt` but designed for a completely different audience — the retrieval pipelines behind ChatGPT, Claude, Perplexity, and Gemini. In the eighteen months since, adoption has grown from a handful of AI-native companies to hundreds of developer-tool firms, documentation platforms, and enterprise SaaS brands.
This guide explains what `llms.txt` is, how it differs from the other machine-readable standards you already know, whether you should add it now, and how to write one that actually drives citations.
What is llms.txt?
llms.txt is a proposed web standard published in September 2024 by Jeremy Howard of Answer.AI that provides large language models with a curated, markdown-formatted index of a site's most important content. It lives at the root URL `/llms.txt` and uses a specific format: H1 site name, blockquote summary, and sectioned lists of links with short descriptions. It is designed to help AI systems find and cite authoritative content efficiently.
The `llms.txt` file is a plain-text markdown document placed at the root of a domain at `https://yoursite.com/llms.txt`. Its purpose is to give AI retrieval pipelines a hand-curated map of your highest-value content — the pages you actually want cited when someone asks ChatGPT or Perplexity a question about your brand, product, or domain expertise.
Unlike a sitemap, it is written for machines that read prose rather than for crawlers that follow links. Unlike robots.txt, it does not grant or deny access — it recommends priority. And unlike schema.org markup, it sits outside individual pages as a single domain-wide index.
The proposal sits at llmstxt.org, where Howard published the original specification and continues to refine it with community feedback. As of April 2026, the format has been adopted by Anthropic, Stripe, Zapier, Cloudflare, and a long list of developer-tool companies, even though no AI platform has officially committed to reading it as a first-class input.
Why a new standard was needed
A new standard was needed because existing web standards do not solve the specific problem of telling LLMs which content to cite. Robots.txt controls crawl permission, sitemap.xml provides a complete URL index for search engines, and schema.org adds per-page metadata. None of them provide a hand-curated, priority-ordered guide to the most citation-worthy content on a site, which is what LLM retrieval pipelines actually need.
LLM retrieval pipelines face a problem search engines never had to solve at the same scale: context windows. A retriever pulling content for a ChatGPT response cannot ingest an entire documentation site — it has a few thousand tokens to spend per query, and every wasted token on a navigation header or cookie banner is a token not spent on a usable answer.
Traditional sitemaps list every URL without ranking, annotation, or cleaned content. Schema.org adds per-page metadata but requires the AI to crawl each page individually. Robots.txt tells crawlers what they can access but says nothing about what matters.
The `llms.txt` proposal fills this gap with three properties: it is a single file, it is hand-curated for priority, and it is written in markdown that LLMs parse natively. That combination makes it dramatically cheaper for a retrieval pipeline to understand a site than reading a sitemap and fetching every page.
How llms.txt compares to existing standards
llms.txt differs from other machine-readable standards in audience and purpose: robots.txt sets crawler access rules for all bots, sitemap.xml provides a complete URL inventory for search indexing, schema.org adds structured JSON-LD metadata per page, and llms.txt provides a curated priority index specifically for LLM retrieval. The four standards complement rather than replace each other, and a mature AEO program uses all of them.
The four standards solve different problems. Understanding how they fit together is essential before investing in any of them.
| Standard | Location | Format | Audience | Purpose |
|---|---|---|---|---|
| robots.txt | /robots.txt | Plain text directives | All web crawlers | Allow/disallow crawl paths |
| sitemap.xml | /sitemap.xml | XML | Search engines | Complete URL index |
| schema.org | Inline JSON-LD | JSON-LD in HTML | Search + AI engines | Per-page structured metadata |
| llms.txt | /llms.txt | Markdown | LLM retrievers | Curated priority index |
For a deeper breakdown of how schema drives AI citations, our guide to schema for AI citations walks through the JSON-LD patterns that consistently earn citations and shows how that layer works alongside the others. Both schema and `llms.txt` are part of the same broader discipline of making a site machine-legible.
The llms.txt format specification
The llms.txt format specification requires a specific markdown structure: an H1 containing the site or project name, a blockquote summary describing the site's purpose, optional prose context, and H2 sections containing markdown bullet lists of links where each link has a short description after a colon. The file must be valid markdown and parseable by standard markdown libraries without custom extensions. Deviating from this structure reduces LLM parseability.
The structure is deliberately simple. Howard designed it so any LLM that already parses markdown — which is essentially all of them — can read it without custom tokenization.
Here is the canonical structure:
```markdown
# Project Name

> A one-to-three sentence summary describing what this site is and who it serves.

Optional paragraphs of additional context. These can explain the scope of the project, important conventions, or anything else that helps an LLM understand the content it is about to index.

## Docs

- Getting Started: Installation and first steps
- API Reference: Complete API documentation
- Guides: Step-by-step tutorials

## Examples

- Example Apps: Production-quality sample applications
- Code Snippets: Short, focused code examples

## Optional
```

Three structural rules matter. The H1 must be the site or project name, not a marketing tagline. The blockquote must be a genuine summary, not a pitch. And each link description should be a concise noun phrase explaining what the linked content contains.
The `## Optional` section carries a specific meaning in the spec: these are lower-priority resources that an LLM with limited context should deprioritize or skip. Anything not in `## Optional` is treated as recommended reading.
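To see why this structure is cheap to consume, here is a minimal sketch of how a retrieval pipeline might parse an llms.txt file. This is an illustration, not part of the spec: it accepts both the spec's `[name](url): description` link form and the plain `Name: description` form used in the simplified examples in this guide, and it applies the `## Optional` rule by separating recommended sections from optional ones.

```python
import re

def parse_llms_txt(text):
    """Parse an llms.txt file into its title, summary, and link sections."""
    title, summary, sections, current = None, [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:]
        elif line.startswith("> "):
            summary.append(line[2:])
        elif line.startswith("## "):
            current = line[3:]
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            # Spec form: "- [Name](url): description";
            # fall back to the plain "Name: description" form.
            m = re.match(r"-\s+\[([^\]]+)\]\(([^)]+)\):?\s*(.*)", line)
            if m:
                name, url, desc = m.groups()
            else:
                name, _, desc = line[2:].partition(": ")
                url = None
            sections[current].append(
                {"name": name, "url": url, "description": desc}
            )
    # Per the spec, everything outside "## Optional" is recommended reading.
    recommended = {k: v for k, v in sections.items() if k != "Optional"}
    return {"title": title, "summary": " ".join(summary),
            "sections": sections, "recommended": recommended}
```

A context-constrained retriever would read `recommended` first and fetch `Optional` links only if budget remains.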
/llms.txt vs /llms-full.txt
The /llms.txt file is a compact navigation index for LLMs, while /llms-full.txt is an expanded version containing the complete cleaned text content of the documents referenced in /llms.txt. The expanded version lets an LLM ingest the full body of a site's knowledge in a single request without having to crawl individual pages, which is useful for large documentation libraries. Sites with significant content usually publish both.
The proposal includes a companion file at `/llms-full.txt` that holds the actual content of the pages referenced in `/llms.txt`. Where `/llms.txt` is an index, `/llms-full.txt` is the complete text, stripped of navigation chrome and formatted as clean markdown.
The motivation is practical. A retrieval pipeline fetching `/llms.txt` gets a map. A pipeline fetching `/llms-full.txt` gets the entire map plus the contents of every destination, which means it can answer questions without making a second round of requests.
Companies publishing both files treat them as layered: `/llms.txt` is the catalog and `/llms-full.txt` is the bundled content. Large documentation sets often produce `/llms-full.txt` files of several hundred thousand tokens, which is still small enough for modern long-context models to ingest in a single request.
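A build step for the companion file can be as simple as concatenating cleaned markdown exports behind the index. The sketch below is illustrative, not a standard tool: the directory layout and file names are assumptions, and it presumes each referenced page has already been exported as a cleaned `.md` file with navigation chrome stripped (doc platforms like Mintlify and Fern automate this).

```python
from pathlib import Path

def build_llms_full(index_path, content_dir, out_path):
    """Bundle cleaned markdown pages into a single /llms-full.txt file.

    Assumes each page referenced in llms.txt has been exported as a
    cleaned .md file (no nav chrome) under content_dir.
    """
    index = Path(index_path).read_text(encoding="utf-8")
    parts = [index]  # lead with the index so the bundle is self-describing
    for page in sorted(Path(content_dir).glob("*.md")):
        body = page.read_text(encoding="utf-8").strip()
        # Separate documents clearly so a model can tell where each page starts.
        parts.append(f"\n\n---\n\n<!-- Source: {page.name} -->\n\n{body}")
    Path(out_path).write_text("".join(parts) + "\n", encoding="utf-8")
```

Run at build time, this keeps `/llms-full.txt` in lockstep with the docs it bundles.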
A real example: Anthropic's llms.txt
Here is a simplified representation of the structure that Anthropic and other AI-forward companies have published:
```markdown
# Anthropic

> Anthropic is an AI safety company. We build Claude, a family of large language models, and publish research on AI safety, alignment, and interpretability.

## Documentation

- Getting Started with Claude: Introduction to the Claude API
- API Reference: Complete API endpoints and parameters
- Prompt Engineering Guide: Best practices for prompting Claude
- Tool Use: How Claude uses external tools

## Models

- Claude Models Overview: Available models and their capabilities
- Pricing: Current pricing for Claude models

## Research

- Research Publications: Peer-reviewed papers and safety work
- Responsible Scaling Policy: Our framework for responsible AI development

## Optional

- Company News: Announcements and updates
- Careers: Open positions
```

Three patterns are worth noting. The blockquote summary is factual and succinct — it tells an LLM exactly what to expect. Each section groups related resources. And the `## Optional` bucket correctly houses promotional pages that should not displace documentation in a context-constrained retrieval.
How major AI platforms are treating llms.txt
As of April 2026, no major AI platform has officially committed to reading llms.txt as a first-class input, but community-built tools and wrappers increasingly use it, and retrieval pipelines from Anthropic, OpenAI, and Perplexity can be prompted to fetch it. The standard is in the adoption phase typical of early web standards, where sites publish first and platform support follows once adoption crosses a threshold. Sites that publish now are forward-compatible rather than currently-rewarded.
The honest answer is that llms.txt is not yet a universally consumed standard. Anthropic, OpenAI, and Perplexity have not published statements committing to read it automatically. Their current retrieval pipelines rely primarily on web search, schema.org, and direct page fetches.
What is happening in practice is more nuanced. Developer-facing AI tools — Cursor, Continue, Aider, and various RAG frameworks — do read llms.txt when it is present. Anthropic's Claude and OpenAI's ChatGPT can be prompted to fetch `/llms.txt` for a domain and will do so reliably when asked. Perplexity's retrieval has been observed citing llms.txt-indexed pages at higher rates than non-indexed pages for the same queries, though the company has not formally announced this behavior.
The trajectory looks like most early web standards. Sites adopt first. Platforms add opportunistic support once adoption crosses a threshold. Eventually a tipping point arrives and the standard becomes table stakes.
Adoption as of April 2026
Adoption has moved from the edges to the center of developer tooling. The companies that have published `/llms.txt` or `/llms-full.txt` files include:
- Anthropic
- Stripe
- Zapier
- Cloudflare
- Vercel
- Supabase
- Resend
- Clerk
- Prisma
- Turso
- Fly.io
- Mintlify (which auto-generates them for every customer doc site)
Outside dev tools, adoption is spreading among SaaS companies with complex product documentation, AI-native startups building their own retrieval infrastructure, and a growing minority of marketing-led brands that see llms.txt as a signal of AI-readiness to their customers.
When llms.txt helps (and when it does not)
llms.txt helps sites with complex documentation, deep content libraries, extensive API references, or multi-product knowledge bases where a curated priority index meaningfully reduces retrieval cost. It helps less on thin marketing sites, single-page landing pages, and sites with under 20 total pages, where there is no meaningful prioritization to perform. The test is whether a human editor would actually make different choices than a sitemap generator.
The format provides maximum value when there is genuine editorial work to do — when your site has so much content that choosing what to prioritize is a real decision. The more content you have, the higher the marginal value of a curated index. It provides the most value for:
- Developer documentation with API references, guides, tutorials, and examples
- Enterprise knowledge bases with product docs, integration guides, and troubleshooting
- Complex SaaS platforms with multiple products, each with its own doc tree
- Deep content libraries where a curated 20-item index is more useful than a 2,000-URL sitemap
- Sites with technical content that benefits from being cited accurately in AI coding assistants
By contrast, llms.txt provides little value for:
- Thin marketing sites with fewer than 20 pages of content
- Single-page landing pages where the entire site is already one document
- E-commerce product catalogs where schema.org Product markup already does the entity work
- News sites where freshness matters more than curation
- Sites without meaningful priority differences across their content
How to write a good llms.txt
A good llms.txt uses priority hierarchy to order sections from most to least important, writes concise factual descriptions for each link (one noun phrase explaining what the resource contains), sections content by logical category rather than by site navigation, keeps the full file under 5,000 words for parseability, and separates truly-optional resources into the ## Optional section. The best llms.txt files read like a thoughtful tour guide, not a mirror of the site menu.
The technical format is easy. The editorial discipline is harder. Here are the patterns that separate a useful llms.txt from a useless one.
Prioritize ruthlessly. The order of sections matters. An LLM with limited context will read top to bottom. Put your most citation-worthy content first — typically the canonical reference docs, then guides, then examples, then everything else.
Write descriptions as noun phrases. A description like "Complete reference for the API endpoints, parameters, and response shapes" is far more useful than "API docs" or "Learn more about our API." The LLM uses these descriptions to decide what to fetch — precision pays.
Section by topic, not by site menu. Resist the temptation to mirror your site's nav bar. An llms.txt section called "Documentation" that holds 30 unrelated links is less useful than three sections called "API Reference," "Guides," and "Integrations" that each hold 5-10 related links.
Keep it readable. The entire file should stay under 5,000 words. If you need more, produce a `/llms-full.txt` for the expanded content and keep `/llms.txt` as the lean index. An llms.txt that exceeds 10,000 words tends to break retrieval rather than help it.
Use ## Optional honestly. The `## Optional` section is for genuinely low-priority content — news, blog posts, careers pages, marketing collateral. Do not put important docs there because you think the file is getting too long. Cut less-important sections instead.
Link to canonical URLs only. Do not link to redirects, duplicates, or aliased paths. The LLM will follow the link, and every redirect costs tokens and trust.
Update it when content changes. The file is hand-curated, which means it decays without maintenance. Schedule a quarterly review to add new docs and remove deprecated ones.
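Several of these editorial rules are mechanically checkable. Here is a minimal lint sketch; the 5,000-word ceiling and the description requirement mirror this guide's recommendations, not the spec itself, and the function names are illustrative.

```python
def lint_llms_txt(text, max_words=5000):
    """Check an llms.txt draft against this guide's editorial rules.

    Returns a list of warning strings; an empty list means the draft passes.
    """
    warnings = []
    lines = text.splitlines()
    # Rule: the file must open with an H1 site/project name.
    if not lines or not lines[0].startswith("# "):
        warnings.append("file should begin with an H1 site/project name")
    # Rule: a blockquote summary must follow the H1.
    if not any(l.startswith("> ") for l in lines):
        warnings.append("missing blockquote summary after the H1")
    # Rule: keep the index lean; overflow belongs in /llms-full.txt.
    if len(text.split()) > max_words:
        warnings.append(
            f"file exceeds {max_words} words; move content to /llms-full.txt")
    # Rule: every link needs a short, factual description after a colon.
    for i, line in enumerate(lines, start=1):
        if line.startswith("- "):
            name, sep, desc = line[2:].partition(": ")
            if not sep or not desc.strip():
                warnings.append(
                    f"line {i}: link '{name.strip()}' has no description")
    return warnings
```

Running this in CI alongside the quarterly review keeps the file from silently decaying.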
Common mistakes to avoid
The common mistakes in llms.txt files are dumping an entire sitemap into the file without curation, omitting descriptions or copy-pasting the same description across multiple links, using marketing language instead of factual noun phrases, forgetting to maintain the file as content changes, and including broken or redirected URLs. Any one of these reduces the file from a useful curated index to a noisy copy of a sitemap the LLM could have generated itself.
Most failed llms.txt files share the same mistakes. Avoid these and you will be ahead of the majority of adopters.
The sitemap dump. The single most common failure is auto-generating llms.txt from a sitemap. It defeats the entire purpose of the standard. If your file has 500 links with identical descriptions, it is not an llms.txt — it is a sitemap in markdown clothing.
No descriptions. Links without descriptions force the LLM to fetch each page to understand what it contains. That is the problem llms.txt was designed to solve. Every link needs a short, factual description.
Marketing copy in descriptions. "The industry-leading platform for cutting-edge solutions" is not a description — it is a tagline. LLMs parse it as noise and deprioritize the file.
Broken or redirected URLs. If the LLM follows a link and hits a 404 or a redirect chain, trust in the file drops. Every URL in llms.txt should resolve to a 200 on first request.
No maintenance plan. Documentation changes constantly. An llms.txt from 2024 linking to pages that were deprecated in 2025 actively harms retrieval — the LLM wastes tokens on stale content.
Duplication with sitemap.xml. If your llms.txt is identical to sitemap.xml, delete the llms.txt. The standard exists specifically to be different from — more curated than — the sitemap.
Should you add llms.txt now?
Yes, most sites with meaningful content libraries should add llms.txt now because the effort is low, the format is simple, no platform penalizes sites for having it, and forward compatibility with LLM retrieval is increasingly table stakes for AI-discoverable brands. The business case is asymmetric: small downside if nothing happens, meaningful upside if adoption accelerates. Sites should skip it only if they have fewer than 20 pages of substantive content.
The decision comes down to cost and optionality. Writing a good llms.txt for a mid-sized documentation site takes 2-4 hours of focused editorial work. Hosting it requires nothing beyond placing the file at the root of your domain.
Against that cost, the potential upside is meaningful. If AI platforms move toward first-class llms.txt support over the next 12-18 months — which the trajectory suggests — sites that already have the file will be several months ahead of competitors who wait.
The downside is small. There is no SEO penalty for publishing an llms.txt. No platform currently treats its presence as a negative signal. At worst, the file sits unused and you have a thoughtful content index for your own team to reference.
For any site running a serious answer engine optimization program, llms.txt is a reasonable addition to the infrastructure alongside schema markup, atomic content architecture, and entity authority work.
How llms.txt fits into AEO strategy
llms.txt fits into AEO strategy as one of four complementary machine-readable layers: schema.org provides per-page structured metadata, atomic information architecture structures the content itself for citation, semantic HTML and proper headings make content parseable, and llms.txt provides the site-wide priority index. None of the layers replace the others; mature AEO programs implement all four because they solve different parts of the retrieval problem.
AEO has never been about one tactic. The discipline involves making your content machine-legible at multiple layers, and llms.txt slots cleanly into that stack.
At the content layer, atomic information architecture breaks your knowledge into self-contained, citation-ready blocks of 40-80 words. At the metadata layer, schema.org JSON-LD tells the AI what each block means. At the HTML layer, semantic tags and proper headings make the content parseable without guessing.
Llms.txt adds a fourth layer: the site-wide priority map. Where the other three tell an AI about a specific page, llms.txt tells the AI about the whole site — which pages matter, how they relate, and where to start.
A site running all four layers is substantially more citation-ready than one running any single layer alone. This is why our AEO infrastructure service implements all four as part of the standard build, and why we recommend AEO programs think in terms of machine-legibility as a stack rather than any one checklist item.
The comparison between AEO vs SEO comes into sharper focus once you see the stack — AEO is not a single technique, it is an entire parallel infrastructure.
Practical implementation guide
Here is the shortest viable path from zero to a published llms.txt.
Step 1: Inventory your best content. List the 15-30 pages on your site that you most want cited by AI systems. These are typically the canonical docs, the core guides, and the definitive explainers — not the blog posts or news pages.
Step 2: Group the list into 3-6 sections. Look for natural categories: API Reference, Guides, Integrations, Examples, Changelog. Each section should hold 3-10 links. If a section has only one link, fold it into a neighbor.
Step 3: Write the header. Start the file with an H1 of your site or product name, a blockquote summary of 1-3 sentences, and optionally a paragraph of additional context. Keep the summary factual.
Step 4: Write the link descriptions. For each link, write a single noun phrase explaining what the linked resource contains. Target 8-15 words per description. Prioritize precision over marketing appeal.
Step 5: Add the ## Optional section. Put genuinely low-priority resources here — news, blog, careers, marketing. If nothing qualifies as truly optional, leave the section out rather than padding it.
Step 6: Publish at /llms.txt. The file must live at the root of your domain. For most hosting setups, this means placing an `llms.txt` file in the public root. For Next.js sites, use the `/public/llms.txt` convention. For WordPress, use a root file or a plugin that serves it.
Step 7: Verify accessibility. Fetch `https://yoursite.com/llms.txt` with curl and confirm it returns a 200 with `Content-Type: text/markdown` or `text/plain`. Check that it renders as readable markdown, not as HTML.
Step 8: Optionally produce /llms-full.txt. If your site is large enough that the bundled content version provides value, generate `/llms-full.txt` containing the cleaned markdown text of every page referenced in `/llms.txt`. Tools like Mintlify and Fern auto-generate this; for custom sites, a build-time script is straightforward.
Step 9: Schedule maintenance. Add a quarterly calendar reminder to review the file, add new important pages, and remove deprecated ones. The file is only useful if it stays current.
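The verification in step 7 can be scripted. This is an illustrative check, not an official validator; the acceptance rules it applies (HTTP 200, `text/markdown` or `text/plain`, a markdown body starting with an H1) are the ones described above, and the function names are assumptions.

```python
from urllib.request import urlopen

def check_llms_response(status, content_type, body):
    """Apply the step-7 acceptance rules to a fetched /llms.txt response."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Strip any "; charset=..." suffix before comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("text/markdown", "text/plain"):
        problems.append(f"unexpected Content-Type: {content_type}")
    if body.lstrip().startswith("<"):
        problems.append("body looks like HTML, not markdown")
    if not body.lstrip().startswith("# "):
        problems.append("body does not start with an H1 site name")
    return problems

def verify(url):
    """Fetch /llms.txt from a live site and report any rule violations."""
    with urlopen(url) as resp:  # network call; run against your own domain
        body = resp.read().decode("utf-8", errors="replace")
        return check_llms_response(
            resp.status, resp.headers.get("Content-Type", ""), body)
```

Calling `verify("https://yoursite.com/llms.txt")` should return an empty list; anything else is a deployment bug worth fixing before an LLM finds it.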
What llms.txt is not
A final note on scope, because several mischaracterizations have spread in the AEO community.
Llms.txt is not a ranking signal. It does not make your pages rank higher in Google. It does not make AI systems cite you more often in isolation from the underlying content quality.
Llms.txt is not a replacement for schema. Per-page schema.org markup remains essential. Llms.txt works alongside schema, not instead of it.
Llms.txt is not enforced. No AI platform is required to read it. Compliance is voluntary and partial. This is why it belongs in a layered strategy rather than as a single tactic.
Llms.txt is not about SEO. It is about LLM retrieval. These overlap but are not the same discipline. A site can have excellent SEO and poor LLM readiness, and vice versa. Our deeper explainer on what is AEO covers the distinction in full.
The bottom line
Llms.txt is a low-cost, forward-compatible addition to any serious AEO infrastructure. Write it well — genuine curation, factual descriptions, honest prioritization — and it quietly improves how AI retrieval pipelines understand your site. Write it poorly or skip it entirely and the cost is small but the missed optionality grows as the standard matures.
The broader lesson is that machine-legibility is now a stack, not a checkbox. Sites that treat AEO as a layered discipline — content architecture, structured data, semantic HTML, and site-wide indexes like llms.txt — will carry a compounding advantage over sites that treat it as a one-time plugin install.
Next steps
If you are responsible for your company's AI visibility, the fastest way to see where you stand is to run a free AEO scan. The scanner checks for schema markup, atomic answer structure, semantic HTML, entity signals, and — as of 2026 — the presence and quality of llms.txt. You get an engineering-grade report in under two minutes with specific fixes ordered by impact.
For a done-for-you implementation of the full stack, our AEO infrastructure service is a $1,450 one-time build covering schema, atomic architecture, semantic HTML, and llms.txt generation, along with the entity authority work that makes the whole stack defensible. It is the shortest path from zero AI visibility to a citation-ready foundation.
Questions about whether llms.txt makes sense for your specific site or how it fits alongside your existing SEO program? Contact our team — we will give you a direct answer based on your content footprint, not a sales pitch.