Ryan Howard
Everything we know about llms.txt
llms.txt for AI optimization is a hedge: it might help LLMs understand your content, and it might not. There’s limited direct evidence of real indexing behavior today, but if and when these files are adopted, early movers will benefit.

llms.txt is an experimental standard: a plain Markdown file placed at the root of a domain (e.g., https://docs.anthropic.com/llms.txt) that gives AI models a clean, structured summary of a site’s most important content.
It’s being used in real products: Cursor uses it to improve inline completions. Anthropic requested it for Claude’s ingestion pipeline. Mintlify rolled it out across thousands of dev sites. Perplexity maintains its own llms-full.txt.
But there’s one big question nobody’s reliably answered yet:
Are AI crawlers actually accessing llms.txt?
We’re logging access from known AI crawlers across thousands of participating WordPress sites to gather real data on whether GPTBot, ClaudeBot, or other LLM scrapers are actually touching this file.
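A minimal sketch of that kind of logging, assuming standard Apache/nginx combined-format access logs; the user-agent token list below is illustrative only and would need to track each vendor’s published crawler strings:

```python
import re

# Substrings that identify common AI crawlers in a User-Agent header.
# Illustrative, not exhaustive; vendors publish (and change) their UA strings.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]

# Combined log format: ... "GET /path HTTP/1.1" 200 512 "referer" "user agent"
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def llms_txt_hits(log_lines):
    """Yield (user_agent, path) for AI-crawler requests to /llms.txt."""
    for line in log_lines:
        m = REQUEST.search(line)
        if not m:
            continue
        path, ua = m.groups()
        if path.startswith("/llms.txt") and any(t in ua for t in AI_CRAWLER_TOKENS):
            yield ua, path
```

Run over rotated access logs, this gives a quick yes/no answer to whether the file is being fetched at all, before investing in a fuller analytics pipeline.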
Until then, the reality is this:
- llms.txt for AI optimization is a hedge. It might help LLMs understand your content. It might not.
- There’s limited direct evidence of real indexing behavior today.
- But if and when these files are adopted, early movers will benefit.
The rest of this post rounds up everything we know so far, from usage patterns to platform positions, best practices to valid criticisms. And if you’re running WordPress, you can generate and log your own llms.txt file with a simple plugin install.
Who’s using llms.txt so far?
Adoption has been fast but fragmented. Mintlify rolled it out across its entire doc hosting platform. Cursor, Zapier, Vercel, and Anthropic use it. Perplexity maintains its own llms-full.txt. Several GitHub tools have popped up to generate, validate, and simulate llms.txt ingestion.
Some sites quietly link to their llms.txt from the footer or a /meta directory. Others embed references into developer tools or AI prompts. It’s not a widespread standard yet, but it’s circulating in the right places: internal copilots, RAG pipelines, AI IDEs.
What’s it good for?
The strongest use cases aren’t about visibility in AI answers yet. LLMs struggle with noisy HTML, massive DOMs, and irrelevant boilerplate. llms.txt offers a lightweight map of important content in plain Markdown. This helps:
- AI coding tools find the right docs for inline suggestions
- RAG pipelines reduce token waste
- Internal agents skip the crawl and go straight to the signal
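As a sketch of the RAG-pipeline case: the link lines in an llms.txt file follow a predictable `- [title](url): description` shape, so a retriever can pull out candidate documents with a few lines of code. The parser below assumes that shape and ignores everything else:

```python
import re

# Matches "- [Title](https://url): optional description"
LINK = re.compile(r'^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$')

def extract_links(llms_txt: str):
    """Return (title, url, description) tuples from an llms.txt body,
    ready to be fetched and chunked by a retrieval pipeline."""
    links = []
    for line in llms_txt.splitlines():
        m = LINK.match(line.strip())
        if m:
            links.append((m["title"], m["url"], m["desc"] or ""))
    return links
```

Fetching only these curated URLs (ideally their Markdown variants) keeps boilerplate out of the context window.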
For dev-focused companies, llms.txt is already useful. For everyone else, it’s a bet on the future.
Best practices
A few principles matter:
- Keep it short. Start with 10–50 key URLs.
- Structure clearly. Use Markdown, headers, and link descriptions.
- Avoid spam. Don’t treat this like keyword stuffing.
- Update it. Your most useful content should always be represented.
- Check your logs. See who’s requesting the file.
Some ways llms.txt could be used in the future
Here are a couple of ways llms.txt could evolve beyond today’s basic implementations:
1. A control surface for AI interaction
Instead of just acting like a permissions file, llms.txt could become a pointer to a richer, structured manifest, something like a model-facing API at .well-known/mcp.json. That file could define what types of AI use cases are permitted (indexing, summarization, training, fine-tuning), how attribution should be handled, where source content is located, and how compliance can be maintained over time.
2. A monetization layer for content access
Inspired by ideas from Cloudflare, llms.txt could eventually link to licensing or billing endpoints that specify payment terms for AI systems reusing content. That might include per-call pricing, usage tiers, or even paywall indicators. Instead of blocking bots outright, publishers could shift toward: “yes, but here’s how.”
Background and technical overview
What is llms.txt?
llms.txt was introduced in 2024 by Jeremy Howard, co-founder of Answer.AI. It’s a plain Markdown file placed at the root of a domain (e.g., https://yoursite.com/llms.txt) that provides LLMs with a clean, structured summary of your site’s most important content.
It’s not a replacement for robots.txt or sitemap.xml. Where those focus on crawler permissions and URL discovery, llms.txt is about clarity and intent. It tells LLMs what to read and why it matters.
Structure and formatting
The format is intentionally lightweight:
- A single H1 with your site name
- A short summary in a blockquote
- Grouped H2 sections for content categories
- Markdown lists of links with optional descriptions
- An optional "## Optional" section for less critical content
Example:

```markdown
# Acme Docs
> Acme is an API platform for data automation.

## Guides
- [Quick Start](https://acme.com/docs/start.md): Basic setup instructions.
- [API Reference](https://acme.com/docs/api.md): Full API endpoint list.

## Optional
- [Changelog](https://acme.com/docs/changelog.md)
```
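Since there is no official validator yet, a few structural checks go a long way. The linter below is a hypothetical sketch that encodes the conventions listed above, not an implementation of any official tool:

```python
def lint_llms_txt(text: str) -> list[str]:
    """Check the basic structural conventions of an llms.txt file.
    Returns a list of problems; an empty list means these checks pass."""
    lines = [l.rstrip() for l in text.strip().splitlines()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if sum(1 for l in lines if l.startswith("# ")) > 1:
        problems.append("more than one H1 found")
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing blockquote summary")
    if not any(l.startswith("## ") for l in lines):
        problems.append("no H2 content sections found")
    return problems
```

Wiring this into CI keeps the file from silently drifting out of shape as content changes.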
llms.txt vs. llms-full.txt
There are two common variations:
- llms.txt is a curated index of high-priority content
- llms-full.txt is a flattened Markdown export of your entire content corpus
Some companies host both. Others stick with just one, depending on how structured their site is.
Challenges llms.txt is trying to solve
- Context window limits: LLMs can’t ingest entire sites. llms.txt gives them a short list.
- HTML bloat: Markdown is cleaner and easier to parse.
- Content ambiguity: Many sites don’t signal what content matters most. llms.txt helps clarify that.
Frequently Asked Questions
What’s the purpose of llms.txt?
To guide LLMs to your best content using a lightweight, structured format.
Where should I put it?
At the root of your site, like https://example.com/llms.txt.
Will it help me rank in ChatGPT or Perplexity?
Maybe, but nothing is guaranteed yet. It’s still early. That’s why we’re collecting data.
What’s the difference between llms.txt and sitemap.xml?
Sitemaps list URLs. llms.txt explains them, prioritizes them, and gives structure for AI models.
Do AI bots support it?
Some tools and internal systems use it already. Widespread adoption is still unfolding.
Can I include usage terms or metadata?
Yes. Many sites include attribution rules, update timestamps, or even coupon codes to track conversions.
What kind of sites should use it?
Developer platforms, documentation sites, publishers, product companies—anyone with content worth summarizing or citing.
What’s the difference between llms.txt and llms-full.txt?
The short version curates highlights. The full version dumps everything in clean Markdown. Use what fits your needs.
Can I block or gate specific sections?
Yes. Some implementations use sections or links with access guidelines, or include licensing terms and usage restrictions.
Is there a standard validator?
Not yet. Some GitHub projects can lint basic structure, but there’s no official tool.
Can I automate it?
Yes. Static site generators and platforms like Mintlify often generate it automatically. WordPress users can use the llms.txt plugin.
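To illustrate what such automation amounts to, here is a sketch that renders an llms.txt body from structured data; real generators like Mintlify or the WordPress plugin pull the equivalent data from their own content models:

```python
def generate_llms_txt(site_name, summary, sections):
    """Render an llms.txt body from {section: [(title, url, desc), ...]}."""
    out = [f"# {site_name}", f"> {summary}", ""]
    for section, links in sections.items():
        out.append(f"## {section}")
        for title, url, desc in links:
            line = f"- [{title}]({url})"
            if desc:
                line += f": {desc}"  # description is optional per the format
            out.append(line)
        out.append("")
    return "\n".join(out)
```

Regenerating this on every content deploy satisfies the "update it" best practice with no manual upkeep.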
What format should I use?
Plain Markdown. Use H1/H2 headings, blockquotes, and bullet lists to organize content cleanly.
What’s next?
We’ll publish results from our crawler experiment and keep tracking how llms.txt evolves. To join the experiment or get started with a plugin, visit https://llmstxt.ryanhoward.dev/.