Ryan Howard
Everything we know about llms.txt
llms.txt for AI optimization is a hedge: it might help LLMs understand your content, and it might not. There’s limited direct evidence of real indexing behavior today, but if and when these files are adopted, early movers will benefit.

llms.txt is an experimental standard: a plain Markdown file placed at the root of a domain (e.g., https://docs.anthropic.com/llms.txt) that gives AI models a clean, structured summary of a site’s most important content.
It’s being used in real products: Cursor uses it to improve inline completions. Anthropic requested it for Claude’s ingestion pipeline. Mintlify rolled it out across thousands of dev sites. Perplexity maintains its own llms-full.txt.
But there’s one big question nobody’s reliably answered yet:
Are AI crawlers actually accessing llms.txt?
We’re logging access from known AI crawlers across thousands of participating WordPress sites to gather real data on whether GPTBot, ClaudeBot, or other LLM scrapers are actually touching this file.
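A minimal sketch of that kind of logging, assuming standard Apache/nginx combined-format access logs; the user-agent token list below is illustrative only and would need to track each vendor’s published crawler strings:

```python
import re

# Substrings that identify common AI crawlers in a User-Agent header.
# Illustrative, not exhaustive; vendors publish (and change) their UA strings.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]

# Combined log format: ... "GET /path HTTP/1.1" 200 512 "referer" "user agent"
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

def llms_txt_hits(log_lines):
    """Yield (user_agent, path) for AI-crawler requests to /llms.txt."""
    for line in log_lines:
        m = REQUEST.search(line)
        if not m:
            continue
        path, ua = m.groups()
        if path.startswith("/llms.txt") and any(t in ua for t in AI_CRAWLER_TOKENS):
            yield ua, path
```

Run over rotated access logs, this gives a quick yes/no answer to whether the file is being fetched at all, before investing in a fuller analytics pipeline.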
Until then, the reality is this:
- llms.txt for AI optimization is a hedge. It might help LLMs understand your content. It might not.
- There’s limited direct evidence of real indexing behavior today.
- But if and when these files are adopted, early movers will benefit.
The rest of this post rounds up everything we know so far, from usage patterns to platform positions, best practices to valid criticisms. And if you’re running WordPress, you can generate and log your own llms.txt file with a simple plugin install.
Who’s using llms.txt so far?
Adoption has been fast but fragmented. Mintlify rolled it out across its entire doc hosting platform. Cursor, Zapier, Vercel, and Anthropic use it. Perplexity maintains its own llms-full.txt. Several GitHub tools have popped up to generate, validate, and simulate llms.txt ingestion.
Some sites quietly link to their llms.txt from the footer or a /meta directory. Others embed references into developer tools or AI prompts. It’s not a widespread standard yet, but it’s circulating in the right places: internal copilots, RAG pipelines, AI IDEs.
What’s it good for?
The strongest use cases aren’t about visibility in AI answers yet. LLMs struggle with noisy HTML, massive DOMs, and irrelevant boilerplate. llms.txt offers a lightweight map of important content in plain Markdown. This helps:
- AI coding tools find the right docs for inline suggestions
- RAG pipelines reduce token waste
- Internal agents skip the crawl and go straight to the signal
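As a sketch of the RAG-pipeline case: the link lines in an llms.txt file follow a predictable `- [title](url): description` shape, so a retriever can pull out candidate documents with a few lines of code. The parser below assumes that shape and ignores everything else:

```python
import re

# Matches "- [Title](https://url): optional description"
LINK = re.compile(r'^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$')

def extract_links(llms_txt: str):
    """Return (title, url, description) tuples from an llms.txt body,
    ready to be fetched and chunked by a retrieval pipeline."""
    links = []
    for line in llms_txt.splitlines():
        m = LINK.match(line.strip())
        if m:
            links.append((m["title"], m["url"], m["desc"] or ""))
    return links
```

Fetching only these curated URLs (ideally their Markdown variants) keeps boilerplate out of the context window.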
For dev-focused companies, llms.txt is already useful. For everyone else, it’s a bet on the future.
Best practices
A few principles matter:
- Keep it short. Start with 10–50 key URLs.
- Structure clearly. Use Markdown, headers, and link descriptions.
- Avoid spam. Don’t treat this like keyword stuffing.
- Update it. Your most useful content should always be represented.
- Check your logs. See who’s requesting the file.
Some ways llms.txt could be used in the future
Here are a couple of ways llms.txt could evolve beyond today’s basic implementations:
1. A control surface for AI interaction
Instead of just acting like a permissions file, llms.txt could become a pointer to a richer, structured manifest, something like a model-facing API at .well-known/mcp.json. That file could define what types of AI use cases are permitted (indexing, summarization, training, fine-tuning), how attribution should be handled, where source content is located, and how compliance can be maintained over time.
2. A monetization layer for content access
Inspired by ideas from Cloudflare, llms.txt could eventually link to licensing or billing endpoints that specify payment terms for AI systems reusing content. That might include per-call pricing, usage tiers, or even paywall indicators. Instead of blocking bots outright, publishers could shift toward: “yes, but here’s how.”
Background and technical overview
What is llms.txt?
llms.txt was introduced in 2024 by Jeremy Howard, co-founder of Answer.AI. It’s a plain Markdown file placed at the root of a domain (e.g., https://yoursite.com/llms.txt) that provides LLMs with a clean, structured summary of your site’s most important content.
It’s not a replacement for robots.txt or sitemap.xml. Where those focus on crawler permissions and URL discovery, llms.txt is about clarity and intent. It tells LLMs what to read and why it matters.
Structure and formatting
The format is intentionally lightweight:
- A single H1 with your site name
- A short summary in a blockquote
- Grouped H2 sections for content categories
- Markdown lists of links with optional descriptions
- An optional "## Optional" section for less critical content
Example:

```markdown
# Acme Docs
> Acme is an API platform for data automation.

## Guides
- [Quick Start](https://acme.com/docs/start.md): Basic setup instructions.
- [API Reference](https://acme.com/docs/api.md): Full API endpoint list.

## Optional
- [Changelog](https://acme.com/docs/changelog.md)
```
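Since there is no official validator yet, a few structural checks go a long way. The linter below is a hypothetical sketch that encodes the conventions listed above, not an implementation of any official tool:

```python
def lint_llms_txt(text: str) -> list[str]:
    """Check the basic structural conventions of an llms.txt file.
    Returns a list of problems; an empty list means these checks pass."""
    lines = [l.rstrip() for l in text.strip().splitlines()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if sum(1 for l in lines if l.startswith("# ")) > 1:
        problems.append("more than one H1 found")
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing blockquote summary")
    if not any(l.startswith("## ") for l in lines):
        problems.append("no H2 content sections found")
    return problems
```

Wiring this into CI keeps the file from silently drifting out of shape as content changes.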
llms.txt vs. llms-full.txt
There are two common variations:
- llms.txt is a curated index of high-priority content
- llms-full.txt is a flattened Markdown export of your entire content corpus
Some companies host both. Others stick with just one, depending on how structured their site is.
Challenges llms.txt is trying to solve
- Context window limits: LLMs can’t ingest entire sites. llms.txt gives them a short list.
- HTML bloat: Markdown is cleaner and easier to parse.
- Content ambiguity: Many sites don’t signal what content matters most. llms.txt helps clarify that.
Frequently Asked Questions
What’s the purpose of llms.txt?
To guide LLMs to your best content using a lightweight, structured format.
Where should I put it?
At the root of your site, like https://example.com/llms.txt.
Will it help me rank in ChatGPT or Perplexity?
Maybe, but nothing is guaranteed yet. It’s still early. That’s why we’re collecting data.
What’s the difference between llms.txt and sitemap.xml?
Sitemaps list URLs. llms.txt explains them, prioritizes them, and gives structure for AI models.
Do AI bots support it?
Some tools and internal systems use it already. Widespread adoption is still unfolding.
Can I include usage terms or metadata?
Yes. Many sites include attribution rules, update timestamps, or even coupon codes to track conversions.
What kind of sites should use it?
Developer platforms, documentation sites, publishers, product companies—anyone with content worth summarizing or citing.
What’s the difference between llms.txt and llms-full.txt?
The short version curates highlights. The full version dumps everything in clean Markdown. Use what fits your needs.
Can I block or gate specific sections?
Yes. Some implementations use sections or links with access guidelines, or include licensing terms and usage restrictions.
Is there a standard validator?
Not yet. Some GitHub projects can lint basic structure, but there’s no official tool.
Can I automate it?
Yes. Static site generators and platforms like Mintlify often generate it automatically. WordPress users can use the llms.txt plugin.
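To illustrate what such automation amounts to, here is a sketch that renders an llms.txt body from structured data; real generators like Mintlify or the WordPress plugin pull the equivalent data from their own content models:

```python
def generate_llms_txt(site_name, summary, sections):
    """Render an llms.txt body from {section: [(title, url, desc), ...]}."""
    out = [f"# {site_name}", f"> {summary}", ""]
    for section, links in sections.items():
        out.append(f"## {section}")
        for title, url, desc in links:
            line = f"- [{title}]({url})"
            if desc:
                line += f": {desc}"  # description is optional per the format
            out.append(line)
        out.append("")
    return "\n".join(out)
```

Regenerating this on every content deploy satisfies the "update it" best practice with no manual upkeep.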
What format should I use?
Plain Markdown. Use H1/H2 headings, blockquotes, and bullet lists to organize content cleanly.
What’s next?
We’ll publish results from our crawler experiment and keep tracking how llms.txt evolves. To join the experiment or get started with a plugin, visit https://llmstxt.ryanhoward.dev/.