daydream journal

notes on AI, growth, and the journey from 0→n

What is LLMs.txt?

A Technical Overview of a Proposed Standard for AI Visibility

May 22 ・ Thenuka Karunaratne

As LLMs (large language models) integrate more real-time browsing and retrieval-augmented generation (RAG), a growing question has emerged: how should websites structure their content to be understood by AI?

In response, some developers have proposed a new standard: LLMs.txt. The standard suggests creating machine-readable entry points for models, akin to what robots.txt is for crawlers, but optimized for inference rather than indexing.

It sounds promising on paper, but in practice, it’s inconsistently implemented, unsupported by major AI providers, and arguably premature. Here we’ll take a closer look at the LLMs.txt standard, the files it proposes, and whether it’s worth implementing.

What the LLMs.txt Standard proposes

The LLMs.txt standard, originally proposed by Jeremy Howard of Answer.AI, consists of two core components:

  1. The llms.txt file: A markdown file placed at /llms.txt on your website. It lists links to LLM-friendly resources—like documentation, policies, and specs—in a lightweight, structured format.
  2. Markdown versions of linked pages: For each URL referenced in llms.txt, the standard recommends serving a markdown version of the page at the same path with a .md extension (e.g. /docs/api → /docs/api.md).

Some implementers also add a third file, which is not part of the core proposal:

  3. The llms-full.txt file: A concatenation of all the .md versions into a single markdown document, mimicking an expanded context window.

These components aim to help LLMs quickly ingest relevant site content in a clean, parseable format.
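Since llms-full.txt is just a concatenation of the .md mirrors, generating it can be automated. Below is a minimal sketch, assuming the markdown files already exist on disk; the function name and separator convention are illustrative, not part of the proposal:

```python
# Hypothetical sketch: build llms-full.txt by concatenating the .md mirrors
# referenced in llms.txt. The separator and comment convention are assumptions.
from pathlib import Path

def build_llms_full(md_paths, out_path="llms-full.txt"):
    """Concatenate markdown pages into one document, separated by rules."""
    parts = []
    for p in md_paths:
        text = Path(p).read_text(encoding="utf-8").strip()
        # Record the source path so each chunk stays attributable.
        parts.append(f"<!-- source: {p} -->\n{text}")
    Path(out_path).write_text("\n\n---\n\n".join(parts) + "\n", encoding="utf-8")
```

Running this as part of a docs build keeps the concatenated file in sync with the individual pages.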

Why was this standard created?

Web content is messy. HTML pages contain navbars, ads, dynamic scripts, and layout elements that are irrelevant, or even disruptive, to LLMs trying to extract meaning.

The LLMs.txt standard tries to simplify the input surface by giving LLMs a curated, markdown-based index of high-value content. According to the proposal, this helps solve three core issues:

  • Context window limits: LLMs can't fit most full webpages in a single prompt.
  • Ambiguous content discovery: Crawling doesn't always reflect what’s important.
  • RAG optimization: Developers want finer control over what content is pulled into AI responses.

Proponents suggest this approach is especially useful for:

  • API and SDK documentation
  • Developer tools and onboarding flows
  • SaaS support articles and help centers
  • Research and public knowledge repositories

However, none of these benefits have been confirmed in practice. There’s no public evidence that any major LLM provider uses llms.txt files in retrieval or ranking.

What Does an llms.txt File Contain?

The file is hosted at the root of your domain (e.g. example.com/llms.txt) and uses Markdown syntax.

It follows a loose structure:

# Title
Brief description of the site.

## Section Name
- [Link Title](https://link_url): Optional description
- [Link Title](https://link_url/sub_path): Optional description

## Another Section
- [Link Title](https://link_url): Optional description

A few formatting conventions are encouraged:

  • Each section is grouped by H2 headings (e.g. "Docs", "Policies")
  • Descriptions are optional, but should be concise if included
  • The ## Optional section indicates links that can be skipped in limited context windows
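Because the format is so regular, the file is easy to generate from structured data rather than maintain by hand. A minimal sketch, where the function name and data shape are assumptions of this example:

```python
# Hypothetical sketch: render an llms.txt index from structured data.
# The data shape (dict of section -> list of link tuples) is an assumption.
def render_llms_txt(title, description, sections):
    """sections maps a section name to (link_title, url, description) tuples;
    description may be None to omit it."""
    lines = [f"# {title}", description, ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        for link_title, url, desc in links:
            suffix = f": {desc}" if desc else ""
            lines.append(f"- [{link_title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Feeding it a site's navigation data produces output matching the template above, which keeps the index from drifting out of date as pages are added.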

The Markdown Page Requirement

Beyond the file itself, the standard recommends providing .md versions of the pages linked in llms.txt.

Example:

  • HTML: example.com/docs/quickstart
  • Markdown: example.com/docs/quickstart.md

These markdown versions are intended to strip away HTML bloat and give LLMs a clean view of the content. However, this introduces a few challenges:

  • Loss of detail: Markdown may exclude visual elements (e.g. charts, images, tooltips) important for comprehension.
  • Content duplication: Maintaining both HTML and .md versions increases complexity and risks desyncs.
  • Unclear benefit: There’s no indication that .md versions are actively crawled or preferred by any LLM.

For this reason, many developers now question the value of these markdown mirrors, especially in large documentation sites where duplication is expensive to manage.
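The desync risk, at least, can be partially mitigated with a simple staleness check in CI. A minimal sketch, assuming the HTML and .md versions live side by side on disk (the function and pairing scheme are illustrative):

```python
# Hypothetical sketch: flag markdown mirrors that are missing or older than
# their HTML counterparts, using file modification times as a proxy.
import os

def stale_md_mirrors(pairs):
    """pairs: list of (html_path, md_path) tuples.
    Returns the md paths that are missing or older than their HTML source."""
    stale = []
    for html_path, md_path in pairs:
        if not os.path.exists(md_path):
            stale.append(md_path)
        elif os.path.getmtime(md_path) < os.path.getmtime(html_path):
            stale.append(md_path)
    return stale
```

A build could fail (or warn) when this list is non-empty, so mirrors never silently fall behind the canonical pages.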

Comparison to existing standards

Each file serves a different function:

  • robots.txt: tells crawlers which paths they may or may not access
  • sitemap.xml: lists a site's URLs to help search engines index them
  • llms.txt: points LLMs to curated, markdown-formatted content for inference

While robots.txt and sitemap.xml are widely adopted and respected by crawlers, llms.txt has no comparable support from any LLM provider.

Adoption Status

Despite the buzz, adoption is limited, and no major AI vendor has endorsed or implemented the standard.

Some devtools and documentation-first startups (like Hugging Face, Cloudflare, and Mintlify) do publish llms.txt files—but it's unclear whether they see any direct benefit, or are simply early adopters testing the waters.

Should you implement it?

In most cases, not yet.

If your team already has structured documentation and can automate the generation of the file, adding llms.txt may be low-cost and harmless. However, it is:

  • Not supported by any major AI provider
  • Not a ranking or retrieval signal
  • Not likely to meaningfully change how LLMs interpret your site

Moreover, maintaining markdown page versions or full concatenated .txt files adds engineering overhead without proven upside.

If you're doing this purely for visibility, you're better off investing in:

  • Schema markup
  • Clean HTML structure
  • Logical internal linking
  • Structured data feeds

These are still the dominant signals for both search engines and LLM crawlers.
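As an example of the first item, schema markup is typically embedded as a JSON-LD block in the page head. The values below are purely illustrative:

```html
<!-- Illustrative JSON-LD schema markup for a documentation article -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Quickstart Guide",
  "author": { "@type": "Organization", "name": "Example Co" },
  "datePublished": "2025-01-15"
}
</script>
```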

Adoption, Industry Reception, and Future Outlook

It’s important to note that LLMs.txt is a speculative workaround, not a robust protocol. It tries to solve a problem that evolving LLM capabilities (like long context windows and multi-hop retrieval) may soon render moot.

There’s no harm in testing it, but don’t mistake adoption for utility.

For now, LLMs.txt is more like meta keywords than robots.txt: an interesting idea, but not one that models are paying attention to.

daydream helps high-growth teams prepare for LLM-native visibility through GEO frameworks, structured content delivery, and AI search optimization. Want your content to rank, resonate, and be referenced by AI? Let’s talk.

