daydream journal

notes on AI, growth, and the journey from 0→n

What is LLMs.txt?

A Technical Overview of a Proposed Standard for AI Visibility

May 22 ・ Thenuka Karunaratne

As LLMs (large language models) integrate more real-time browsing and retrieval-augmented generation (RAG), a growing question has emerged: how should websites structure their content to be understood by AI?

In response, some developers have proposed a new standard: LLMs.txt. The standard suggests creating machine-readable entry points for models, akin to what robots.txt is for crawlers, but optimized for inference rather than indexing.

It sounds promising on paper, but in practice, it’s inconsistently implemented, unsupported by major AI providers, and arguably premature. Here we’ll take a closer look at the LLMs.txt standard, the files it proposes, and whether it’s worth implementing.

What the LLMs.txt Standard proposes

The LLMs.txt standard, originally proposed by Jeremy Howard of Answer.AI, consists of two core components:

  1. The llms.txt file: A markdown file placed at /llms.txt on your website. It lists links to LLM-friendly resources—like documentation, policies, and specs—in a lightweight, structured format.
  2. Markdown versions of linked pages: For each URL referenced in llms.txt, the standard recommends serving a markdown version of the page at the same path with a .md extension (e.g. /docs/api → /docs/api.md).

Some implementers also add a third file, which is not part of the core proposal:

  3. The llms-full.txt file: A concatenation of all the .md versions into a single markdown document, mimicking an expanded context window.

These components aim to help LLMs quickly ingest relevant site content in a clean, parseable format.
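Since llms-full.txt is just a concatenation of the .md mirrors, generating it can be automated. Below is a minimal sketch, assuming the markdown files already exist on disk; the function name and separator convention are illustrative, not part of the proposal:

```python
# Hypothetical sketch: build llms-full.txt by concatenating the .md mirrors
# referenced in llms.txt. The separator and comment convention are assumptions.
from pathlib import Path

def build_llms_full(md_paths, out_path="llms-full.txt"):
    """Concatenate markdown pages into one document, separated by rules."""
    parts = []
    for p in md_paths:
        text = Path(p).read_text(encoding="utf-8").strip()
        # Record the source path so each chunk stays attributable.
        parts.append(f"<!-- source: {p} -->\n{text}")
    Path(out_path).write_text("\n\n---\n\n".join(parts) + "\n", encoding="utf-8")
```

Running this as part of a docs build keeps the concatenated file in sync with the individual pages.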

Why was this standard created?

Web content is messy. HTML pages contain navbars, ads, dynamic scripts, and layout elements that are irrelevant, or even disruptive, to LLMs trying to extract meaning.

The LLMs.txt standard tries to simplify the input surface by giving LLMs a curated, markdown-based index of high-value content. According to the proposal, this helps solve three core issues:

  • Context window limits: LLMs can't fit most full webpages in a single prompt.
  • Ambiguous content discovery: Crawling doesn't always reflect what’s important.
  • RAG optimization: Developers want finer control over what content is pulled into AI responses.

Proponents suggest this approach is especially useful for:

  • API and SDK documentation
  • Developer tools and onboarding flows
  • SaaS support articles and help centers
  • Research and public knowledge repositories

However, none of these benefits have been confirmed in practice. There’s no public evidence that any major LLM provider uses llms.txt files in retrieval or ranking.

What Does an llms.txt File Contain?

The file is hosted at the root of your domain (e.g. example.com/llms.txt) and uses Markdown syntax.

It follows a loose structure:

# Title
Brief description of the site.

## Section Name
- [Link Title](https://link_url): Optional description
- [Link Title](https://link_url/sub_path): Optional description

## Another Section
- [Link Title](https://link_url): Optional description

A few formatting conventions are encouraged:

  • Each section is grouped by H2 headings (e.g. "Docs", "Policies")
  • Descriptions are optional, but should be concise if included
  • The ## Optional section indicates links that can be skipped in limited context windows
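Because the format is so regular, the file is easy to generate from structured data rather than maintain by hand. A minimal sketch, where the function name and data shape are assumptions of this example:

```python
# Hypothetical sketch: render an llms.txt index from structured data.
# The data shape (dict of section -> list of link tuples) is an assumption.
def render_llms_txt(title, description, sections):
    """sections maps a section name to (link_title, url, description) tuples;
    description may be None to omit it."""
    lines = [f"# {title}", description, ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        for link_title, url, desc in links:
            suffix = f": {desc}" if desc else ""
            lines.append(f"- [{link_title}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Feeding it a site's navigation data produces output matching the template above, which keeps the index from drifting out of date as pages are added.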

The Markdown Page Requirement

Beyond the file itself, the standard recommends providing .md versions of the pages linked in llms.txt.

Example:

  • HTML: example.com/docs/quickstart
  • Markdown: example.com/docs/quickstart.md

These markdown versions are intended to strip away HTML bloat and give LLMs a clean view of the content. However, this introduces a few challenges:

  • Loss of detail: Markdown may exclude visual elements (e.g. charts, images, tooltips) important for comprehension.
  • Content duplication: Maintaining both HTML and .md versions increases complexity and risks desyncs.
  • Unclear benefit: There’s no indication that .md versions are actively crawled or preferred by any LLM.

For this reason, many developers now question the value of these markdown mirrors, especially in large documentation sites where duplication is expensive to manage.
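The desync risk, at least, can be partially mitigated with a simple staleness check in CI. A minimal sketch, assuming the HTML and .md versions live side by side on disk (the function and pairing scheme are illustrative):

```python
# Hypothetical sketch: flag markdown mirrors that are missing or older than
# their HTML counterparts, using file modification times as a proxy.
import os

def stale_md_mirrors(pairs):
    """pairs: list of (html_path, md_path) tuples.
    Returns the md paths that are missing or older than their HTML source."""
    stale = []
    for html_path, md_path in pairs:
        if not os.path.exists(md_path):
            stale.append(md_path)
        elif os.path.getmtime(md_path) < os.path.getmtime(html_path):
            stale.append(md_path)
    return stale
```

A build could fail (or warn) when this list is non-empty, so mirrors never silently fall behind the canonical pages.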

Comparison to existing standards

Each file serves a different function:

  • robots.txt: tells crawlers which paths they may or may not access
  • sitemap.xml: lists a site's URLs to help search engines index them
  • llms.txt: points LLMs to curated, markdown-formatted content for inference

While robots.txt and sitemap.xml are widely adopted and respected by crawlers, llms.txt has no comparable support from any LLM provider.

Adoption Status

Despite the buzz, adoption is limited, and no major AI vendor has endorsed or implemented the standard.

Some devtools and documentation-first startups (like Hugging Face, Cloudflare, and Mintlify) do publish llms.txt files—but it's unclear whether they see any direct benefit, or are simply early adopters testing the waters.

Should you implement it?

In most cases, not yet.

If your team already has structured documentation and can automate the generation of the file, adding llms.txt may be low-cost and harmless. However, it is:

  • Not supported by any major AI provider
  • Not a ranking or retrieval signal
  • Not likely to meaningfully change how LLMs interpret your site

Moreover, maintaining markdown page versions or full concatenated .txt files adds engineering overhead without proven upside.

If you're doing this purely for visibility, you're better off investing in:

  • Schema markup
  • Clean HTML structure
  • Logical internal linking
  • Structured data feeds

These are still the dominant signals for both search engines and LLM crawlers.
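As an example of the first item, schema markup is typically embedded as a JSON-LD block in the page head. The values below are purely illustrative:

```html
<!-- Illustrative JSON-LD schema markup for a documentation article -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Quickstart Guide",
  "author": { "@type": "Organization", "name": "Example Co" },
  "datePublished": "2025-01-15"
}
</script>
```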

Adoption, Industry Reception, and Future Outlook

It’s important to note that LLMs.txt is a speculative workaround, not a robust protocol. It tries to solve a problem that evolving LLM capabilities (like long context windows and multi-hop retrieval) may soon render moot.

There’s no harm in testing it, but don’t mistake adoption for utility.

For now, LLMs.txt is more like meta keywords than robots.txt: an interesting idea, but not one that models are paying attention to.

daydream helps high-growth teams prepare for LLM-native visibility through GEO frameworks, structured content delivery, and AI search optimization. Want your content to rank, resonate, and be referenced by AI? Let’s talk.

