Llm txt general guide | References samperalabs

What is llms.txt?

llms.txt is a convention for making any body of knowledge navigable by LLMs. Instead of forcing a model to search exhaustively, llms.txt files provide a structured map — what exists, how it's organized, and where to find each part.

The original llmstxt.org proposal defines a single flat file at a website's root. This system extends that into a hierarchical tree of navigation files that scales to large, complex knowledge structures — from codebases to legal systems to medical guidelines.

The core idea: an LLM reads one file and knows where to go next, without guessing or searching.

The Problem It Solves

LLMs facing any large knowledge base don't know what exists, how it's organized, or what's relevant to a given query. Without guidance, they either hallucinate, require expensive retrieval over everything, or miss critical context.

llms.txt addresses this by providing a pre-built navigation layer that works across any domain.

Core Principles

Read one file, know where to go. Every llms.txt gives enough context to decide the next navigation step without reading downstream content.
Route, don't duplicate. Each file summarizes what's behind each link but never reproduces the content. One source of truth.
Every link gets a one-line description. A bare link is useless. The description tells the LLM what it will find before following the link.
Depth matches scope. The root file covers the broadest scope at the highest abstraction. Each level down narrows scope and increases specificity.
Navigation nodes vs. content nodes. llms.txt files are navigation nodes (they link). Leaf files are content nodes (they hold substantive information). Never mix the two roles.
Authority and provenance. When mapping knowledge with authoritative sources (laws, standards, guidelines), navigation nodes should indicate the source, version, or effective date.

The Hierarchy

The system is a tree. Each level narrows scope and increases detail.

```text
Root llms.txt           → "What is this knowledge base?"
  └── Hub llms.txt      → "What's in this division?"
        ├── Leaf docs   → Detailed reference content
        └── Index llms.txt → "What items make up this topic?"
              └── Leaf content
```

Level 1 — Root

Scope: The entire knowledge collection.
Purpose: Orient the reader. "What is this? What are its major divisions?"
Links to: Level 2 hubs.
Does not contain: Detailed content, specific entries, or deep references.

Level 2 — Hub

Scope: One major division of the knowledge.
Purpose: Comprehensive index for a topic area. "What subtopics exist? What reference material is available?"
Links to: Leaf content files and Level 3 indexes for complex subtopics.

Level 3 — Index

Scope: A single topic or narrowly defined subject.
Purpose: Map every relevant content item. "What specific documents make up this topic?"
Links to: Leaf content files (the actual information).

Three levels suffice for most domains. The system is recursive — any navigation node can link to another, allowing arbitrary depth for exceptionally large knowledge structures. Rule of thumb: if a node has 30+ links, split it into a deeper level.
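The recursive structure and the 30-link rule of thumb can be checked mechanically. The Python sketch below is illustrative only. It assumes that navigation nodes are exactly the files named `llms.txt`. The `Node` class and the `nodes_to_split` and `max_depth` helpers are hypothetical names, not part of any llms.txt tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field

SPLIT_THRESHOLD = 30  # rule of thumb from this guide: 30+ links means split


@dataclass
class Node:
    """A navigation node (llms.txt) or a content node (any other file)."""
    path: str
    children: list[Node] = field(default_factory=list)

    @property
    def is_navigation(self) -> bool:
        # Assumption: navigation nodes are exactly the files named llms.txt.
        return self.path.rsplit("/", 1)[-1] == "llms.txt"


def nodes_to_split(node: Node) -> list[str]:
    """Recursively collect navigation nodes with SPLIT_THRESHOLD or more links."""
    flagged = []
    if node.is_navigation and len(node.children) >= SPLIT_THRESHOLD:
        flagged.append(node.path)
    for child in node.children:
        flagged.extend(nodes_to_split(child))
    return flagged


def max_depth(node: Node) -> int:
    """Levels from this node down to its deepest leaf (the node itself counts as 1)."""
    if not node.children:
        return 1
    return 1 + max(max_depth(child) for child in node.children)


if __name__ == "__main__":
    index = Node("penal/part1/llms.txt",
                 [Node(f"penal/part1/section{i}.md") for i in range(35)])
    hub = Node("penal/llms.txt", [index, Node("penal/reference/sentencing.md")])
    root = Node("llms.txt", [hub])
    print(nodes_to_split(root))  # ['penal/part1/llms.txt']
    print(max_depth(root))       # 4: root, hub, index, leaf
```

Because the tree is recursive, the same two functions work at any depth. Nothing in the check is specific to three levels.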

File Format

Every llms.txt follows the same format:

```markdown
# Title

> One-sentence description of what this file covers.

Optional prose paragraph(s) for context. For versioned knowledge,
include a line like "Based on the 2024 California Penal Code" or
"Current as of: January 2026."

## Section Name

- [Link Text](relative/path/to/target): What the reader will find at this path
- [Another Link](another/path): What this target contains

## See Also

- [Related Topic](path/to/other/hub): Why this cross-reference matters
```

Format Rules

Title (# H1): Name of the scope this file covers.
Abstract (> blockquote): One sentence. A reader seeing only this line should understand the scope.
Prose (optional): 1-2 paragraphs max. Domain context, source authority, version/date info.
Sections (## H2): Group links by category.
Links (`- [text](path): description`): Every link is followed by a colon and a one-line description of what the target contains.
See Also (optional ## H2): Cross-references to related nodes in other branches of the tree.
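Here is a minimal parser sketch for these format rules. It assumes each link sits on its own line in the `- [text](path): description` form, and it ignores prose and anything else it does not recognize. `parse_llms_txt` and `format_problems` are hypothetical names, not an official llmstxt.org parser.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# "- [Link Text](relative/path): description"; the description is optional here
# so that bare links can be detected and reported.
LINK_RE = re.compile(r"^- \[(?P<text>[^\]]+)\]\((?P<path>[^)\s]+)\)(?::\s*(?P<desc>.*))?$")


@dataclass
class Link:
    text: str
    path: str
    description: str  # empty string when the link has no description


@dataclass
class LlmsTxt:
    title: str = ""
    abstract: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(source: str) -> LlmsTxt:
    """Parse the H1 title, blockquote abstract, and H2 sections of link lists."""
    doc = LlmsTxt()
    section = None
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and not doc.abstract:
            doc.abstract = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip()
            doc.sections[section] = []
        elif section is not None and (match := LINK_RE.match(line)):
            doc.sections[section].append(
                Link(match["text"], match["path"], (match["desc"] or "").strip()))
        # Prose paragraphs and unrecognized lines are ignored.
    return doc


def format_problems(doc: LlmsTxt) -> list[str]:
    """List violations of the format rules that can be checked from the parse."""
    problems = []
    if not doc.title:
        problems.append("missing # H1 title")
    if not doc.abstract:
        problems.append("missing > abstract")
    for section, links in doc.sections.items():
        for link in links:
            if not link.description:
                problems.append(f"{section}: link '{link.text}' has no description")
    return problems
```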

Where This Applies

The hierarchy maps naturally to any domain with structured knowledge:

| Domain | Root (L1) | Hub (L2) | Index (L3) | Leaf Content |
|---|---|---|---|---|
| Legal | "California Law" | "Penal Code" | "Crimes Against Persons" | Individual statutes |
| Documentation | "Product X Docs" | "API Reference" | "Authentication APIs" | Endpoint docs |
| Medical | "Clinical Guidelines" | "Cardiology" | "Heart Failure" | Treatment protocols |
| Academic | "CS Curriculum" | "Data Structures" | "Graph Algorithms" | Lecture notes |
| Standards | "ISO 27000 Series" | "ISO 27001:2022" | "Annex A Controls" | Control descriptions |
| Codebases | Repository root | Frontend / Backend | Feature module | Source files |
| Corporate | "Company Wiki" | "Engineering" | "Incident Response" | Runbooks |
| Regulatory | "EU Regulations" | "GDPR" | "Data Subject Rights" | Article breakdowns |

Worked Example: Legal Domain

Root (law/llms.txt):

```markdown
# California Law

> Navigation map for the California legal code system.

Based on the California Legislative Information database, current as of January 2026.

## Codes

- [Penal Code](penal/llms.txt): Criminal law — crimes, punishments, criminal procedure
- [Civil Code](civil/llms.txt): Property, contracts, obligations, personal rights
- [Vehicle Code](vehicle/llms.txt): Traffic law, vehicle registration, licensing
```

Hub (law/penal/llms.txt):

```markdown
# California Penal Code

> Criminal law statutes covering crimes, punishments, and procedure.

## Parts

- [Part 1: Crimes and Punishments](part1/llms.txt): Definitions of crimes and their penalties
- [Part 2: Criminal Procedure](part2/llms.txt): Arrest, trial, sentencing procedures

## Reference

- [Sentencing Tables](reference/sentencing.md): Quick-reference grid of offenses and penalty ranges
```

Index (law/penal/part1/llms.txt):

```markdown
# Part 1: Crimes and Punishments

> Substantive criminal law — offense definitions and penalties.

## Title 8: Crimes Against the Person

- [Chapter 1: Homicide](title8/homicide.md): Sections 187-199 — murder, manslaughter, justifiable homicide
- [Chapter 2: Mayhem](title8/mayhem.md): Sections 203-206.1 — aggravated mayhem, torture

## See Also

- [Civil Code: Personal Rights](../../civil/personal-rights/llms.txt): Civil remedies for the same harms
```
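Each relative link is resolved against the directory of the file that contains it. That is why the See Also link in the Index example climbs two levels to reach the Civil Code branch. The sketch below does this resolution with the standard library's `posixpath`. `resolve_link` and `escapes_root` are hypothetical helpers, and all paths are assumed to be relative to the top-level directory of the tree.

```python
import posixpath


def resolve_link(current_file: str, link: str) -> str:
    """Resolve a relative link against the directory of the file that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(current_file), link))


def escapes_root(current_file: str, link: str) -> bool:
    """True if the resolved link climbs above the top-level directory of the tree."""
    resolved = resolve_link(current_file, link)
    return resolved == ".." or resolved.startswith("../")


if __name__ == "__main__":
    # The See Also link from the Index example lands in the Civil Code branch.
    print(resolve_link("law/penal/part1/llms.txt",
                       "../../civil/personal-rights/llms.txt"))
    # law/civil/personal-rights/llms.txt
```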

Considerations for Non-Code Domains

Versioning. Laws change, standards get revised, guidelines get updated. Include effective dates in the prose block and, where relevant, in link descriptions, for example `- [Section 187](title8/homicide.md): Murder definition (amended 2023)`.

Authority. Unlike code (where the file is the source), knowledge domains reference external authorities. State the authoritative source in the prose block. Note in descriptions whether content is original text or a summary.

Cross-references. Knowledge is rarely a clean tree — a drug relates to multiple conditions, a law is cited by multiple regulations. Use "See Also" sections to link laterally across branches. These should point to other llms.txt nodes, not directly to leaf content in another branch.

Multi-format content. Some domains involve PDFs, spreadsheets, or databases. Links should point to LLM-consumable versions (Markdown) wherever possible. Note the original format in the description if relevant.

Planned content. It's valid to build the navigation tree before all leaf content exists. Mark incomplete links with (planned) in the description.
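Two of these conventions can be checked automatically: See Also links must point at navigation nodes, and unfinished links carry the (planned) marker. The sketch below makes the same layout assumptions as the format rules (one link per line, sections introduced by `## `). `check_considerations` is a hypothetical name.

```python
from __future__ import annotations

import re

# One link per line: "- [text](path): description" (description optional).
LINK_RE = re.compile(r"^- \[[^\]]+\]\(([^)\s]+)\)(?::\s*(.*))?$")


def check_considerations(source: str) -> dict[str, list[str]]:
    """Report planned links, and See Also links that target content instead of llms.txt nodes."""
    report: dict[str, list[str]] = {"planned": [], "see_also_to_leaf": []}
    section = None
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK_RE.match(line)
        if match is None:
            continue
        path, description = match.group(1), match.group(2) or ""
        if "(planned)" in description:
            report["planned"].append(path)
        # Lateral links should land on another navigation node, not a leaf file.
        if section == "See Also" and path.rsplit("/", 1)[-1] != "llms.txt":
            report["see_also_to_leaf"].append(path)
    return report
```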

Integration with LLM Agents

llms.txt files are passive: nothing forces an LLM to read them. To make them effective, the LLM's instructions or the surrounding system must enforce a navigation-first workflow:

Coding agents (Claude Code, Cursor, etc.): The agent instruction file (CLAUDE.md, .cursorrules) tells the agent to read llms.txt before any task.
RAG systems: Use llms.txt as a structured index to guide chunk selection, rather than relying on pure embedding similarity.
Chatbots / assistants: The system prompt directs the model to start at the root llms.txt when answering domain questions.

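The read path follows the same pattern every time: root → hub → index → leaf content → answer. It skips the index when a hub links straight to leaf content, and it passes through extra levels in deeper trees.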
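The read path can be sketched as a loop that keeps following links until it reaches a file that is not an `llms.txt`. In a real agent, the model picks the next link after reading the descriptions. Here a hypothetical `choose_link` stands in for that decision: it picks the link whose path and description share the most words with the query. The files are held in an in-memory dict so the sketch is self-contained. `links_in`, `choose_link`, and `navigate` are illustrative names only.

```python
import posixpath
import re

LINK_RE = re.compile(r"^- \[[^\]]+\]\(([^)\s]+)\)(?::\s*(.*))?$")


def links_in(source):
    """Return (path, description) for every link line in an llms.txt file."""
    links = []
    for line in source.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            links.append((match.group(1), match.group(2) or ""))
    return links


def words(text):
    return set(re.findall(r"\w+", text.lower()))


def choose_link(query, links):
    """Hypothetical stand-in for the model's choice: most word overlap with path + description."""
    query_words = words(query)
    best, best_score = None, 0
    for path, description in links:
        score = len(query_words & words(path + " " + description))
        if score > best_score:
            best, best_score = path, score
    return best  # None when no link looks relevant


def navigate(files, query, start="llms.txt", max_hops=10):
    """Follow navigation nodes from `start` until a content file is reached."""
    path, trail = start, [start]
    for _ in range(max_hops):  # hop limit guards against See Also cycles
        if posixpath.basename(path) != "llms.txt":
            break  # reached a content node
        nxt = choose_link(query, links_in(files[path]))
        if nxt is None:
            break  # nothing relevant: stop where we are
        path = posixpath.normpath(posixpath.join(posixpath.dirname(path), nxt))
        trail.append(path)
    return trail
```

The returned trail records every node visited. An agent can use it to cite where an answer came from, or to report how far it got when no link matched.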

When to Create Each Level

| Situation | Action |
|---|---|
| New knowledge collection | Create a Root `llms.txt` |
| Major topic area with 5+ subtopics | Create a Hub `llms.txt` |
| Narrower topic with 5+ content items | Create an Index `llms.txt` |
| Simple topic with 1-4 items | Inline as links in the parent Hub |
| Reference material (glossary, tables) | Create a leaf `.md`, link from the relevant Hub |
| Content that cross-cuts divisions | Add "See Also" links in relevant nodes |
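The size thresholds in this table can be encoded as a small decision helper. A script that proposes where new content should go could use one like this. In the sketch below, `plan_topic` and `needs_split` are hypothetical names. The thresholds come from this guide: 1-4 items are inlined, 5 or more get their own node, and 30 or more links mean the node should be split. Applying the inline rule to major areas with few subtopics is an assumption, because the table does not cover that case.

```python
INLINE_MAX = 4        # "Simple topic with 1-4 items": inline in the parent Hub
SPLIT_THRESHOLD = 30  # a node with 30+ links should be split into a deeper level


def plan_topic(item_count: int, major_area: bool) -> str:
    """Suggest 'inline', 'hub', or 'index' for a topic, following the table above."""
    if item_count < 1:
        raise ValueError("a topic needs at least one item")
    if item_count <= INLINE_MAX:
        return "inline"  # link the items directly from the parent Hub
    return "hub" if major_area else "index"


def needs_split(link_count: int) -> bool:
    """True when a navigation node has grown into a flat list."""
    return link_count >= SPLIT_THRESHOLD
```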

Keeping It in Sync

Staleness is the biggest risk. A stale link actively misleads.

After any change that adds, removes, or reorganizes content — update the relevant llms.txt.
After adding a new area — decide: own Hub, or inline in a parent?
Periodically audit links. Dead links waste model turns on errors.
For versioned knowledge — update effective dates when sources change.
Encode the update rule in whatever manages the content (agent instructions, CI, editorial workflow).
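Part of the periodic link audit can run in CI. The sketch below walks every `llms.txt` under a directory and resolves each relative link against the file that contains it. It reports targets that do not exist and skips external URLs and links marked (planned). `find_dead_links` is a hypothetical name, and the one-link-per-line layout is an assumption.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# One link per line: "- [text](target): description".
LINK_RE = re.compile(r"^\s*- \[[^\]]+\]\(([^)\s]+)\)")


def find_dead_links(root_dir: str) -> list[tuple[str, str]]:
    """Return (llms.txt file, link target) pairs whose target does not exist on disk."""
    base = Path(root_dir)
    dead = []
    for nav_file in sorted(base.rglob("llms.txt")):
        for line in nav_file.read_text(encoding="utf-8").splitlines():
            match = LINK_RE.match(line)
            if match is None or "(planned)" in line:
                continue  # not a link, or intentionally not written yet
            target = match.group(1)
            if "://" in target:
                continue  # external URLs need a network check, out of scope here
            if not (nav_file.parent / target).exists():
                dead.append((nav_file.relative_to(base).as_posix(), target))
    return dead


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "llms.txt").write_text(
            "# Law\n\n> Map.\n\n## Codes\n\n"
            "- [Penal Code](penal/llms.txt): Criminal law\n", encoding="utf-8")
        print(find_dead_links(tmp))  # [('llms.txt', 'penal/llms.txt')]: the hub is missing
```

A non-empty result can fail the CI job. This turns the "update the relevant llms.txt" rule into something that is enforced instead of remembered.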

Common Mistakes

Too much content in llms.txt. Route, don't teach. Move detailed content into leaf files.
Links without descriptions. Always explain what the LLM will find.
Skipping levels. Don't link from root directly to leaf content.
Forgetting to update. Stale links are the most common failure mode.
Mirroring storage structure instead of navigation logic. The hierarchy should reflect how someone would look for the knowledge, not how files happen to be organized.
Ignoring temporal context. For knowledge that changes over time, missing dates or versions make the map unreliable.
Over-linking. If a node has 30+ links, it's become a flat list. Split it.
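Several of these mistakes can be caught by a simple linter. This sketch flags four problems:

- links without descriptions
- a root that links straight to content files
- 30 or more links in one node
- more than two prose paragraphs

`lint` is a hypothetical helper. It makes a simplifying assumption: every non-blank line that is not a heading, blockquote, or list item counts as prose.

```python
from __future__ import annotations

import re

LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")
MAX_LINKS = 30            # from this guide: a node with 30+ links should be split
MAX_PROSE_PARAGRAPHS = 2  # format rule: prose is 1-2 paragraphs max


def lint(source: str, is_root: bool = False) -> list[str]:
    """Flag common llms.txt mistakes that can be detected from the text alone."""
    warnings = []
    links = paragraphs = 0
    in_paragraph = False
    for raw in source.splitlines():
        line = raw.strip()
        match = LINK_RE.match(line)
        if match:
            links += 1
            text, path = match.group(1), match.group(2)
            if not (match.group(3) or "").strip():
                warnings.append(f"link without description: {text}")
            if is_root and not path.endswith("llms.txt"):
                warnings.append(f"root links directly to content: {path}")
            in_paragraph = False
        elif line and not line.startswith(("#", ">", "-")):
            # Assumption: other non-blank lines are prose; consecutive lines form one paragraph.
            if not in_paragraph:
                paragraphs += 1
            in_paragraph = True
        else:
            in_paragraph = False
    if links >= MAX_LINKS:
        warnings.append(f"{links} links: split this node into a deeper level")
    if paragraphs > MAX_PROSE_PARAGRAPHS:
        warnings.append(f"{paragraphs} prose paragraphs: move detail into leaf files")
    return warnings
```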
