llms.txt

The llms.txt file is an evolving standard that enables website operators to give Large Language Models (LLMs) such as ChatGPT or Google Gemini specific instructions for crawling, indexing and using their content. It serves as a guide that helps AI systems understand and process web content more efficiently and in line with the content owner's preferences.

Functionality and structure

Like the established robots.txt file for search engine crawlers, the llms.txt file is placed in the root directory of a website (e.g. yourdomain.com/llms.txt). However, its primary aim is not to prevent crawling altogether but to provide context and instructions on how the content should be used.
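Because the file sits at a fixed location in the root directory, a client can derive its address from any page URL on the site. The following Python sketch shows one way to do this; the function names `llms_txt_url` and `fetch_llms_txt` are hypothetical, and the sketch assumes the file is served over HTTP(S) as UTF-8 text.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def llms_txt_url(page_url: str) -> str:
    """Return the conventional llms.txt location for the site hosting page_url.

    The file lives in the root directory, so the page's path, query string
    and fragment are discarded; scheme, host and port are kept.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def fetch_llms_txt(page_url: str, timeout: float = 10.0) -> str:
    """Download the site's llms.txt (assumed UTF-8).

    urlopen raises urllib.error.HTTPError for responses such as 404,
    which is how a missing file would surface to the caller.
    """
    with urlopen(llms_txt_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `llms_txt_url("https://yourdomain.com/blog/post?id=3#top")` returns `"https://yourdomain.com/llms.txt"`.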

The file is usually formatted as a Markdown document, which makes it easy for both humans and AI systems to read. A typical llms.txt file contains:

An H1 title with the name of the project or website.
A blockquote with a short summary of the project that provides the most important information for understanding the content.
Organized sections with relevant links to more detailed content or documentation.
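To illustrate this structure, the following Python sketch contains a small, made-up llms.txt file and a minimal parser that extracts the H1 title, the blockquote summary and the links grouped by section. The project name, URLs and the `parse_llms_txt` function are hypothetical. The parser assumes that sections are introduced by `##` headings and that links are written as Markdown list items of the form `- [title](url): note`; any other free-form text is simply ignored.

```python
import re
from dataclasses import dataclass, field

# Hypothetical example file following the structure described above.
SAMPLE_LLMS_TXT = """# Example Project

> Example Project is a hypothetical library used for illustration.

## Docs

- [Quick start](https://example.com/docs/quickstart.md): Installation and first steps
- [API reference](https://example.com/docs/api.md)

## Examples

- [Basic usage](https://example.com/examples/basic.md)
"""

# Matches "- [title](url)" with an optional ": note" after the link.
LINK_RE = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?\s*$")


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[Link]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the "##" section we are currently inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            # First H1 heading: the project or website name.
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            # Blockquote before the first section: the summary (lines are joined).
            part = line.lstrip(">").strip()
            doc.summary = f"{doc.summary} {part}".strip()
        elif line.startswith("## "):
            # A new section starts; later link items are filed under it.
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                title, url, note = match.groups()
                doc.sections[current].append(Link(title, url, note or ""))
    return doc
```

Calling `parse_llms_txt(SAMPLE_LLMS_TXT)` yields the title "Example Project", the one-sentence summary and two sections, "Docs" (two links) and "Examples" (one link).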

Variants are also being discussed, such as llms-full.txt, which provides the entire content in a single file so that LLMs can access comprehensive documentation more easily.
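One way to produce an llms-full.txt file is to concatenate the documents that llms.txt links to. The Python sketch below assumes the Markdown content of each page has already been retrieved; the function `build_llms_full` and the layout it emits (one `##` section per page, preceded by a source line) are illustrative choices, not part of any specification. Headings inside the page bodies are copied unchanged, so a real generator might want to demote them to keep the document hierarchy consistent.

```python
def build_llms_full(title: str, summary: str, pages: list[tuple[str, str, str]]) -> str:
    """Combine (page_title, source_url, markdown_body) triples into one Markdown document.

    The header mirrors llms.txt (H1 title plus blockquote summary); each page
    then becomes its own "##" section so the combined file stays navigable.
    """
    parts = [f"# {title}", "", f"> {summary}", ""]
    for page_title, source_url, body in pages:
        parts += [f"## {page_title}", "", f"Source: {source_url}", "", body.strip(), ""]
    return "\n".join(parts)
```

Because the output is a single Markdown string, it can be written to llms-full.txt in the root directory alongside llms.txt.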

Significance for webmasters and the AI landscape

The introduction of llms.txt addresses several challenges in how AI systems deal with web content. Modern websites are often complex and rely on JavaScript and dynamic content that LLM crawlers find difficult to interpret. In addition, websites contain a wealth of information, and it is not always clear to AI systems which content is relevant or important.

By providing a structured, curated overview of the content, llms.txt enables LLMs to capture the essential information of a website faster and more accurately. This can improve the quality of the answers that AI models generate from this data.

For website operators, llms.txt offers an opportunity to gain more control over how AI systems use their content, particularly with regard to data protection, copyright and the prevention of misinformation. Although compliance is voluntary, the standard is becoming increasingly important as more and more users turn to AI assistants to search for information.