
Robots.txt

A plain-text file at the site root that instructs crawlers which URLs they may or may not fetch.

Definition

Slug: robots-txt
Category: Technical
Also known as: robots file, robots.txt protocol

Robots.txt is a plain-text file at the root of a domain (example.com/robots.txt) that tells crawlers which URLs they are allowed to fetch. The protocol is defined in RFC 9309 and supported by all major search engine bots. It is the first request a well-behaved crawler makes when it visits a domain.

The syntax is a series of User-agent and Disallow (or Allow) directives. User-agent: * applies to all bots; specific bots can be addressed individually (Googlebot, Bingbot, etc.). Disallow: /admin/ prevents crawling of any URL whose path starts with /admin/. Allow: /admin/public/ creates an exception. The Sitemap: directive, which can appear anywhere in the file, declares the XML Sitemap location.
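A minimal file combining these directives might look like this (paths and the sitemap URL are placeholders):

```
User-agent: *
Disallow: /admin/
Allow: /admin/public/

User-agent: Googlebot
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
```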

Critical mental model: robots.txt controls crawling, not indexing. A URL disallowed in robots.txt can still appear in search results if external sites link to it — the URL is indexed by reference even though Google cannot fetch its content. The snippet in this case is typically "No information is available for this page" or similar. To exclude a URL from search results, allow it in robots.txt and use Robots Meta Tag noindex instead.

This is the single most common robots.txt mistake. Blocking /staging/, /thank-you/, /search/, or /admin/ in robots.txt to "keep them out of Google" leaves them indexed if anyone has linked to them. The right answer is allow + noindex, or password-protection if the content is truly private.
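How crawl rules are evaluated can be sketched with Python's standard-library parser (the bot name and paths below are hypothetical). One caveat: Python's robotparser applies the first matching rule in file order, whereas Google uses the most specific (longest) matching path, which is why the Allow line comes first here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules. Allow precedes Disallow because Python's parser
# uses first-match-wins, unlike Google's longest-match precedence.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A well-behaved crawler must not fetch a disallowed URL...
print(rp.can_fetch("MyBot", "https://example.com/admin/secret.html"))   # False
# ...but the Allow exception and unlisted paths remain fetchable.
print(rp.can_fetch("MyBot", "https://example.com/admin/public/page.html"))  # True
print(rp.can_fetch("MyBot", "https://example.com/about/"))              # True
```

Note that can_fetch returning False only means the crawler should not request the URL — nothing in robots.txt removes a URL that search engines have already indexed from links.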

Other common mistakes: blocking CSS or JS resources, which prevents Google from rendering pages correctly and hurts mobile-friendliness evaluation; writing disallow rules that contradict canonical declarations; deploying a development robots.txt (Disallow: /) to production and tanking the entire site overnight; and case-sensitivity errors (paths are case-sensitive, so /Admin/ and /admin/ are different rules).
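The case-sensitivity point is easy to verify with the same stdlib parser (hypothetical paths):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /admin/"])

# Path matching is case-sensitive: only the lowercase path is blocked.
print(rp.can_fetch("MyBot", "https://example.com/admin/"))  # False
print(rp.can_fetch("MyBot", "https://example.com/Admin/"))  # True
```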

Verify robots.txt with the robots.txt report in Search Console (which replaced the retired robots.txt Tester) or by direct inspection. Monitor that report for fetch failures — if Googlebot cannot retrieve robots.txt (for example, the file returns a server error), it will sometimes stop crawling the site entirely until the file is available again.

Keep the file small, well-commented, and version-controlled. Treat changes to it with the same caution as production code deploys.
