Google’s John Mueller recently provided some advice on how to block robots.txt and sitemap files from being indexed in search results.
This advice was prompted by a tweet from Google’s Gary Illyes, who pointed out that robots.txt can technically be indexed like any other URL. While it provides special directions for crawling, there’s nothing to stop it from being indexed.
Here’s the full tweet from Illyes:
“Triggered by an internal question: robots.txt from indexing point of view is just a url whose content can be indexed. It can become canonical or it can be deduped, just like any other URL.
It only has special meaning for crawling, but there its index status doesn’t matter at all.”
In response to his fellow Googler, Mueller said the x-robots-tag HTTP header can be used to block indexing of robots.txt or sitemap files. That wasn’t all he had to say on the matter, however, as this was arguably the key takeaway:
“Also, if your robots.txt or sitemap file is ranking for normal queries (not site:), that’s usually a sign that your site is really bad off and should be improved instead.”
So if you’re running into the problem where your robots.txt file is ranking in search results, blocking it using the x-robots-tag HTTP header is a good short-term solution. But if that’s happening, there are likely much larger issues to take care of in the long term, as Mueller suggests.
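For readers wondering what the x-robots-tag approach might look like in practice, here is a minimal sketch in Python using only the standard library. It assumes a site that serves its own robots.txt and a sitemap at /sitemap.xml; the path set, the `extra_headers` helper, the `serve` function, and the placeholder response bodies are all hypothetical names chosen for illustration, not anything Mueller or Illyes specified. Most sites would set the same header in their web server or CDN configuration instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of paths to keep out of the index; adjust for your site.
NOINDEX_PATHS = {"/robots.txt", "/sitemap.xml"}

# Placeholder robots.txt body used only for this illustration.
ROBOTS_BODY = b"User-agent: *\nAllow: /\n"


def extra_headers(path: str) -> dict:
    """Return the extra response headers to send for a request path."""
    # Ignore any query string when matching the path.
    clean = path.split("?", 1)[0]
    if clean in NOINDEX_PATHS:
        # The file stays crawlable, but search engines are asked not to index it.
        return {"X-Robots-Tag": "noindex"}
    return {}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        clean = self.path.split("?", 1)[0]
        body = ROBOTS_BODY if clean == "/robots.txt" else b"placeholder\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        for name, value in extra_headers(self.path).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start a local demo server (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/robots.txt` locally should show the `X-Robots-Tag: noindex` header in the response, while other paths are served without it. Note that the header only affects indexing; crawlers still need to be able to fetch robots.txt for it to do its job.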