I have a site with the following structure:
http://www.example.com/folder1/folder2/folder3
I would like to disallow indexing in folder1 and folder2, but I would like the robots to index everything under folder3.
Is there a way to do this with the robots.txt?
From what I've read, I think that everything inside a disallowed folder is blocked, subfolders included.
Would the following achieve my goal?
user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /
Yes, it works. However, Google has a tool to test your robots.txt file: go to Google Webmaster Tools (https://www.google.com/webmasters/tools/) and open the section "Site configuration -> Crawler access".
All you would need is:
user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/
Allow: /
At least Googlebot will see the more specific Allow for that one directory and still disallow everything else from folder1 on. This is backed up by a post from a Google employee.
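If you'd rather sanity-check the rules locally before (or instead of) using Google's tester, here is a minimal sketch using Python's standard urllib.robotparser; the bot name and page URLs are made-up placeholders based on the question:

from urllib.robotparser import RobotFileParser

# The rules from the answer above (the Sitemap line is omitted here,
# since robotparser does not need it to answer can_fetch questions).
rules = """\
User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# folder3 and everything below it should be fetchable...
print(parser.can_fetch("MyBot", "http://www.example.com/folder1/folder2/folder3/page.html"))  # True
# ...while the rest of folder1 and folder2 stays blocked.
print(parser.can_fetch("MyBot", "http://www.example.com/folder1/page.html"))                  # False
print(parser.can_fetch("MyBot", "http://www.example.com/folder1/folder2/page.html"))          # False

Note that robotparser applies rules in file order (first match wins), so the Allow line has to come before the Disallow line here, just as in the file above; Googlebot instead picks the most specific (longest) matching rule.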
Line breaks in records are not allowed, so your original robots.txt should look like this:
user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /
Possible improvements:

- Specifying Allow: / is superfluous, as it's the default anyway.
- Specifying Disallow: /folder1/folder2/ is superfluous, as Disallow: /folder1/ is sufficient.
- As Sitemap is not per record but for all bots, you could specify it as a separate block.
So your robots.txt could look like this:
User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/
Sitemap: http://example.com/sitemap
(Note that the Allow field is not part of the original robots.txt specification, so don't expect all bots to understand it.)
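To see why that caveat matters: a parser that follows only the original 1994 spec understands Disallow with plain prefix matching and nothing else. A minimal sketch of that behavior (the helper name is hypothetical, not any real library):

# Disallow prefixes from the robots.txt above; a strict 1994-spec
# parser ignores the Allow line entirely.
DISALLOW_PREFIXES = ["/folder1/"]

def spec1994_allows(path: str) -> bool:
    # Hypothetical helper: a path is blocked if it starts with any
    # Disallow prefix, and there is no Allow field to carve out exceptions.
    return not any(path.startswith(prefix) for prefix in DISALLOW_PREFIXES)

print(spec1994_allows("/folder1/folder2/folder3/page.html"))  # False: folder3 is blocked too
print(spec1994_allows("/other/page.html"))                    # True

So a bot that doesn't implement Allow will stay out of folder3 as well; that's the safe direction (it under-crawls rather than over-crawls), but it won't index the content you wanted indexed.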