
Blocking folders in between allowed content

https://www.devze.com 2023-03-06 01:10 Source: web

I have a site with the following structure:

http://www.example.com/folder1/folder2/folder3

I would like to disallow indexing in folder1, and folder2. But I would like the robots to index everything under folder3.

Is there a way to do this with the robots.txt?

From what I've read, I think that everything inside a specified folder is disallowed.

Would the following achieve my goal?

user-agent: *
Crawl-delay: 0

Sitemap: <Sitemap url>

Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /


Yes, it works. However, Google provides a tool to test your robots.txt file: go to Google Webmaster Tools (https://www.google.com/webmasters/tools/) and open the section "Site configuration -> Crawler access".


All you would need is:

user-agent: *
Crawl-delay: 0

Sitemap: 

Allow: /folder1/folder2/folder3
Disallow: /folder1/
Allow: /

At least Googlebot will honor the more specific Allow rule for that one directory and disallow everything else under folder1. This is backed up by a post by a Google employee.
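You can sanity-check these rules locally with Python's standard-library `urllib.robotparser` (a sketch; note that this parser applies rules in file order with first match winning, rather than Google's longest-match precedence, so here the Allow line must come before the Disallow line, as it does in the file above):

```python
from urllib.robotparser import RobotFileParser

# The simplified rule set from the answer above (Sitemap/Crawl-delay omitted).
rules = """\
User-agent: *
Allow: /folder1/folder2/folder3
Disallow: /folder1/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# folder3 is explicitly allowed, the rest of folder1 is blocked,
# and everything outside folder1 is allowed by default.
print(rp.can_fetch("*", "http://www.example.com/folder1/folder2/folder3/page.html"))  # True
print(rp.can_fetch("*", "http://www.example.com/folder1/other/page.html"))            # False
print(rp.can_fetch("*", "http://www.example.com/public/page.html"))                   # True
```

Because real crawlers differ in how they resolve Allow/Disallow conflicts, a local check like this complements, but does not replace, testing in Google's own tool.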


Line breaks in records are not allowed, so your original robots.txt should look like this:

user-agent: *
Crawl-delay: 0
Sitemap: <Sitemap url>
Allow: /folder1/folder2/folder3
Disallow: /folder1/folder2/
Disallow: /folder1/
Allow: /

Possible improvements:

  • Specifying Allow: / is superfluous, as it’s the default anyway.

  • Specifying Disallow: /folder1/folder2/ is superfluous, as Disallow: /folder1/ is sufficient.

  • As Sitemap applies to all bots rather than to a single record, you could specify it in a separate block.

So your robots.txt could look like this:

User-agent: *
Crawl-delay: 0
Allow: /folder1/folder2/folder3
Disallow: /folder1/

Sitemap: http://example.com/sitemap

(Note that the Allow field is not part of the original robots.txt specification, so don’t expect all bots to understand it.)

