robots.txt and relative path

Developer (https://www.devze.com), 2023-03-30 01:55. Source: Internet
I want to disallow any files in any /tmp folder on my site. For example, I have "/anything/tmp/whatever/test.html", "/stuff/tmp/old/test.html", "/people/tmp/images.html", and so on.

Is it enough to put disallow /tmp/ into my robots.txt to block any tmp folder in the whole file system of my webserver? Or do I need to put every single path, like:

disallow /anything/tmp/
disallow /stuff/tmp/
disallow /tmp/

Or like this: disallow /*/tmp/

Thanks


Straight answer: NO

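Under the original robots.txt specification, a Disallow value is matched as a prefix of the URL path, so Disallow: /tmp/ only covers the top-level /tmp/ folder. You'll have to declare each directory you want to exclude from robots.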

User-agent: *
Disallow: /anything/tmp/
Disallow: /stuff/tmp/

You can check the syntax of your robots.txt file at http://www.frobee.com/robots-txt-check
Read more about Robot Exclusion at http://www.robotstxt.org/orig.html
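
To see the prefix behaviour for yourself, here is a minimal sketch using Python's standard urllib.robotparser module, which does plain prefix matching on Disallow values. The user agent name "ExampleBot" is made up for illustration, and the paths are the ones from the question.

```python
from urllib.robotparser import RobotFileParser

# The single-rule robots.txt the question asks about.
rules = """\
User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

paths = (
    "/tmp/test.html",
    "/anything/tmp/whatever/test.html",
    "/stuff/tmp/old/test.html",
)

# "ExampleBot" is a hypothetical crawler name; any name falls under "User-agent: *".
for path in paths:
    allowed = parser.can_fetch("ExampleBot", path)
    print(path, "allowed" if allowed else "blocked")
```

With this parser only /tmp/test.html is reported as blocked; the nested tmp folders stay crawlable because their paths do not start with /tmp/.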


It actually depends on the REP (Robots Exclusion Protocol) parser. More advanced parsers do recognize wildcard syntax, but it's not part of the original spec.

That said, Google does honor wildcards. According to their parser:

/fish*.php
Does Match:
    /fish.php 
    /fishheads/catfish.php?parameters
Does Not Match:
    /Fish.PHP
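
A rough sketch of that wildcard matching, applied to the paths from the question. This is not Google's parser: rule_matches is a hypothetical helper that assumes "*" stands for any run of characters, a trailing "$" anchors the end of the URL, and matching is case-sensitive and starts at the beginning of the path.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Hypothetical helper: wildcard-style robots.txt rule matching."""
    # A trailing "$" means the rule must match up to the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each "*" match any run of characters.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors at the start of the path, like a prefix rule.
    return re.match(regex, path) is not None


for path in ("/fish.php", "/fishheads/catfish.php?parameters", "/Fish.PHP"):
    print("/fish*.php", path, rule_matches("/fish*.php", path))

for path in ("/anything/tmp/whatever/test.html", "/people/tmp/images.html", "/tmp/old.html"):
    print("/*/tmp/", path, rule_matches("/*/tmp/", path))
```

Under these assumptions /*/tmp/ covers the nested tmp folders but not a top-level /tmp/, so for a wildcard-aware crawler you would likely want both Disallow: /tmp/ and Disallow: /*/tmp/.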