robots.txt
Java robots.txt parser with wildcard support
I'm looking for a robots.txt parser in Java that supports the same pattern-matching rules as Googlebot.[details]
2023-04-01 12:51 Category: Q&A
robots.txt disallow: spider
I'm looking at the robots.txt file of a site I would like to do a one-off scrape of, and there is this line:[details]
2023-03-31 01:48 Category: Q&A
Is noindex valid in robots.txt? [duplicate]
This question already has answers here: Noindex in a robots.txt (2 answers) Closed 1 year ago.[details]
2023-03-30 16:07 Category: Q&A
robots.txt and relative path
I want to disallow any files in any /tmp folder on my site, e.g. I have: "/anything/tmp/whatever/test.html", "/stuff/tmp/old/test.html", "/people/tmp/images.html", and so on.[details]
2023-03-30 01:55 Category: Q&A
Stop abusive bots from crawling?
Is this a good idea? http://browsers.garykeith.com/stream.asp?RobotsTXT What does abusive crawling mean? How is that bad for my site? Not really. Most "bad bots" ignore the robo[details]
2023-03-28 07:06 Category: Q&A
Block Google robots for URLs containing a certain word
My client has a load of pages which they don't want indexed by Google; they are all called[details]
2023-03-23 18:32 Category: Q&A
Google indexes my sitemap as webpage
I have the following problem. My sitemap's content is shown in Google search results. There is a link to the sitemap on the main page, which may be the cause. I have added this URL to GOOGL[details]
2023-03-23 07:40 Category: Q&A
How to disallow bots from a single page or file
How do I disallow bots from a single page and allow all other content to be crawled? It's so important not to get this wrong that I am asking here; I can't find a definitive answer elsewhere.[details]
2023-03-21 18:13 Category: Q&A
Excluding testing subdomain from being crawled by search engines (w/ SVN Repository)
I have: domain.com, testing.domain.com. I want domain.com to be crawled and indexed by search engines, but not testing.domain.com[details]
2023-03-21 08:39 Category: Q&A
Using Robots.txt For Finding username and Password [closed]
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clari[details]
2023-03-20 23:28 Category: Q&A