
What do I do if I don't want my website to be indexed by search engines?

Developer (https://www.devze.com), 2023-01-11 01:30. Source: the web

What's the tag that you have to put in HTML to prevent your pages from being indexed by search engines?


Add this to the HTML <head> element of the pages you'd like not to index:

<meta name="robots" content="noindex, nofollow">

To cover the entire site, create a robots.txt file in the site's root folder that contains the following lines:

User-agent: *
Disallow: /
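To sanity-check what those two lines tell a well-behaved crawler, here is a minimal sketch using Python's standard urllib.robotparser. The helper name is_allowed and the example.com URL are illustrative assumptions, not part of the original answer.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the answer above, as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent crawl url.

    Hypothetical helper for illustration; it only parses the text it is
    given and never touches the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Every crawler is told to stay away from every path.
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/any/page.html"))
```

Note that this only models crawling permission. Whether a blocked URL can still appear in search results is a separate question.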

See also:

  • Google webmaster tools help
  • The robots exclusion standard


Use a robots.txt file to restrict indexing: http://www.robotstxt.org/orig.html


The other answers here are subtly wrong. Unfortunately the answer is a good deal more complicated.

Some search engines support the HTML noindex tag, but not all of them do. In particular, Bing and Google do, but a bunch of others don't (here's my research on this). Depending on whether a search engine supports noindex, you have to take a different approach.

For those that support noindex (Google, Bing)...

For these you need to include the noindex tag in your HTML like this:

<meta name="robots" content="noindex, noodp, noarchive, noimageindex" />

Note that there are other "no-" directives in there as well. I'll leave looking those up as an exercise for the reader.

In addition to this, you must not block Google and Bing in your robots.txt file, or else they'll never see your noindex meta tag and it will be useless. This is important because Google and Bing consider noindex to mean "do not show this result at all, ever," while a link blocked by robots.txt means "if somebody links here, you can show it, but don't ever crawl it." There's the rub: if Google or Bing knows about a page that's blocked by robots.txt, they'll show it in their results without knowing its content and without ever crawling it. That is why you must not block Google and Bing with robots.txt, and must instead block them with noindex.
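The rule above (noindex only works if the crawler is allowed to fetch the page) can be checked mechanically. The following is a rough sketch using only the Python standard library. The class RobotsMetaParser and the function noindex_is_effective are hypothetical names chosen for this example, and real crawlers may interpret edge cases differently.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def noindex_is_effective(html: str, robots_txt: str, user_agent: str, url: str) -> bool:
    """True only if the page carries noindex AND the crawler may fetch it."""
    meta = RobotsMetaParser()
    meta.feed(html)

    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    return "noindex" in meta.directives and robots.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, noarchive" /></head></html>'
    url = "https://example.com/doc.html"  # placeholder URL
    # Crawlable page with noindex: the tag can be seen and honoured.
    print(noindex_is_effective(page, "", "Googlebot", url))
    # Same page blocked in robots.txt: the tag is never fetched.
    print(noindex_is_effective(page, "User-agent: *\nDisallow: /\n", "Googlebot", url))
```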

For those that do not support noindex (Internet Archive, Alexa, Blekko, Baidu)...

These, you must simply block in your robots.txt file. You can include the noindex tag as well, but it will have no effect since the page will never get crawled.
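As an illustration, a robots.txt that blocks only selected crawlers while leaving Google and Bing free to fetch pages (and see the noindex tag) could be generated like this. The user-agent tokens below (ia_archiver, Baiduspider) are assumptions made for the example; check each crawler's own documentation for the token it actually honours.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens, for illustration only. Verify them against
# each crawler's documentation before relying on them.
BLOCKED_AGENTS = ["ia_archiver", "Baiduspider"]


def build_robots_txt(blocked_agents):
    """Build a robots.txt with one full-site Disallow group per agent."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    return "\n".join(groups)


def can_crawl(robots_txt, user_agent, url):
    """Check a URL against robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots_txt = build_robots_txt(BLOCKED_AGENTS)
    print(robots_txt)
    url = "https://example.com/doc.html"  # placeholder URL
    for agent in ["Baiduspider", "ia_archiver", "Googlebot", "bingbot"]:
        print(agent, can_crawl(robots_txt, agent, url))
```

Crawlers not named in the file have no matching group, so they remain free to crawl and can therefore read the noindex meta tag.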

Bonus section

  1. If you want bonus points, you should set up sitemap.xml files for Google and Bing so they can discover your content as quickly as possible (and then block it!).
  2. If you have binary content (like pictures, PDFs, etc.), you'll need to block those using the X-Robots-Tag HTTP header. See my blog post for more details!
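For the binary-content case, the header in question is X-Robots-Tag, sent as part of the HTTP response. How you set it depends entirely on your server. As one possible sketch, assuming a Python WSGI application, a small middleware could attach it to responses for binary files. The function name add_x_robots_tag, the suffix list, and the default directive value are all assumptions made for this example.

```python
# File extensions treated as "binary content" here; an assumption for
# this sketch, so extend it to match what your site actually serves.
BINARY_SUFFIXES = (".pdf", ".png", ".jpg", ".jpeg", ".gif")


def add_x_robots_tag(app, value="noindex, noarchive"):
    """Wrap a WSGI app so binary responses carry an X-Robots-Tag header."""

    def middleware(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def patched_start_response(status, headers, exc_info=None):
            # Only tag responses whose path looks like a binary file.
            if path.endswith(BINARY_SUFFIXES):
                headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return middleware
```

Usage would look like `application = add_x_robots_tag(application)` in the module your WSGI server loads. As with the meta tag, the crawler has to be allowed to fetch the file in order to see the header.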

Why this is my personal project to write long answers like this...

I run a site with about 7M legal documents. Some have personal info in them and cannot be in search engines. I've studied this more than any person ever should and it's frustrating that the robots.txt myth is so strong.
