Lately I have been reading about web crawling, indexing, and serving. I found some information in Google Webmaster Tools - Google Basics about the process Google uses to crawl the web and serve search results. What I am wondering is: how do they save all those indexes? I mean, that's a lot to store, right? How do they do it?
Thanks
I'm answering my own question because I found some interesting material about the Google index:
- In the Google Webmasters YouTube Channel, Matt Cutts gives some references on the architecture behind the Google index: Google Webmaster YouTube Channel
- One of those references, and in my view well worth reading, is this one: The Anatomy of a Large-Scale Hypertextual Web Search Engine
This helped me understand it better, and I hope it helps you too!
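If it helps to make the paper's central idea concrete: the Anatomy paper describes an inverted index, a structure that maps each word to the list of documents containing it. Here is a minimal toy sketch of that idea in Python; the document contents and the `search` helper are made up for illustration and are not Google's actual code:

```python
from collections import defaultdict

# Toy corpus, purely for illustration.
documents = {
    1: "google crawls the web",
    2: "the web is large",
    3: "google serves search results",
}

# Inverted index: word -> set of document ids containing that word.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word].add(doc_id)

def search(word):
    """Return the ids of documents containing the word."""
    return sorted(index.get(word, set()))

print(search("web"))     # [1, 2]
print(search("google"))  # [1, 3]
```

The real system is vastly more elaborate (ranking, compression, sharding across machines), but the lookup structure is the same basic shape: a word leads straight to its documents instead of scanning every page.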
They use a variety of data stores depending on the type of information. Generally, they don't use SQL because it carries too much overhead and doesn't suit large-scale distribution of information.
Google actually developed their own data store, which they use for large, read-mostly applications such as Google Earth and the search engine's cache. It distributes information over a very large number of machines, with each piece of information stored on three or four different computers. This lets them use cheap hardware: if one machine fails, the others immediately begin restoring the data it held until the appropriate number of copies exists again.
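To illustrate the replication scheme described above, here is a toy Python sketch: each value is stored on three nodes, and when a node fails its data is re-copied elsewhere until three replicas exist again. The class, method names, and node counts are invented for the example; this shows the general technique, not Google's implementation:

```python
import random

REPLICAS = 3  # each value is kept on this many nodes

class Cluster:
    def __init__(self, num_nodes):
        # Each node is modeled as a simple dict of key -> value.
        self.nodes = {n: {} for n in range(num_nodes)}

    def put(self, key, value):
        # Store the value on REPLICAS randomly chosen nodes.
        for node in random.sample(list(self.nodes), REPLICAS):
            self.nodes[node][key] = value

    def get(self, key):
        # Any surviving replica can answer the read.
        for store in self.nodes.values():
            if key in store:
                return store[key]
        raise KeyError(key)

    def fail(self, node):
        # Simulate a crash: drop the node, then re-replicate its
        # keys until each is back to REPLICAS copies.
        lost = self.nodes.pop(node)
        for key, value in lost.items():
            holders = [n for n, s in self.nodes.items() if key in s]
            spares = [n for n in self.nodes if n not in holders]
            for n in random.sample(spares, REPLICAS - len(holders)):
                self.nodes[n][key] = value

cluster = Cluster(10)
cluster.put("doc:42", "cached page contents")
cluster.fail(0)                  # lose a machine
print(cluster.get("doc:42"))     # the data is still readable
```

The design choice is exactly the one the answer points at: with three copies on commodity machines, any single failure is routine, and repair is just copying data until the replica count is restored.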