Lately I have been reading about web crawling, indexing, and serving. I found some information in Google Webmaster Tools - Google Basics about the process Google uses to crawl the web and serve search results. What I am wondering is: how do they save all those indexes? I mean, that's a lot to store, right? How do they do it?
Thanks
I'm answering my own question because I found some interesting material about the Google index:
- In the Google Webmasters YouTube Channel, Matt Cutts gives some references on the architecture behind the Google index: Google Webmaster YouTube Channel
- One of those references, and in my view well worth reading, is this one: The Anatomy of a Large-Scale Hypertextual Web Search Engine
This helped me understand it better, and I hope it helps you too!
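If it helps to make the paper's central idea concrete: the Anatomy paper describes an inverted index, a structure that maps each word to the list of documents containing it. Here is a minimal toy sketch of that idea in Python; the document contents and the `search` helper are made up for illustration and are not Google's actual code:

```python
from collections import defaultdict

# Toy corpus, purely for illustration.
documents = {
    1: "google crawls the web",
    2: "the web is large",
    3: "google serves search results",
}

# Inverted index: word -> set of document ids containing that word.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.split():
        index[word].add(doc_id)

def search(word):
    """Return the ids of documents containing the word."""
    return sorted(index.get(word, set()))

print(search("web"))     # [1, 2]
print(search("google"))  # [1, 3]
```

The real system is vastly more elaborate (ranking, compression, sharding across machines), but the lookup structure is the same basic shape: a word leads straight to its documents instead of scanning every page.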
They use a variety of data stores depending on the type of information. Generally, they don't use SQL because it carries too much overhead and doesn't suit large-scale distribution of information.
Google actually developed their own data store, which they use for large, read-mostly applications such as Google Earth and the search engine's cache. It distributes information over a very large number of machines, with each piece of information stored on three or four different computers. This lets them use cheap hardware: if one machine fails, the others immediately begin restoring the data it held until the appropriate number of copies exists again.
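To illustrate the replication scheme described above, here is a toy Python sketch: each value is stored on three nodes, and when a node fails its data is re-copied elsewhere until three replicas exist again. The class, method names, and node counts are invented for the example; this shows the general technique, not Google's implementation:

```python
import random

REPLICAS = 3  # each value is kept on this many nodes

class Cluster:
    def __init__(self, num_nodes):
        # Each node is modeled as a simple dict of key -> value.
        self.nodes = {n: {} for n in range(num_nodes)}

    def put(self, key, value):
        # Store the value on REPLICAS randomly chosen nodes.
        for node in random.sample(list(self.nodes), REPLICAS):
            self.nodes[node][key] = value

    def get(self, key):
        # Any surviving replica can answer the read.
        for store in self.nodes.values():
            if key in store:
                return store[key]
        raise KeyError(key)

    def fail(self, node):
        # Simulate a crash: drop the node, then re-replicate its
        # keys until each is back to REPLICAS copies.
        lost = self.nodes.pop(node)
        for key, value in lost.items():
            holders = [n for n, s in self.nodes.items() if key in s]
            spares = [n for n in self.nodes if n not in holders]
            for n in random.sample(spares, REPLICAS - len(holders)):
                self.nodes[n][key] = value

cluster = Cluster(10)
cluster.put("doc:42", "cached page contents")
cluster.fail(0)                  # lose a machine
print(cluster.get("doc:42"))     # the data is still readable
```

The design choice is exactly the one the answer points at: with three copies on commodity machines, any single failure is routine, and repair is just copying data until the replica count is restored.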