Large data processing technology & books [closed]

Developers https://www.devze.com 2023-04-06 03:44 Source: web
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center. Closed 11 years ago.

I am looking for good resources on how to query large volumes of data efficiently.

Each data item has many different attributes, such as quantity, price, and history information. Clients will supply varying query criteria, but queries never need to modify the dataset. Simply storing all the data in MS SQL is not a good approach because MS SQL does not scale that well. We are targeting many terabytes of data and expect to need clusters of 200-300 CPUs.

I am interested in good resources or books that would at least let me do some research.


Did you consider a NoSQL solution such as MongoDB?


If query speed is not your number one issue, you should see whether you could build a solution with ROOT, possibly in conjunction with PROOF. In contrast to a NoSQL solution, here you would trade some speed for consistency.

ROOT is used by the CERN experiments to store and retrieve their experimental data (much more than you require), and if you can find a way to handle the I/O, it can be made to scale pretty well.

I have heard it is used by some firms doing quantitative finance.
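
To make the scaling idea a bit more concrete: since the dataset is read-only, one common pattern is to split it into partitions, scan each partition independently, and merge the partial results. The sketch below shows that pattern in plain Python. It does not use ROOT or PROOF, and every name in it (`Item`, `partition`, `scan`, `query`) is hypothetical, with a record layout guessed from the attributes mentioned in the question. Threads stand in for what would be separate processes or machines on a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Item:
    # Hypothetical record layout; the real attributes are not specified.
    item_id: int
    quantity: int
    price: float


Predicate = Callable[[Item], bool]


def partition(items: List[Item], n_parts: int) -> List[List[Item]]:
    """Split the read-only dataset into n_parts roughly equal chunks."""
    return [items[i::n_parts] for i in range(n_parts)]


def scan(chunk: List[Item], predicate: Predicate) -> List[Item]:
    """Scan a single partition and keep the items matching the query."""
    return [item for item in chunk if predicate(item)]


def query(partitions: List[List[Item]], predicate: Predicate,
          max_workers: int = 4) -> List[Item]:
    """Scatter the scan over all partitions, then gather and merge."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = pool.map(lambda chunk: scan(chunk, predicate),
                                   partitions)
        merged = [item for part in partial_results for item in part]
    # Sort so the result does not depend on how the data was partitioned.
    return sorted(merged, key=lambda item: item.item_id)


# Tiny synthetic dataset, only for illustration.
dataset = [Item(item_id=i, quantity=i % 10, price=float(i)) for i in range(100)]
partitions = partition(dataset, n_parts=4)
hits = query(partitions, lambda it: it.quantity == 3 and it.price < 50.0)
```

Because no query changes the data, the partitions never need to coordinate with each other, which is what makes this kind of workload comparatively easy to spread over many CPUs. In practice the I/O (how fast each worker can read its partition) tends to be the limiting factor.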
