I'm using a MySQL table with the following simple structure:
ID_A : int 8
ID_B : int 8
Primary key: (ID_A, ID_B)
Index: ID_B
This table contains more than 500 million rows and weighs about 20 GB.
I need to be able to perform these kinds of queries:
select *, count(*) as cpt from table group by ID_A order by cpt DESC
select *, count(*) as cpt from table group by ID_B order by cpt DESC
select * from table where ID_A in (1,2,3,4,5,5)
select * from table where ID_B in (1,2,3,4,5,5)
select *, count(*) as cpt from table where ID_B in (1,2,3,4,5) group by ID_A order by cpt DESC
select *, count(*) as cpt from table where ID_A in (1,2,3,4,5) group by ID_B order by cpt DESC
I tried InnoDB and MyISAM, but even on a server with a big configuration, MySQL can't answer the GROUP BY queries. I can't even do it on the script side, because it would consume too much memory.
The data can't all fit in RAM (20 GB today, but 60 GB in the near future).
Should we use a NoSQL database? MongoDB? A map/reduce DB?
Thanks for your help.
I've never used MongoDB with data this large, but for more than 10,000 keys you can use MongoDB's map/reduce instead of the default group operation.
You can find the MongoDB documentation for this here:
mongoDB groupBy for larger grouping operation
Hope this helps.
select *, count(*) as cpt from table group by ID_A order by cpt DESC
Err, that will parse, but it's very bad programming style; I'm not even sure what it will return. A cleaner form is sketched below.
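As a minimal sketch, and assuming you only need the grouped column and its count (selecting * alongside GROUP BY leaves the other columns indeterminate in MySQL), the query would normally be written like this; the table name "table" is kept from the question as a placeholder:

-- Select only the column you group by, plus the aggregate
SELECT ID_A, COUNT(*) AS cpt
FROM table
GROUP BY ID_A
ORDER BY cpt DESC;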
I suspect that it won't be much faster on a NoSQL system unless you've got a lot of concurrency and can distribute the load across multiple servers (which you could equally do with MySQL). So you'd probably have to look at sharding / map-reduce to parallelize the requests (again implying multiple servers).
Leaving aside the strange SQL, why not just denormalise your data: add tables for the ID_A and ID_B counts, then put a trigger on the existing table to populate the data into the new tables (see the sketch below).
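A minimal sketch of that idea, assuming the main table is named pairs (the real table and summary-table names are up to you); an equivalent table and trigger would be added for ID_B:

-- Summary table holding one row per ID_A with its current count
CREATE TABLE cnt_a (
  ID_A INT NOT NULL PRIMARY KEY,
  cpt  BIGINT NOT NULL DEFAULT 0
);

-- Keep the count up to date on every insert into the main table
CREATE TRIGGER pairs_after_insert
AFTER INSERT ON pairs
FOR EACH ROW
  INSERT INTO cnt_a (ID_A, cpt) VALUES (NEW.ID_A, 1)
  ON DUPLICATE KEY UPDATE cpt = cpt + 1;

-- The "top ID_A" query then becomes a cheap scan of the small summary table
SELECT ID_A, cpt FROM cnt_a ORDER BY cpt DESC;

If rows are also deleted from the main table, a corresponding AFTER DELETE trigger would need to decrement the count.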