NoSQL Recommendation for a specific need_问答_开发者

NoSQL Recommendation for a specific need

开发者 https://www.devze.com 2023-03-19 17:44 出处：网络

I\'m using a MySql table with the following simple structure : ID_A : int 8 ID_B : int 8 Primary key : ID_A, ID_B

相关专题：bigdata nosql

I'm using a MySql table with the following simple structure :

ID_A : int 8

ID_B : int 8

Primary key : ID_A, ID_B Index : ID_B
This MySQL table contain more than 500 million of rows, and the weight is 20Go.

I need to be able to perform those kind of query :

select *,count(*) as cpt from table group by ID_A order by cpt DESC

select *,count(*) as cpt from table group by ID_B order by cpt DESC

select * from table where ID_A in (1,2,3,4,5,5) 

select * from table where ID_B in (1,2,3,4,5,5) 

select *,count(*) as cpt from table where ID_B in (1,2,3,4,5) group by ID_A order by cpt DESC

select *,count(*) as cpt from table where ID_A in (1,2,3,4,5) group by ID_B order by cpt DESC

I tried innodb and MyIsam, but even with a big configuration server, mysql can't answer the Group By query. I even can't do it from a开发者_如何学Python script side, because it would consumme to much memory.

All the data can't fit in RAM (20Go today, but 60Go in a near future).

Should we use NoSql database ? MongoDB ? Map reduce DB ?

Thanks for you help

I've never used MongoDB for large data, but for more than 10,000 keys you can use the mongoDB map/reduce instead of default groupBy.

You can found here mongoDB documentation for this:

mongoDB groupBy for larger grouping operation

Hope this can help

select ,count() as cpt from table group by ID_A order by cpt DESC

err, that will parse - but its very bad programming style, I'm not even sure what it will return.

I suspect that it won't be much faster on a NoSQL system unless you've got a lot of concurrency and can distribute the load across multiple servers (which you could equally do with MySQL). So you'd probably have to look at sharding / map-reduce to parallelize the requests (again implying multiple servers).

Leaving aside the strange SQL, why not just denormalise your data - add tables for ID_A and ID_B counts then put a trigger on the existing table to populate the data into the new tables.