开发者

NoSQL Recommendation for a specific need

开发者 https://www.devze.com 2023-03-19 17:44 出处:网络
I\'m using a MySql table with the following simple structure : ID_A : int 8 ID_B : int 8 Primary key : ID_A, ID_B
  • I'm using a MySql table with the following simple structure :

    ID_A : int 8

    ID_B : int 8

    Primary key : ID_A, ID_B Index : ID_B

  • This MySQL table contain more than 500 million of rows, and the weight is 20Go.

  • I need to be able to perform those kind of query :

    select *,count(*) as cpt from table group by ID_A order by cpt DESC
    
    select *,count(*) as cpt from table group by ID_B order by cpt DESC
    
    select * from table where ID_A in (1,2,3,4,5,5) 
    
    select * from table where ID_B in (1,2,3,4,5,5) 
    
    select *,count(*) as cpt from table where ID_B in (1,2,3,4,5) group by ID_A order by cpt DESC
    
    select *,count(*) as cpt from table where ID_A in (1,2,3,4,5) group by ID_B order by cpt DESC
    
  • I tried innodb and MyIsam, but even with a big configuration server, mysql can't answer the Group By query. I even can't do it from a开发者_如何学Python script side, because it would consumme to much memory.

All the data can't fit in RAM (20Go today, but 60Go in a near future).

Should we use NoSql database ? MongoDB ? Map reduce DB ?

Thanks for you help


I've never used MongoDB for large data, but for more than 10,000 keys you can use the mongoDB map/reduce instead of default groupBy.

You can found here mongoDB documentation for this:

mongoDB groupBy for larger grouping operation

Hope this can help


select ,count() as cpt from table group by ID_A order by cpt DESC

err, that will parse - but its very bad programming style, I'm not even sure what it will return.

I suspect that it won't be much faster on a NoSQL system unless you've got a lot of concurrency and can distribute the load across multiple servers (which you could equally do with MySQL). So you'd probably have to look at sharding / map-reduce to parallelize the requests (again implying multiple servers).

Leaving aside the strange SQL, why not just denormalise your data - add tables for ID_A and ID_B counts then put a trigger on the existing table to populate the data into the new tables.

0

精彩评论

暂无评论...
验证码 换一张
取 消