Optimization SQL Query For Analytics

Developer https://www.devze.com 2023-01-11 17:28 Source: web

I have implemented an analytics system which is now performing very poorly. To explain the problem, I need to describe the table structure and the queries.

I have two innodb tables

Table 1 (hourly_stats): contains records about hourly stats (stats_id, file_id, time).
Table 2 (full_stats): contains over 8 million rows.

Table 2 structure is

full_stats (
    stats_id INT,
    file_id INT,
    stats_week INT,
    stats_month INT,
    stats_year INT,
    stats_time DATETIME
)

What I am trying to do is calculate the total views from hourly_stats for a given period of time, grouping records by file_id, and then add or update the corresponding records in the full_stats table. On average it takes 1-2 minutes to process one row. I am trying to optimize the queries for better performance.

Here is what I am doing.

There is a 60% chance that a file_id already exists in full_stats for a given week, month and year, and a 40% chance that it doesn't.

So in the first query I try to update the record using the following query:

UPDATE full_stats 
   SET total_views=XXX 
 WHERE stats_week=XX
   AND stats_month=X 
   AND stats_year=YYYY

After that I check whether the number of affected rows is zero, and if so I insert the record. Once the insert or update is done, the records in hourly_stats are removed based on file_id and the given period of time.

Can you give me any suggestions on how to optimize the queries and reduce the lock rate?

An index can hurt write performance, because it has to be updated on every insert or update that touches the indexed columns. This is more likely with regular (non-unique) indexes.
However, in your case it sounds like you need a unique index anyway, so this may be less of a problem for you.

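Make sure that your table uses the InnoDB engine and has a unique index on the columns that identify a row. Since your stats are kept per file, that would be (file_id, stats_year, stats_month, stats_week) rather than the period columns alone.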
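
As a rough sketch, such an index could be added as shown here. The index name uq_file_period is made up for illustration, and file_id is included on the assumption that full_stats holds one row per file per week. The ALTER fails with a duplicate-entry error if the table already contains duplicates for that key, so it is worth checking for them first.

```sql
-- Assumption: one row per file per (year, month, week).
-- List any existing duplicates that would make the ALTER below fail.
SELECT file_id, stats_year, stats_month, stats_week, COUNT(*) AS n
  FROM full_stats
 GROUP BY file_id, stats_year, stats_month, stats_week
HAVING COUNT(*) > 1;

-- uq_file_period is a hypothetical index name.
ALTER TABLE full_stats
  ADD UNIQUE INDEX uq_file_period (file_id, stats_year, stats_month, stats_week);
```

With file_id as the leading column, the same index can also serve lookups for a single file.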

Then, instead of doing an update first, checking the affected rows and inserting if necessary, use INSERT ... ON DUPLICATE KEY UPDATE. This way, in the 40% of cases where the row does not exist yet, you spare yourself the preceding UPDATE statement.
Note, though, that the unique index is crucial for this statement!
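
A minimal sketch of that statement, under a few assumptions: a unique index exists on (file_id, stats_year, stats_month, stats_week), full_stats has the total_views column used in the question's UPDATE, and stats_id is AUTO_INCREMENT so it can be left out. The literal values are placeholders.

```sql
-- Placeholder values. Relies on a unique index over
-- (file_id, stats_year, stats_month, stats_week) to detect the existing row.
INSERT INTO full_stats
       (file_id, stats_week, stats_month, stats_year, stats_time, total_views)
VALUES (42, 2, 1, 2023, NOW(), 150)
ON DUPLICATE KEY UPDATE
       total_views = VALUES(total_views),  -- overwrite, like the original UPDATE
       stats_time  = VALUES(stats_time);
```

If the counts should accumulate instead of being overwritten, total_views = total_views + VALUES(total_views) would do that. Note that VALUES() in this position is deprecated as of MySQL 8.0.20 in favor of a row alias, although it still works.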