开发者

Trending 100 million+ rows

开发者 https://www.devze.com 2023-01-07 20:24 出处:网络
I have a system which records some measured values every second. What is the best way to stor开发者_StackOverflow中文版e trend data which are values corresponding to a specific second?

I have a system which records some measured values every second. What is the best way to stor开发者_StackOverflow中文版e trend data which are values corresponding to a specific second?

1 day = 86.400 seconds
1 month = 2.592.000 seconds

Around 1000 values to keep track of every seconds.

Currently there are 50 tables grouping the trend data for 20 columns each. These tables contain more than 100 million rows.

    TREND_TIME datetime (clustered_index)
    TREND_DATA1 real
    TREND_DATA2 real
    ...
    TREND_DATA20 real


Have you considered RRDTool - it provides a round robin database, or circular buffer, for time series data. You can store data at whatever interval you like, then define consolidation points and a consolidation function, for example (sum, min, max, avg) for a given period, 1 second, 5 seconds, 2 days, etc. Because it knows what consolidation points you want, it doesn't need to store all the data points once they've been agregated.

Ganglia and Cacti use this under the covers and it's quite easy to use from many languages.

If you do need all the datapoints, consider using it just for the aggregation.


I would change the data saving approach and instead of saving 'raw' data as values I would save 5-20 minutes of data in an array (Memory, BL side), compress that array using LZ based algorithm and then store the data in the database as binary data. Also, it would be nice to save Max/Min/Avg/etc.. info for that binary chunk.

When you want to process the data you can process the data chunk after chunk and by that you keep a low memory profile for your application. this approach is a little more complex but very scalable in terms of memory/processing.

hope this helps.


Is the problem the database schema?

1 second to many trends obviously first shows you a separate table with a seconds-table foreign key. Alternatively, if the "many trend values" is represented by the columns and not rows you can always append the columns to the seconds table and incur null values.

Have you tried that? Was performance poor?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号