High-volume logging with batch save to database?

I want to store information about requests to my sites in a quick way that doesn't put additional strain on my database. The goal is to use this information to prevent abuse and to gather information about how users interact with the site (IP, GET/POST, URL/action, timestamp).

I'm currently saving a new row to the database on each page request. However, this wastes resources on an extra database call when the server is already logging the same information to the nginx log file.

I want to know how I can handle this better. I have two ideas, but I'd like to know if there are better methods:

  1. A cron job that parses the access log each day and saves it to the database as a batch transaction.
  2. A RAM cache (Redis/Memcached) that stores data about each request, with a cron job to save it to the database.

However, if I use a key-value cache, I'm not sure how to store the data in a way that lets me retrieve all the records and insert them into the database.
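
For the cache idea, one approach is to push one serialized record per request onto a Redis list and have the cron job drain the whole list in one go. A minimal sketch, assuming the redis-py client and a hypothetical request_log key:

    import json
    import time

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # On each page request: append one JSON record to a Redis list.
    def log_request(ip, method, url):
        record = {"ip": ip, "method": method, "url": url, "ts": int(time.time())}
        r.rpush("request_log", json.dumps(record))

    # In the cron job: read everything logged so far and clear the list.
    def drain_log():
        pipe = r.pipeline()  # transactional (MULTI/EXEC) by default
        pipe.lrange("request_log", 0, -1)
        pipe.delete("request_log")
        records, _ = pipe.execute()
        return [json.loads(x) for x in records]

Because the read and the delete run inside one MULTI/EXEC transaction, no record logged between the two commands can slip through and be lost.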

I also don't know how to parse the access log in a way that avoids re-reading entries.
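
For the log-parsing idea, one way to avoid re-reading entries is to persist the byte offset where the last run stopped and seek past it on the next run. A sketch; both paths are placeholders, and a shrinking file is treated as a rotation:

    import os

    LOG_PATH = "/var/log/nginx/access.log"   # adjust to your setup
    STATE_PATH = "/var/lib/mylogger/offset"  # remembers how far we got

    def read_new_lines():
        # Load the offset from the previous run (0 on the first run).
        try:
            with open(STATE_PATH) as f:
                offset = int(f.read())
        except (FileNotFoundError, ValueError):
            offset = 0

        # If the file shrank, it was rotated: start from the beginning.
        if os.path.getsize(LOG_PATH) < offset:
            offset = 0

        with open(LOG_PATH) as f:
            f.seek(offset)
            lines = f.readlines()
            new_offset = f.tell()

        # Persist the new offset only after a successful read.
        with open(STATE_PATH, "w") as f:
            f.write(str(new_offset))
        return lines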

How can I record access attempts in an efficient way?


  1. Use delayed inserts if you're using MySQL (other engines don't need this)
  2. Beware of indexes making write operations expensive
  3. Rotate tables once every minute/hour/day
  4. Beware of over-normalization and foreign keys

A common pattern is to have a simple table for plain writes and to move the logs every minute/hour into a main set of tables. The main set can be highly normalized and indexed, while the simple table stays de-normalized (to save space).
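
A sketch of the mover, assuming the PyMySQL driver and hypothetical access_log_buffer / access_log tables (the buffer has an AUTO_INCREMENT id column):

    import pymysql

    def move_logs():
        conn = pymysql.connect(host="localhost", user="app",
                               password="secret", database="analytics")
        try:
            with conn.cursor() as cur:
                # Bound the batch by id so rows logged while we run
                # are left for the next cron invocation.
                cur.execute("SELECT MAX(id) FROM access_log_buffer")
                (max_id,) = cur.fetchone()
                if max_id is None:
                    return  # nothing to move
                cur.execute(
                    "INSERT INTO access_log (ip, method, url, ts) "
                    "SELECT ip, method, url, ts FROM access_log_buffer "
                    "WHERE id <= %s", (max_id,))
                cur.execute("DELETE FROM access_log_buffer WHERE id <= %s",
                            (max_id,))
            conn.commit()  # copy and delete succeed or fail together
        finally:
            conn.close()

Bounding both statements by the same id keeps the job from deleting rows that arrived mid-run; rotating the buffer table (tip 3) achieves the same thing with a RENAME TABLE instead.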

Another pattern is to have a simple big table and run a summary query every minute/hour. The simple table can be indexed by date (remember to use a native date type).
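
A hedged example of the hourly roll-up (table and column names are hypothetical), reusing the same PyMySQL connection style:

    from datetime import timedelta

    def summarize_hour(conn, hour_start):
        # hour_start: a datetime marking the start of the hour to roll up.
        hour_end = hour_start + timedelta(hours=1)
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO access_summary (hour, url, hits) "
                "SELECT %s, url, COUNT(*) FROM access_log "
                "WHERE ts >= %s AND ts < %s "
                "GROUP BY url",
                (hour_start, hour_start, hour_end))
        conn.commit()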

A final tip: make the architecture and scripts idempotent (if you run them multiple times, the data is still valid). It's very common to have blips, and a simple re-run of a task for a specific minute/hour/day window can quickly fix everything instead of requiring a massive rebuild.
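
Applied to the summary job above (same hypothetical tables), idempotency can mean deleting the window's output before rewriting it, so re-running any hour is always safe:

    from datetime import timedelta

    def rebuild_hour(conn, hour_start):
        # Re-running this for the same hour always yields the same rows.
        hour_end = hour_start + timedelta(hours=1)
        with conn.cursor() as cur:
            # Throw away whatever a previous (possibly partial) run wrote ...
            cur.execute("DELETE FROM access_summary WHERE hour = %s",
                        (hour_start,))
            # ... then recompute the window from the raw table.
            cur.execute(
                "INSERT INTO access_summary (hour, url, hits) "
                "SELECT %s, url, COUNT(*) FROM access_log "
                "WHERE ts >= %s AND ts < %s GROUP BY url",
                (hour_start, hour_start, hour_end))
        conn.commit()  # delete and insert commit together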
