I am developing a script to read and search log files for a big system at work. It currently works like this:
1. User defines search parameters
2. PHP uses shell_exec("tac logfile.log > tmpfile.log") to copy the log file (and reverse its lines) to a temp file
3. PHP fopen()'s the temp file
4. PHP reads through each line of the file and applies the search parameters
5. PHP deletes the temp file made in step #2
I decided upon this method as the log files are written to every few seconds, and I need to read the log file backwards.
My problem is that step #2 takes a really long time when the log file is >300MB, and each day's log file is easily 500MB, so searching that much data is really time-consuming.
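For reference, a minimal sketch of the workflow above (the file paths, the query parameter, and the matching logic are placeholders, not my actual script):

```php
<?php
// Rough sketch of the current approach; $logFile, $tmpFile and the matching
// logic are placeholders for illustration only.
$logFile = '/var/log/myapp/today.log';
$tmpFile = '/tmp/search-' . getmypid() . '.log';
$needle  = $_GET['q'] ?? '';            // user-defined search parameter

// Step 2: reverse the whole log into a temp file (this is the slow part).
shell_exec('tac ' . escapeshellarg($logFile) . ' > ' . escapeshellarg($tmpFile));

// Steps 3-4: read the reversed copy line by line and apply the search.
$matches = [];
$fh = fopen($tmpFile, 'r');
while (($line = fgets($fh)) !== false) {
    if ($needle !== '' && strpos($line, $needle) !== false) {
        $matches[] = $line;
    }
}
fclose($fh);

// Step 5: delete the temp file made in step 2.
unlink($tmpFile);
```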
I have a couple of questions:
1. Can I simply run fopen() on the active log file?
2. Can it cause corruption of the log file if I simply read it while other scripts are writing to it?
3. Will it slow down the scripts writing to the log file if I am reading it at the same time?
4. Can I tell PHP to read each line of the file backwards rather than forwards?
I hope that makes sense...
To answer the numbered questions:
On a Unix or Unix-like system:
1: Feel free to fopen() anything you want. Other processes appending to the file won't be screwed up. Other processes moving the file won't be screwed up. Other processes deleting the file won't be screwed up. Their operations will continue to succeed, as will yours.
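A minimal sketch of reading the live file directly, with no temp copy (the path is an assumption):

```php
<?php
// Open the active log read-only; processes appending to it are unaffected.
$fh = fopen('/var/log/myapp/today.log', 'r');   // assumed path
if ($fh === false) {
    die("could not open log\n");
}
while (($line = fgets($fh)) !== false) {
    // apply search parameters here
}
fclose($fh);
```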
2: Reading the file won't prevent writers from writing, unless the filesystem is applying mandatory file locking. This is very unlikely, as it imposes a large performance penalty. (See your system's fcntl(2) manpage for details on mandatory locking.)
3: The scripts writing to the logs will probably not run slower while you are reading the files. The file data and metadata will need to be brought into memory for writing or reading; the logging might even run (ever so slightly) faster if your search keeps the caches hot.
4: Haven't got a clue. :)
I can think of several possible approaches to solving this problem:
First, you could perform the search and then reverse the results:
grep danger /var/log/messages | tac
The search won't go any faster, but reversing will, assuming your results are a small enough subset of the whole file.
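This fits straight into the shell_exec call you already have; a hedged sketch (the log path and the user-supplied term are assumptions):

```php
<?php
// Sketch: search first, then reverse only the (much smaller) result set.
$logFile = '/var/log/messages';                 // assumed path
$term    = $_GET['q'] ?? 'danger';              // user-defined search term

$cmd = 'grep -- ' . escapeshellarg($term) . ' '
     . escapeshellarg($logFile) . ' | tac';
$results = shell_exec($cmd);                    // newest matches first
echo $results;
```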
Second, you could break your log file into pieces periodically using dd(1)'s skip=BLOCKS parameter; by breaking the file into 10-megabyte chunks, reversing those once, and handling the remainder afresh each time, you can amortize the cost of reversing the file all throughout the day. Programming this correctly would require more effort, but the time savings might be worth it. It depends how often the queries need to be made.
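A sketch of pulling out and reversing a single 10-megabyte chunk (paths and the chunk index are illustrative, and the M suffix assumes GNU dd); a real implementation would also have to stitch together the lines split at block boundaries:

```php
<?php
// Sketch: extract one 10 MB block of the log with dd and reverse it once.
$logFile = '/var/log/myapp/today.log';          // assumed path
$chunkNo = 2;                                   // which 10 MB block (0-based)
$outFile = sprintf('/tmp/chunk-%03d.rev', $chunkNo);

// bs=10M makes skip= count in 10 MB blocks; count=1 copies exactly one block.
$cmd = sprintf(
    'dd if=%s bs=10M skip=%d count=1 2>/dev/null | tac > %s',
    escapeshellarg($logFile),
    $chunkNo,
    escapeshellarg($outFile)
);
shell_exec($cmd);
```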
Third, you could write your own moral equivalent of tail(1) -f that stuffs each new line into a ring-buffer file backwards. You'd be overwriting previous lines as you go, but it sounds like your log rotation already limits the amount of data being retained. A 600 megabyte limit might be reasonable. Your queries could start at the most recently written byte, read through to the end of the file, and then start over at the beginning of the file if not enough results were returned. This would probably require the most effort to get correct, but it amortizes the cost of reversing the log across every single write. (If the scripts writing to the logs are bursty, this might seriously compete with them for disk bandwidth at peak usage.)
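Getting the byte-level ring buffer right is beyond a short snippet, but a much-simplified stand-in for the same idea (follow the log like tail -f and keep a bounded, newest-first cache file for queries to scan) might look like this; every path and constant here is an assumption:

```php
<?php
// Simplified stand-in for the ring-buffer idea: follow the live log and
// maintain a bounded newest-first cache that the search script can scan
// top-to-bottom. This is NOT the byte-level ring buffer described above.
const LOG_FILE    = '/var/log/myapp/today.log'; // assumed path
const RECENT_FILE = '/tmp/recent-newest-first.log';
const MAX_LINES   = 100000;                     // rough size cap

$recent = [];
$log = fopen(LOG_FILE, 'r');
fseek($log, 0, SEEK_END);                       // start following at the end

while (true) {
    $line = fgets($log);
    if ($line === false) {                      // nothing new yet; wait a bit
        usleep(200000);
        fseek($log, 0, SEEK_CUR);               // clear EOF so fgets retries
        continue;
    }
    array_unshift($recent, rtrim($line, "\n")); // newest line first
    if (count($recent) > MAX_LINES) {
        array_pop($recent);                     // drop the oldest
    }
    // Persist so the search script can read newest-to-oldest.
    file_put_contents(RECENT_FILE, implode("\n", $recent) . "\n", LOCK_EX);
}
```

A real version would write incrementally instead of rewriting the whole cache on every line, but the shape of the idea is the same.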
Since you are using shell_exec already, you might want to use tail in it:
Usage: tail [OPTION]... [FILE]...
Print the last 10 lines of each FILE to standard output. With more than one FILE, precede each with a header giving the file name. With no FILE, or when FILE is -, read standard input.
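For example, pulling only the last few thousand lines through tail keeps the amount of data small; piping it through tac to get newest-first order is an addition of mine, not something this answer spells out, and the path and line count are assumptions:

```php
<?php
// Sketch: grab only the most recent lines instead of reversing the whole log.
$logFile = '/var/log/myapp/today.log';          // assumed path
$lines   = 10000;                               // how far back to look

// tail prints the last N lines oldest-first; tac flips them to newest-first.
$recent = (string) shell_exec(
    'tail -n ' . (int) $lines . ' ' . escapeshellarg($logFile) . ' | tac'
);
foreach (explode("\n", trim($recent)) as $line) {
    // apply search parameters here
}
```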