How to read and extract information from a file that is being continuously updated?

This is how I am planning to build my utilities for a project :

  • logdump dumps log results to the file log. If the file already exists, new results are appended to it (for example, if a new file is created every month, all results for that month go into the same file).

  • extract reads the log result file to extract relevant results depending on the arguments provided.

  • The thing is that I do not want to wait for logdump to finish writing to log before I begin processing it. I also do not want to have to remember how far into log I have already read in order to extract only the new information.

  • I need live results, so that whenever something is added to the log results file, extract picks it up and produces the relevant output.

  • The processing that extract does will be generic (it will depend on command-line arguments), but it will always work on a line-by-line basis.

This involves reading a file as and when it is being written to, and continuously monitoring it for new updates even after reaching the end of the file.

How can I do this using C or C++ or shell scripting or Perl?


tail -f will read from a file and, on reaching EOF, keep monitoring it for updates instead of quitting outright. It's an easy way to read a log file "live". It can be as simple as:

tail -f log.file | extract

Or maybe tail -n 0 -f so it only prints new lines, not existing ones. Or tail -n +1 -f to display the entire file and then continue updating thereafter.
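
If extract is not written yet, grep can stand in for it while you test the pipeline; its --line-buffered option flushes each matching line immediately instead of waiting for the pipe's buffer to fill (ERROR here is just a sample pattern):

tail -n 0 -f log.file | grep --line-buffered ERROR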


The traditional unix tool for this is tail -f, which keeps reading data appended to its argument until you kill it. So you can do

tail -c +1 -f log | extract

In the unix world, reading from continuously appended-to files has come to be known as “tailing”. In Perl, the File::Tail module performs the same task.

use File::Tail;
my $log_file = File::Tail->new("log");   # follow "log", waiting for new lines
while (defined(my $log_line = $log_file->read)) {
    process_line($log_line);             # your per-line processing goes here
}
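
File::Tail checks the file periodically rather than blocking in the kernel, and its constructor accepts options to tune how often. A sketch under that assumption, using the real interval and maxinterval options (the command-line pattern filter is hypothetical):

use File::Tail;

# Check roughly once a second, backing off to at most 5 seconds when idle.
my $log_file = File::Tail->new(
    name        => "log",
    interval    => 1,   # initial seconds between checks
    maxinterval => 5,   # upper bound when the file is quiet
);

my $pattern = @ARGV ? shift : '.';   # hypothetical: filter pattern from the command line
while (defined(my $log_line = $log_file->read)) {
    print $log_line if $log_line =~ /$pattern/;
}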


Using a simple stand-in for logdump

#! /usr/bin/perl

use warnings;
use strict;

open my $fh, ">", "log" or die "$0: open: $!";
select $fh;
$| = 1;  # autoflush the selected handle so each message hits the file immediately

for (1 .. 10) {
  print $fh "message $_\n" or warn "$0: print: $!";
  sleep rand 5;
}

and the skeleton for extract below, you can get the processing you want. When the stream runs out of input, logfile.eof() is true. Calling logfile.clear() resets the error state, and then we sleep and try reading again.

#include <iostream>
#include <fstream>
#include <cerrno>
#include <cstring>
#include <unistd.h>

int main(int argc, char *argv[])
{
  const char *path;
  if      (argc == 2) path = argv[1];
  else if (argc == 1) path = "log";
  else {
    std::cerr << "Usage: " << argv[0] << " [ log-file ]\n";
    return 1;
  }

  std::ifstream logfile(path);
  std::string line;
  for (;;) {
    // Extract every complete line currently in the file.
    while (std::getline(logfile, line))
      std::cout << argv[0] << ": extracted [" << line << "]\n";

    if (!logfile.eof()) {
      // A real error (e.g., the file could not be opened), not just end of input.
      std::cerr << argv[0] << ": " << path << ": " << std::strerror(errno) << '\n';
      return 1;
    }

    sleep(3);         // give logdump time to append more
    logfile.clear();  // clear eofbit so getline will try again
  }
}

It's not as interesting as watching it live, but the output is

./extract: extracted [message 1]
./extract: extracted [message 2]
./extract: extracted [message 3]
./extract: extracted [message 4]
./extract: extracted [message 5]
./extract: extracted [message 6]
./extract: extracted [message 7]
./extract: extracted [message 8]
./extract: extracted [message 9]
./extract: extracted [message 10]
^C

I left the interrupt in the output to emphasize that this is an infinite loop.
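
On Linux, if the fixed three-second sleep feels wasteful, the inotify API can block until the kernel reports that the file was actually modified. A minimal, Linux-only sketch of the same read-and-retry loop (assuming the log is the file ./log):

#include <iostream>
#include <fstream>
#include <string>
#include <cstdio>
#include <unistd.h>
#include <sys/inotify.h>

int main()
{
  const char *path = "log";

  // Watch the file for modifications.
  int infd = inotify_init();
  if (infd < 0) { std::perror("inotify_init"); return 1; }
  if (inotify_add_watch(infd, path, IN_MODIFY) < 0) {
    std::perror("inotify_add_watch");
    return 1;
  }

  std::ifstream logfile(path);
  std::string line;
  for (;;) {
    while (std::getline(logfile, line))
      std::cout << "extracted [" << line << "]\n";
    logfile.clear();  // reset eofbit so getline can try again

    // Block until the kernel reports a write to the file; the event
    // contents are discarded because we only care that something happened.
    char buf[4096];
    if (read(infd, buf, sizeof buf) < 0) { std::perror("read"); return 1; }
  }
}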

Use Perl as a glue language to make extract get lines from the log by way of tail:

#! /usr/bin/perl

use warnings;
use strict;

die "Usage: $0 [ log-file ]\n" if @ARGV > 1;
my $path = @ARGV ? shift : "log";

open my $fh, "-|", "tail", "-c", "+1", "-f", $path
  or die "$0: could not start tail: $!";

while (<$fh>) {
  chomp;
  print "$0: extracted [$_]\n";
}
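
The "-|" open mode forks tail as a child process and hands the script a read handle on its standard output, so Perl gets the follow-the-file behavior without reimplementing it.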

Finally, if you insist on doing the heavy lifting yourself, there's a related Perl FAQ:

How do I do a tail -f in perl?

First try

seek(GWFILE, 0, 1);

The statement seek(GWFILE, 0, 1) doesn't change the current position, but it does clear the end-of-file condition on the handle, so that the next <GWFILE> makes Perl try again to read something.

If that doesn't work (it relies on features of your stdio implementation), then you need something more like this:

my $curpos;
for (;;) {
  for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
    # search for some stuff and put it into files
  }
  sleep 1;  # sleep for a while
  seek(GWFILE, $curpos, 0);  # seek to where we had been
}

If this still doesn't work, look into the clearerr method from IO::Handle, which resets the error and end-of-file states on the handle.
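
A sketch of that approach with a lexical filehandle (process_line is a hypothetical per-line handler; loading IO::Handle is what gives handles the clearerr method):

use IO::Handle;

open my $fh, "<", "log" or die "$0: open: $!";

for (;;) {
    while (my $line = <$fh>) {
        process_line($line);   # hypothetical per-line processing
    }
    sleep 1;                   # wait for the writer to append more
    $fh->clearerr;             # reset the handle's EOF state so <$fh> will read again
}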

There's also a File::Tail module from CPAN.
