I'm using Delphi 2007 and have an application that reads log files from several places over an internal network and displays the exceptions found in them. Those directories sometimes contain thousands of log files. The application has an option to read only the log files from the n latest days, but it can also work with any datetime range.
The problem is that the first time a log directory is read it can be very slow (several minutes). The second time it is considerably faster.
I wonder how I can optimize my code to read the log files as fast as possible. I'm using vCurrentFile: TStringList to keep the current file in memory, loading it from a TFileStream because I think that is faster.
Here is some code:
Refresh: The main loop to read log-files
// In this function the log files are searched for exceptions. If an exception is found it is stored in an object.
// The exceptions are then shown in the grid
procedure TfrmMain.Refresh;
var
FileData : TSearchRec; // Used for the file searching. Contains data of the file
vPos, i, PathIndex : Integer;
vCurrentFile: TStringList;
vDate: TDateTime;
vFileStream: TFileStream;
begin
tvMain.DataController.RecordCount := 0;
vCurrentFile := TStringList.Create;
memCallStack.Clear;
try
for PathIndex := 0 to fPathList.Count - 1 do // Loop 0. This loops until all directories are searched through
begin
if (FindFirst (fPathList[PathIndex] + '\*.log', faAnyFile, FileData) = 0) then
repeat // Loop 1. This loops while there are .log files in Folder (CurrentPath)
vDate := FileDateToDateTime(FileData.Time);
if chkLogNames.Items[PathIndex].Checked and FileDateInInterval(vDate) then
begin
tvMain.BeginUpdate; // To speed up the grid - delays the GUI update until EndUpdate
fPathPlusFile := fPathList[PathIndex] + '\' + FileData.Name;
vFileStream := TFileStream.Create(fPathPlusFile, fmShareDenyNone);
vCurrentFile.LoadFromStream(vFileStream);
fUser := FindDataInRow(vCurrentFile[0], 'User'); // FindData Returns the string after 'User' until ' '
fComputer := FindDataInRow(vCurrentFile[0], 'Computer'); // FindData Returns the string after 'Computer' until ' '
Application.ProcessMessages; // Give some priority to the User Interface
if not CancelForm.IsCanceled then
begin
if rdException.Checked then
for i := 0 to vCurrentFile.Count - 1 do
begin
vPos := AnsiPos(MainExceptionToFind, vCurrentFile[i]);
if vPos > 0 then
UpdateView(vCurrentFile[i], i, MainException);
vPos := AnsiPos(ExceptionHintToFind, vCurrentFile[i]);
if vPos > 0 then
UpdateView(vCurrentFile[i], i, HintException);
end
else if rdOtherText.Checked then
for i := 0 to vCurrentFile.Count - 1 do
begin
vPos := AnsiPos(txtTextToSearch.Text, vCurrentFile[i]);
if vPos > 0 then
UpdateView(vCurrentFile[i], i, TextSearch)
end
end;
vFileStream.Free;
tvMain.EndUpdate; // Now the Gui can be updated
end;
until(FindNext(FileData) <> 0) or (CancelForm.IsCanceled); // End Loop 1
end; // End Loop 0
finally
FreeAndNil(vCurrentFile);
end;
end;
UpdateView method: adds one row to the display grid
{: Update the grid with one exception}
procedure TfrmMain.UpdateView(aLine: string; const aIndex, aType: Integer);
var
vExceptionText: String;
vDate: TDateTime;
begin
if ExceptionDateInInterval(aLine, vDate) then // Parse the date from the beginning of date
begin
if aType = MainException then
vExceptionText := 'Exception'
else if aType = HintException then
vExceptionText := 'Exception Hint'
else if aType = TextSearch then
vExceptionText := 'Text Search';
SetRow(aIndex, vDate, ExtractFilePath(fPathPlusFile), ExtractFileName(fPathPlusFile), fUser, fComputer, aLine, vExceptionText);
end;
end;
Method to decide if a row is in the date range:
{: This compares the exact exception time against the filters
@desc 2 cases: 1. Last n days
2. From - to range}
function TfrmMain.ExceptionDateInInterval(var aTestLine: String; out aDateTime: TDateTime): Boolean;
var
vtmpDate, vTmpTime: String;
vDate, vTime: TDateTime;
vIndex: Integer;
begin
aDateTime := 0;
vtmpDate := Copy(aTestLine, 0, 8);
vTmpTime := Copy(aTestLine, 10, 9);
Insert(DateSeparator, vtmpDate, 5);
Insert(DateSeparator, vtmpDate, 8);
if TryStrToDate(vtmpDate, vDate, fFormatSetting) and TryStrToTime(vTmpTime, vTime) then
aDateTime := vDate + vTime;
Result := (rdLatest.Checked and (aDateTime >= (Now - spnDateLast.Value))) or
(rdInterval.Checked and (aDateTime>= dtpDateFrom.Date) and (aDateTime <= dtpDateTo.Date));
if Result then
begin
vIndex := AnsiPos(']', aTestLine);
if vIndex > 0 then
Delete(aTestLine, 1, vIndex + 1);
end;
end;
Test if the file date is inside the range:
{: Returns true if the logtime is within filters range
@desc The purpose is to skip log files that are not worth parsing (wrong day)}
function TfrmMain.FileDateInInterval(aFileTimeStamp: TDate): Boolean;
begin
Result := (rdLatest.Checked and (Int(aFileTimeStamp) >= Int(Now - spnDateLast.Value))) or
(rdInterval.Checked and (Int(aFileTimeStamp) >= Int(dtpDateFrom.Date)) and (Int(aFileTimeStamp) <= Int(dtpDateTo.Date)));
end;
The problem is not your logic, but the underlying file system.
Most file systems get very slow when you put many files in a directory. This is very bad with FAT, but NTFS also suffers from it, especially if you have thousands of files in a directory.
The best thing you can do is reorganize those directory structures, for instance by age, so that each directory holds at most a couple of hundred files.
--jeroen
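As a rough illustration of that reorganization (a sketch of my own, not code from the answer; the per-month layout and the minimal error handling are assumptions), something like this could bucket existing logs into one sub-folder per month:
// uses Windows, SysUtils;
procedure ReorganizeLogsByMonth(const ARootDir: string);
var
  SearchRec: TSearchRec;
  FileDate: TDateTime;
  SubDir, OldName, NewName: string;
begin
  if FindFirst(IncludeTrailingPathDelimiter(ARootDir) + '*.log', faAnyFile, SearchRec) = 0 then
  try
    repeat
      FileDate := FileDateToDateTime(SearchRec.Time);
      // One folder per month based on the file date, e.g. "2009-06"
      SubDir := IncludeTrailingPathDelimiter(ARootDir) + FormatDateTime('yyyy-mm', FileDate);
      ForceDirectories(SubDir);
      OldName := IncludeTrailingPathDelimiter(ARootDir) + SearchRec.Name;
      NewName := IncludeTrailingPathDelimiter(SubDir) + SearchRec.Name;
      MoveFile(PChar(OldName), PChar(NewName)); // Windows API; check the result in real code
    until FindNext(SearchRec) <> 0;
  finally
    FindClose(SearchRec);
  end;
end;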
How fast do you want to be? If you want to be really fast, then you need to use something besides Windows networking to read the files. The reason is that if you want to read the last line of a log file (or all the lines since the last time you read it), you end up reading the whole file again.
In your question you said the problem is that enumerating the directory listing is slow. That is your first bottleneck. If you want to be really fast, you need to either switch to HTTP or add some sort of log server on the machine where the log files are stored.
The advantage of using HTTP is that you can issue a range request and fetch only the new lines that were added to the log file since you last requested it. That really improves performance, because you transfer less data (especially if you enable HTTP compression) and you also have less data to process on the client side.
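For illustration only, here is a sketch of such a range request with Indy's TIdHTTP (the URL, the way the previous size is tracked, and the error handling are assumptions, not part of the answer):
// uses Classes, SysUtils, IdHTTP;
function FetchNewLogData(AHttp: TIdHTTP; const AUrl: string; var ALastSize: Int64): string;
var
  Response: TStringStream;
begin
  Result := '';
  Response := TStringStream.Create('');
  try
    // Ask the web server only for the bytes appended since the last poll
    AHttp.Request.CustomHeaders.Values['Range'] := Format('bytes=%d-', [ALastSize]);
    try
      AHttp.Get(AUrl, Response); // 206 Partial Content when the range is honoured
      Result := Response.DataString;
      ALastSize := ALastSize + Response.Size;
    except
      on E: EIdHTTPProtocolException do
        if E.ErrorCode <> 416 then // 416 = range not satisfiable, i.e. nothing new yet
          raise;
    end;
  finally
    Response.Free;
  end;
end;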
If you add a log server of some sort, that server can do the processing on the server side, where it has native access to the data, and return only the rows that fall within the date range. A simple way of doing that may be to load your logs into a SQL database of some sort and run queries against it.
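A hedged sketch of that idea with dbGo/ADO (the LogEntries table and its columns are invented for the example; any database access layer would do):
// uses DB, ADODB;
procedure LoadExceptionsInRange(AQuery: TADOQuery; AFrom, ATo: TDateTime);
begin
  AQuery.Close;
  AQuery.SQL.Text :=
    'SELECT LogTime, UserName, ComputerName, Message ' +
    'FROM LogEntries ' +
    'WHERE LogTime >= :FromDate AND LogTime <= :ToDate ' +
    'AND Message LIKE ''%Exception%'' ' +
    'ORDER BY LogTime';
  AQuery.Parameters.ParamByName('FromDate').Value := AFrom;
  AQuery.Parameters.ParamByName('ToDate').Value := ATo;
  AQuery.Open; // only the matching rows travel over the network
end;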
So, how fast do you want to go?