i want to process a text file line by line. In the olden days i loaded the file into a StringList
:
slFile := TStringList.Create();
slFile.LoadFromFile(filename);
for i := 0 to slFile.Count-1 do
begin
oneLine := slFile.Strings[i];
//process the line
end;
Problem with that is once the file gets to be a few hundred megabytes, i have to allocate a huge开发者_Python百科 chunk of memory; when really i only need enough memory to hold one line at a time. (Plus, you can't really indicate progress when you the system is locked up loading the file in step 1).
The i tried using the native, and recommended, file I/O routines provided by Delphi:
var
f: TextFile;
begin
Reset(f, filename);
while ReadLn(f, oneLine) do
begin
//process the line
end;
Problem withAssign
is that there is no option to read the file without locking (i.e. fmShareDenyNone
). The former stringlist
example doesn't support no-lock either, unless you change it to LoadFromStream
:
slFile := TStringList.Create;
stream := TFileStream.Create(filename, fmOpenRead or fmShareDenyNone);
slFile.LoadFromStream(stream);
stream.Free;
for i := 0 to slFile.Count-1 do
begin
oneLine := slFile.Strings[i];
//process the line
end;
So now even though i've gained no locks being held, i'm back to loading the entire file into memory.
Is there some alternative to Assign
/ReadLn
, where i can read a file line-by-line, without taking a sharing lock?
i'd rather not get directly into Win32 CreateFile
/ReadFile
, and having to deal with allocating buffers and detecting CR
, LF
, CRLF
's.
i thought about memory mapped files, but there's the difficulty if the entire file doesn't fit (map) into virtual memory, and having to maps views (pieces) of the file at a time. Starts to get ugly.
i just want Reset
with fmShareDenyNone
!
With recent Delphi versions, you can use TStreamReader
. Construct it with your file stream, and then call its ReadLine
method (inherited from TTextReader
).
An option for all Delphi versions is to use Peter Below's StreamIO unit, which gives you AssignStream
. It works just like AssignFile
, but for streams instead of file names. Once you've used that function to associate a stream with a TextFile
variable, you can call ReadLn
and the other I/O functions on it just like any other file.
You can use this sample code:
TTextStream = class(TObject)
private
FHost: TStream;
FOffset,FSize: Integer;
FBuffer: array[0..1023] of Char;
FEOF: Boolean;
function FillBuffer: Boolean;
protected
property Host: TStream read FHost;
public
constructor Create(AHost: TStream);
destructor Destroy; override;
function ReadLn: string; overload;
function ReadLn(out Data: string): Boolean; overload;
property EOF: Boolean read FEOF;
property HostStream: TStream read FHost;
property Offset: Integer read FOffset write FOffset;
end;
{ TTextStream }
constructor TTextStream.Create(AHost: TStream);
begin
FHost := AHost;
FillBuffer;
end;
destructor TTextStream.Destroy;
begin
FHost.Free;
inherited Destroy;
end;
function TTextStream.FillBuffer: Boolean;
begin
FOffset := 0;
FSize := FHost.Read(FBuffer,SizeOf(FBuffer));
Result := FSize > 0;
FEOF := Result;
end;
function TTextStream.ReadLn(out Data: string): Boolean;
var
Len, Start: Integer;
EOLChar: Char;
begin
Data:='';
Result:=False;
repeat
if FOffset>=FSize then
if not FillBuffer then
Exit; // no more data to read from stream -> exit
Result:=True;
Start:=FOffset;
while (FOffset<FSize) and (not (FBuffer[FOffset] in [#13,#10])) do
Inc(FOffset);
Len:=FOffset-Start;
if Len>0 then begin
SetLength(Data,Length(Data)+Len);
Move(FBuffer[Start],Data[Succ(Length(Data)-Len)],Len);
end else
Data:='';
until FOffset<>FSize; // EOL char found
EOLChar:=FBuffer[FOffset];
Inc(FOffset);
if (FOffset=FSize) then
if not FillBuffer then
Exit;
if FBuffer[FOffset] in ([#13,#10]-[EOLChar]) then begin
Inc(FOffset);
if (FOffset=FSize) then
FillBuffer;
end;
end;
function TTextStream.ReadLn: string;
begin
ReadLn(Result);
end;
Usage:
procedure ReadFileByLine(Filename: string);
var
sLine: string;
tsFile: TTextStream;
begin
tsFile := TTextStream.Create(TFileStream.Create(Filename, fmOpenRead or fmShareDenyWrite));
try
while tsFile.ReadLn(sLine) do
begin
//sLine is your line
end;
finally
tsFile.Free;
end;
end;
If you need support for ansi and Unicode in older Delphis, you can use my GpTextFile or GpTextStream.
As it seems the FileMode variable is not valid for Textfiles, but my tests showed that multiple reading from the file is no problem. You didn't mention it in your question, but if you are not going to write to the textfile while it is read you should be good.
What I do is use a TFileStream but I buffer the input into fairly large blocks (e.g. a few megabytes each) and read and process one block at a time. That way I don't have to load the whole file at once.
It works quite quickly that way, even for large files.
I do have a progress indicator. As I load each block, I increment it by the fraction of the file that has additionally been loaded.
Reading one line at a time, without something to do your buffering, is simply too slow for large files.
I had same problem a few years ago especially the problem of locking the file. What I did was use the low level readfile from the shellapi. I know the question is old since my answer (2 years) but perhaps my contribution could help someone in the future.
const
BUFF_SIZE = $8000;
var
dwread:LongWord;
hFile: THandle;
datafile : array [0..BUFF_SIZE-1] of char;
hFile := createfile(PChar(filename)), GENERIC_READ, FILE_SHARE_READ or FILE_SHARE_WRITE, nil, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, 0);
SetFilePointer(hFile, 0, nil, FILE_BEGIN);
myEOF := false;
try
Readfile(hFile, datafile, BUFF_SIZE, dwread, nil);
while (dwread > 0) and (not myEOF) do
begin
if dwread = BUFF_SIZE then
begin
apos := LastDelimiter(#10#13, datafile);
if apos = BUFF_SIZE then inc(apos);
SetFilePointer(hFile, aPos-BUFF_SIZE, nil, FILE_CURRENT);
end
else myEOF := true;
Readfile(hFile, datafile, BUFF_SIZE, dwread, nil);
end;
finally
closehandle(hFile);
end;
For me the speed improvement appeared to be significant.
Why not simply read the lines of the file directly from the TFileStream itself one at a time ?
i.e. (in pseudocode):
readline:
while NOT EOF and (readchar <> EOL) do
appendchar to result
while NOT EOF do
begin
s := readline
process s
end;
One problem you may find with this is that iirc TFileStream is not buffered so performance over a large file is going to be sub-optimal. However, there are a number of solutions to the problem of non-buffered streams, including this one, that you may wish to investigate if this approach solves your initial problem.
精彩评论