
reading in large text files for parsing

https://www.devze.com 2023-03-15 10:19 Source: web

I am working with a few text files that range from 1-2 GB in size. I cannot read them in one go with a conventional StreamReader, so I decided to read in chunks and do my work. The problem is that I am not sure when the end of the file is reached, since it has been working on one file for a long time, and I am not sure how much larger I can make my buffer to read. Here is the code:

Dim Buffer_Size As Integer = 30000
Dim bufferread(Buffer_Size - 1) As Char
Dim bytesread As Integer = 0
Dim totalbytesread As Long = 0 ' Long: an Integer overflows past 2 GB
Dim sb As New StringBuilder
Do
   bytesread = inputfile.Read(bufferread, 0, Buffer_Size)
   sb.Append(bufferread, 0, bytesread) ' append only the characters actually read
   totalbytesread = totalbytesread + bytesread
   If sb.Length > 9999999 Then
       data = sb.ToString()
       If data IsNot Nothing Then
               parsingtools.load(data)
       End If
       sb.Length = 0 ' reset the builder after parsing, or it grows without bound
   End If
   If totalbytesread > 1000000000 Then
       logs.constructlog("File almost done")
   End If
Loop Until inputfile.EndOfStream

Is there any property or code with which I can check how much of the file remains?


Have you looked at BufferedStream?

http://msdn.microsoft.com/en-us/library/system.io.bufferedstream%28v=VS.100%29.aspx

You can wrap your stream with that. Also, I'd set the buffer size to megs, not something as small as 30,000.
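Here's a minimal sketch of that wrapping (the file path and the 4 MB buffer size are placeholders; the parsing step is left as a comment):

```csharp
using System;
using System.IO;

class BufferedReadSketch
{
    static void Main()
    {
        // Hypothetical path; substitute your own 1-2 GB input file.
        const string path = "bigfile.txt";
        const int bufferSize = 4 * 1024 * 1024; // 4 MB, much larger than 30,000

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var buffered = new BufferedStream(fs, bufferSize))
        using (var reader = new StreamReader(buffered))
        {
            var chunk = new char[bufferSize];
            int read;
            // Read returns 0 only at end of stream, so this loop
            // terminates exactly when the file is exhausted.
            while ((read = reader.Read(chunk, 0, chunk.Length)) > 0)
            {
                // hand exactly 'read' characters to your parser here
            }
        }
    }
}
```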

As far as how much is left: can you just ask the stream for its Length beforehand? On a seekable stream, comparing Position against Length tells you how far along you are.
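A sketch of that idea (not your exact reader setup; with a StreamReader you would pass its BaseStream, and note the reader's internal buffering makes BaseStream.Position run slightly ahead of the characters you've consumed, which is fine for coarse progress):

```csharp
using System;
using System.IO;

class ProgressDemo
{
    // Percent of a seekable stream already consumed; assumes s.CanSeek.
    public static double PercentComplete(Stream s)
    {
        if (s == null)
            throw new ArgumentNullException("s");
        if (!s.CanSeek)
            throw new NotSupportedException("Stream must be seekable to report progress.");
        return s.Length == 0 ? 100.0 : 100.0 * s.Position / s.Length;
    }

    static void Main()
    {
        // Demonstrate with an in-memory stream: 200 bytes, 50 read so far.
        using (var ms = new MemoryStream(new byte[200]))
        {
            var buf = new byte[50];
            ms.Read(buf, 0, buf.Length);
            Console.WriteLine(PercentComplete(ms)); // prints 25
        }
    }
}
```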

Below is a code snippet I use for copying one stream to another in buffered chunks. (Sorry, it's C#.)

    private static void CopyTo(AzureBlobStore azureBlobStore, Stream src, Stream dest, string description)
    {
        if (src == null)
            throw new ArgumentNullException("src");
        if (dest == null)
            throw new ArgumentNullException("dest");

        const int bufferSize = AzureBlobStore.BufferSizeForStreamTransfers;
        // buffering happens internally; this is just to avoid the 4-gig boundary and have something to show
        int readCount;
        //long bytesTransfered = 0;
        var buffer = new byte[bufferSize];
        //string totalBytes = FormatBytes(src.Length);
        while ((readCount = src.Read(buffer, 0, buffer.Length)) != 0)
        {
            if (azureBlobStore.CancelProcessing)
            {
                break;
            }
            dest.Write(buffer, 0, readCount);
            //bytesTransfered += readCount;
            //Console.WriteLine("AzureBlobStore:CopyTo:{0}:{1}  {2}", FormatBytes(bytesTransfered), totalBytes, description);
        }
    }

Hope this helps.

