how to improve the performance of reading bytes line by line from a filestream

Developer https://www.devze.com 2023-03-19 13:03 Source: Internet
I have a file larger than 10 GB. To read it line by line, I wrote this function.

static IEnumerable<string> fread(string fname, Encoding enc) {
  using (var f = File.OpenRead(fname))
  using (var reader = new StreamReader(f, enc))
    while (!reader.EndOfStream)
      yield return reader.ReadLine();
}

This code works pretty well, but it returns each line as a string, not as a byte[]. So to return a byte[] for each line, I wrote another function.

static IEnumerable<byte[]> freadbytes(string fname) {
  using (var f = File.OpenRead(fname)) {
    var bufSz = 1024;
    var buf = new byte[bufSz];
    var read = 1;
    var cr = (byte)13; // \r
    var lf = (byte)10; // \n
    var data = new List<byte>();
    while (read > 0) {
      read = f.Read(buf, 0, bufSz);
      data.AddRange(read == bufSz ? buf : buf.slc(0, read));
      var i = data.IndexOf(lf);
      while (i >= 0) {
        if (i > 0 && data[i - 1] == cr) yield return data.Take(i - 1).ToArray();
        else yield return data.Take(i).ToArray();
        data.RemoveRange(0, i + 1);
        i = data.IndexOf(lf);
      }
    }
  }
}

The second function, freadbytes(), also works well, but it takes more than 10 times as long as the first one. What can I do to make the second function faster?


Although untested, I'm sure this will be considerably faster:

static IEnumerable<byte[]> fread(string fname, Encoding enc) 
{
  using (var f = File.OpenRead(fname))
  using (var reader = new StreamReader(f, enc))
    while (!reader.EndOfStream)
      yield return enc.GetBytes(reader.ReadLine());     
}
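
A caveat worth checking before relying on this: the returned bytes are re-encoded from the decoded string, so they match the file's bytes only when the data is valid in enc. StreamReader.ReadLine() also treats a lone \r as a line break, and StreamReader by default skips a byte order mark at the start of the file. The small check below is only a sketch; RoundTripDemo and SurvivesRoundTrip are illustrative names, not part of the code above.

```csharp
using System;
using System.Linq;
using System.Text;

static class RoundTripDemo
{
    // Returns true when decoding raw with enc and encoding the result again
    // reproduces exactly the same bytes.
    public static bool SurvivesRoundTrip(byte[] raw, Encoding enc)
    {
        byte[] again = enc.GetBytes(enc.GetString(raw));
        return raw.SequenceEqual(again);
    }
}
```

With Encoding.UTF8, the bytes { 0x61, 0x62 } survive the round trip, while a lone 0xFF does not, because it decodes to the replacement character U+FFFD and is written back as three different bytes.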


Maybe this will help:

static IEnumerable<byte[]> fread(string fname, Encoding enc)
{
  using (var f = File.OpenRead(fname))
  using (var reader = new StreamReader(f, enc))
    while (!reader.EndOfStream)
      yield return enc.GetBytes(reader.ReadLine());
}

Update: Had missed the enc param initially.
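
If the exact bytes of each line are needed without going through a string, another option is to keep the byte-level approach from the question but avoid its per-line costs: data.RemoveRange(0, i + 1) shifts everything left in the list for every line, and data.Take(i).ToArray() walks the list one element at a time. The sketch below scans the read buffer in place with Array.IndexOf and only copies the bytes of each line once. It is untested on a 10 GB file, so treat the speedup as an expectation rather than a measurement. ByteLines, ReadLines, TakeLine and the 64 KB default buffer are my own choices, not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ByteLines
{
    // Hypothetical helper: yields each line as raw bytes, splitting on \n and
    // dropping one preceding \r, like freadbytes() in the question.
    public static IEnumerable<byte[]> ReadLines(string fname, int bufSz = 1 << 16)
    {
        const byte cr = 13, lf = 10;
        using (var f = new FileStream(fname, FileMode.Open, FileAccess.Read,
                                      FileShare.Read, bufSz, FileOptions.SequentialScan))
        {
            var buf = new byte[bufSz];
            var pending = new MemoryStream(); // bytes of a line that spans buffer refills
            int read;
            while ((read = f.Read(buf, 0, buf.Length)) > 0)
            {
                int start = 0;
                int i;
                // Scan the buffer in place; nothing is shifted after each line.
                while ((i = Array.IndexOf(buf, lf, start, read - start)) >= 0)
                {
                    pending.Write(buf, start, i - start);
                    yield return TakeLine(pending, cr);
                    start = i + 1;
                }
                pending.Write(buf, start, read - start); // keep the unfinished tail
            }
            // Unlike the question's code, a final line without a trailing \n is returned.
            if (pending.Length > 0) yield return TakeLine(pending, cr);
        }
    }

    // Copies the pending bytes into a new array (minus a trailing \r) and resets pending.
    private static byte[] TakeLine(MemoryStream pending, byte cr)
    {
        int len = (int)pending.Length;
        byte[] all = pending.GetBuffer();
        if (len > 0 && all[len - 1] == cr) len--;
        var line = new byte[len];
        Buffer.BlockCopy(all, 0, line, 0, len);
        pending.SetLength(0);
        return line;
    }
}
```

Usage is the same as with the original function, for example foreach (var line in ByteLines.ReadLines(fname)) { ... }. A larger bufSz means fewer Read calls; which size works best depends on the disk and is worth measuring.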
