
Return StreamReader to Beginning when its BaseStream has a BOM

Developer https://www.devze.com 2023-03-15 08:35 Source: Internet

I'm looking for an infallible way to reset a StreamReader to the beginning, particularly when its underlying BaseStream starts with a BOM, but it must also work when no BOM is present. Creating a new StreamReader that reads from the beginning of the stream is also acceptable.

The original StreamReader can be created with any encoding and with detectEncodingFromByteOrderMarks set to either true or false. Also, a read may or may not have been performed before the reset is called.

The stream can contain arbitrary text. A file starting with the bytes 0xef,0xbb,0xbf may be a file with a BOM, or a file that starts with a valid sequence of characters (for example, ï»¿ if ISO-8859-1 encoding is used), depending on the parameters used when the StreamReader was created.

I've seen other solutions, but they don't work properly when the BaseStream starts with a BOM. The StreamReader remembers that it has already detected the BOM, so the first character returned by the next read is the special BOM character.

I can also create a new StreamReader, but I have no way of knowing whether the original StreamReader was created with detectEncodingFromByteOrderMarks set to true or false.

This is what I have tried first:

    //fails with TestMethod1
    void ResetStream1(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr.DiscardBufferedData();
    }

    //fails with TestMethod2
    void ResetStream2(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, true);
    }

    //fails with TestMethod3
    void ResetStream3(ref StreamReader sr) {
        sr.BaseStream.Position = 0;
        sr = new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
    }

And these are the test methods:

    Stream StreamWithBOM = new MemoryStream(new byte[] {0xef,0xbb,0xbf,(byte)'X'});


    [TestMethod]
    public void TestMethod1() {
        StreamReader sr=new StreamReader(StreamWithBOM);
        int before=sr.Read(); //reads X

        ResetStream(ref sr);
        int after=sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod2() {
        StreamReader sr = new StreamReader(StreamWithBOM,Encoding.GetEncoding("ISO-8859-1"),false);
        int before = sr.Read(); //reads ï

        ResetStream(ref sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod3() {
        StreamReader sr = new StreamReader(StreamWithBOM, Encoding.GetEncoding("ISO-8859-1"), true);
        int expected = (int)'X'; //no Read() done before reset

        ResetStream(ref sr);
        int after = sr.Read();

        Assert.AreEqual(expected, after);
    }

Finally, I found a solution (see my own answer) which passes all 3 tests, but I want to see if a more elegant or faster solution is possible.


    //passes all 3 tests
    void ResetStream(ref StreamReader sr){
        sr.Read(); //ensure that BOM is detected if configured to do so
        sr.BaseStream.Position=0;
        sr=new StreamReader(sr.BaseStream, sr.CurrentEncoding, false);
    }


This does the trick without needing to create a new StreamReader:

  void ResetStream(StreamReader sr)
  {
      sr.BaseStream.Position = sr.CurrentEncoding.GetPreamble().Length;
      sr.DiscardBufferedData();
  }

GetPreamble() will return an empty byte array if the encoding has no BOM.
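To make the GetPreamble() behaviour concrete, here is a minimal sketch that collects the preamble length for a handful of encodings. The class and method names (PreambleDemo, SamplePreambleLengths) are made up for illustration; only Encoding.GetPreamble() and the encoding types themselves come from the framework.

```csharp
using System.Text;

static class PreambleDemo
{
    // Hypothetical helper: returns the preamble (BOM) length, in bytes,
    // that each sample encoding is configured to emit.
    public static int[] SamplePreambleLengths()
    {
        return new[]
        {
            Encoding.UTF8.GetPreamble().Length,                      // 3 (EF BB BF): the static instance has the BOM on
            new UTF8Encoding().GetPreamble().Length,                 // 0: the parameterless constructor has the BOM off
            new UTF8Encoding(true).GetPreamble().Length,             // 3: BOM explicitly requested
            Encoding.Unicode.GetPreamble().Length,                   // 2 (FF FE): UTF-16 little endian
            Encoding.GetEncoding("ISO-8859-1").GetPreamble().Length  // 0: single-byte encoding, no BOM
        };
    }
}
```

Note that the length depends on how the Encoding object is configured, not on the bytes actually present in the stream.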

Resetting this way should work with or without the BOM because the UTF8Encoding class (and others, e.g. UTF32Encoding and UnicodeEncoding) has an internal field that tracks whether the BOM is included, and the StreamReader's CurrentEncoding reflects the detected BOM once you first do a Read().

However, it seems you need to pass the StreamReader constructor an Encoding with the BOM flag turned off; the reader will then correctly detect whether a BOM is present. If you simply pass the stream as the only parameter, as in TestMethod1 above, then for some reason it sets CurrentEncoding to UTF8 with BOM even if your stream has no BOM. Explicitly setting detectEncodingFromByteOrderMarks to true does not help either, since true is already the default.
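The difference between the two constructors can be observed with a small sketch like the one below. The names DetectionDemo, WithStreamOnly and WithEncoding are hypothetical helpers, not part of any library; each performs one Read() and then reports the preamble length advertised by CurrentEncoding.

```csharp
using System.IO;
using System.Text;

static class DetectionDemo
{
    // Hypothetical helper: stream-only constructor, as in TestMethod1.
    public static int WithStreamOnly(byte[] data)
    {
        using (var sr = new StreamReader(new MemoryStream(data)))
        {
            sr.Read(); // trigger BOM detection
            return sr.CurrentEncoding.GetPreamble().Length;
        }
    }

    // Hypothetical helper: caller supplies the encoding, as in TestMethod4 and TestMethod5.
    public static int WithEncoding(byte[] data, Encoding encoding)
    {
        using (var sr = new StreamReader(new MemoryStream(data), encoding))
        {
            sr.Read(); // trigger BOM detection
            return sr.CurrentEncoding.GetPreamble().Length;
        }
    }
}
```

For a BOM-less stream holding only 'X', WithStreamOnly reports a 3-byte preamble, so the reset above would move Position past the data; WithEncoding with new UTF8Encoding() reports 0 for the same bytes, and 3 once the stream really starts with 0xef,0xbb,0xbf.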

The tests below both pass, because the default for UTF8Encoding is to have the BOM off.

    Stream StreamWithBOM = new MemoryStream(new byte[] { 0xef, 0xbb, 0xbf, (byte)'X' });
    Stream StreamWithoutBOM = new MemoryStream(new byte[] { (byte)'X' });

    [TestMethod]
    public void TestMethod4()
    {
        StreamReader sr = new StreamReader(StreamWithBOM, new UTF8Encoding());
        int before = sr.Read(); //reads X

        ResetStream(sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }

    [TestMethod]
    public void TestMethod5()
    {
        StreamReader sr = new StreamReader(StreamWithoutBOM, new UTF8Encoding());
        int before = sr.Read(); //reads X

        ResetStream(sr);
        int after = sr.Read();

        Assert.AreEqual(before, after);
    }