XmlReader to read from fixed length buffer

The incoming stream arrives in a fixed 1024-byte buffer. The stream itself is a huge XML file, which may take many rounds of reads to finish. My goal is to read the buffers and figure out how many times an element occurs in the big XML file.

My challenge is that, since it is a fixed-length buffer, a single buffer is not guaranteed to contain well-formed XML. If I wrap the stream in an XmlTextReader, I always get an exception and cannot finish the read. For example, the element could be abcdef, while the first buffer ends at abc and the second buffer starts with def. I am really frustrated by this. Could anyone advise a better way to achieve this in a streaming fashion? (I do not want to load the entire content into memory.)

Thanks so much

Are your 1024-byte buffers coming from one of the standard, concrete implementations of System.IO.Stream? If they are, you can just create your XmlReader around the base stream. The reader pulls bytes from the stream as it needs them, so it does not matter where the 1024-byte boundaries fall:

XmlReader xr = XmlReader.Create( myStreamInstance ) ;

If not -- say, for instance, you're "reading" the buffers from some sort of API -- you need to implement your own concrete Stream, something along these lines (all you should need to do is flesh out the ReadNextFrame() method and possibly implement your constructors):

using System;

public class MyStream : System.IO.Stream
{
    public override bool CanRead  { get { return true  ; } }
    public override bool CanSeek  { get { return false ; } }
    public override bool CanWrite { get { return false ; } }
    public override long Length   { get { throw new NotImplementedException(); } }
    public override long Position {
                                    get { throw new NotImplementedException(); }
                                    set { throw new NotImplementedException(); }
                                  }

    public override int Read( byte[] buffer , int offset , int count )
    {
        int bytesRead = 0 ;

        if ( !initialized )
        {
            Initialize() ;
        }

        for ( int bytesRemaining = count ; !atEOF && bytesRemaining > 0 ; )
        {

            int frameRemaining = frameLength - frameOffset ;
            int chunkSize      = ( bytesRemaining > frameRemaining ? frameRemaining : bytesRemaining ) ;

            // copy from the current frame into the caller's buffer
            Array.Copy( frame , frameOffset , buffer , offset , chunkSize ) ;

            frameOffset    += chunkSize ;
            bytesRemaining -= chunkSize ;
            offset         += chunkSize ;
            bytesRead      += chunkSize ;

            // read next frame if necessary
            if ( frameOffset >= frameLength )
            {
                ReadNextFrame() ;
            }

        }

        return bytesRead ;
    }

    public override long Seek( long offset , System.IO.SeekOrigin origin ) { throw new NotImplementedException(); }
    public override void SetLength( long value )                           { throw new NotImplementedException(); }
    public override void Write( byte[] buffer , int offset , int count )   { throw new NotImplementedException(); }
    public override void Flush()                                           { throw new NotImplementedException(); }

    private byte[] frame       = null  ;
    private int    frameLength = 0     ;
    private int    frameOffset = 0     ;
    private bool   atEOF       = false ;
    private bool   initialized = false ;

    private void Initialize()
    {
        if ( initialized ) throw new InvalidOperationException() ;

        frame       = new byte[1024] ;
        frameLength = 0 ;
        frameOffset = 0 ;
        atEOF       = false ;
        initialized = true ;

        ReadNextFrame() ;

        return ;
    }

    private void ReadNextFrame()
    {

        //TODO: read the next (or first) 1024-byte buffer into frame
        //TODO: set frameLength to the number of bytes actually returned (might be less than 1024 on the last read)
        //TODO: set frameOffset to 0
        //TODO: set the atEOF flag once the data source returns no more bytes

        return ;

    }

}
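
For illustration, here is one way the ReadNextFrame() TODOs might be filled in, as a self-contained sketch rather than a definitive implementation. It assumes the 1024-byte buffers are delivered by a callback of type Func<byte[], int> that fills the array and returns the number of bytes written (0 when the data is exhausted). The FrameStream class and that callback are hypothetical stand-ins for whatever API actually hands you the buffers.

```csharp
using System;
using System.IO;

// Hypothetical read-only, forward-only stream over a frame-producing callback.
public sealed class FrameStream : Stream
{
    // Assumed contract: fills the array, returns bytes written, 0 at end of data.
    private readonly Func<byte[], int> fillFrame;
    private readonly byte[] frame = new byte[1024];
    private int frameLength;
    private int frameOffset;
    private bool atEOF;

    public FrameStream(Func<byte[], int> fillFrame)
    {
        if (fillFrame == null) throw new ArgumentNullException("fillFrame");
        this.fillFrame = fillFrame;
    }

    public override bool CanRead  { get { return true; } }
    public override bool CanSeek  { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length   { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int bytesRead = 0;
        while (bytesRead < count && !atEOF)
        {
            if (frameOffset >= frameLength)
            {
                // Current frame is used up (or none loaded yet): fetch another.
                ReadNextFrame();
                continue;
            }
            int chunkSize = Math.Min(count - bytesRead, frameLength - frameOffset);
            Array.Copy(frame, frameOffset, buffer, offset + bytesRead, chunkSize);
            frameOffset += chunkSize;
            bytesRead   += chunkSize;
        }
        return bytesRead; // 0 signals end of stream to the caller
    }

    private void ReadNextFrame()
    {
        int n = fillFrame(frame);
        frameLength = n > 0 ? n : 0;
        frameOffset = 0;
        atEOF = frameLength == 0;
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}
```

Note that atEOF is set only when the callback returns no bytes, so a short final frame is still delivered to the reader before the stream reports its end.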

Then instantiate your XmlReader as above:

System.IO.Stream     s  = new MyStream() ;
System.Xml.XmlReader xr = System.Xml.XmlReader.Create( s ) ;

Cheers!

That is a somewhat strange goal. Usually the goal is more like "count elements without loading the whole XML into memory", which is trivial: write a Stream-derived class that exposes your buffers as a forward-only stream (similar to NetworkStream) and read the XML normally using XmlReader (e.g. with LINQ on top of it), but do not construct an XmlDocument.
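
As a rough sketch of that approach, the counting itself could look like the following. ElementCounter is a hypothetical helper name, and the comparison uses the element's qualified name as written in the document, which may need adjusting if namespaces or prefixes are involved.

```csharp
using System.IO;
using System.Xml;

public static class ElementCounter
{
    // Counts start tags (including empty elements like <item/>) named elementName.
    // The reader pulls from the stream node by node; no document tree is built.
    public static int Count(Stream xml, string elementName)
    {
        int count = 0;
        using (XmlReader reader = XmlReader.Create(xml))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```

Any readable Stream can be passed in, including a custom forward-only stream over the 1024-byte buffers, for example ElementCounter.Count( myStream , "item" ) where "item" is a placeholder element name.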

If you clarify your goal it may be easier for others to advise.