The case: there is a large zipped XML file which needs to be parsed by a .NET program. The main issue is the size of the file: it is too big to be fully loaded into memory and unzipped.
The file needs to be read part by part, in such a way that the unzipped parts are "consistent". If a part contains only half of a node, it cannot be parsed as any XML structure.
Any help will be appreciated. :)
Edit: The current solution extracts the whole zip file part by part and writes it to disk as an XML file, then reads and parses that XML. No better ideas so far from my side :).
Using DotNetZip you can do this:
using (var zip = ZipFile.Read("c:\\data\\zipfile.zip"))
{
    using (Stream s = zip["NameOfXmlFile.xml"].OpenReader())
    {
        // Create the XmlReader object over the decompressed entry stream.
        using (XmlReader reader = XmlReader.Create(s))
        {
            while (reader.Read())
            {
                // process the current node here
            }
        }
    }
}
You could give SharpZipLib a try and then use an XmlReader to parse the decompressed stream, along the lines of the sketch below.
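A minimal sketch of that combination, assuming the archive path and the entry name ("NameOfXmlFile.xml") are placeholders and that error handling for a missing entry is omitted:

using System.IO;
using System.Xml;
using ICSharpCode.SharpZipLib.Zip;

ZipFile zipFile = new ZipFile(@"c:\data\zipfile.zip");
try
{
    // GetEntry returns null if the name is not found; check for that in real code.
    ZipEntry entry = zipFile.GetEntry("NameOfXmlFile.xml");

    // GetInputStream gives a forward-only stream over the entry, decompressed on the fly,
    // so XmlReader never needs the whole document in memory.
    using (Stream s = zipFile.GetInputStream(entry))
    using (XmlReader reader = XmlReader.Create(s))
    {
        while (reader.Read())
        {
            // handle each node as it streams past
        }
    }
}
finally
{
    zipFile.Close();
}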
Have you tried the DotNetZip library?
In reply to your recent edit: what you are doing is the standard approach. As far as I know, there is no alternative.
Regarding your edit: unless you actually want that XML file on disk (which could of course be the case in some scenarios), I would extract it to a MemoryStream instead.
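A minimal sketch of that variant, using DotNetZip and the same placeholder names as the answer above. Note that this still holds the whole decompressed XML in memory, so it only helps when disk I/O rather than RAM is the concern:

using System.IO;
using System.Xml;
using Ionic.Zip;

using (var zip = ZipFile.Read("c:\\data\\zipfile.zip"))
using (var mem = new MemoryStream())
{
    // Extract the entry's decompressed content into the MemoryStream instead of onto disk.
    zip["NameOfXmlFile.xml"].Extract(mem);
    mem.Position = 0;

    using (XmlReader reader = XmlReader.Create(mem))
    {
        while (reader.Read())
        {
            // parse as before
        }
    }
}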
Hmmm, you have two problems here: unzipping the file in a manner that gives you chunks of data, and a way to read the XML from nothing more than those chunks. This is different from how most of us are used to dealing with XML, where we just read the whole thing into memory in one go, but you say that's not an option.
That means you are going to have to use streams, which are built for exactly this case. The approach will work, but it might be limited depending on what you hope to do with the XML data. You say it needs to be parsed, but the only way you will be able to do that (since you can't keep it all in memory) is to read it in a "fire hose" manner, stepping through each node as it is parsed. Hopefully that's enough to pull out the data you need or to process it however you need to (poke it into a DB, extract only the sections you are interested in and save them into a smaller in-memory XML doc, etc.); see the sketch just below.
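To illustrate the "fire hose" idea, here is a sketch of skimming forward with XmlReader and lifting only the interesting subtrees into small in-memory objects. It assumes .NET 3.5+ (System.Xml.Linq), that "stream" is the decompressed stream obtained from the zip entry, and that "record" is a made-up element name:

using System.Xml;
using System.Xml.Linq;

using (XmlReader reader = XmlReader.Create(stream))
{
    reader.MoveToContent();
    while (!reader.EOF)
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
        {
            // Materialise only this subtree; the rest of the document never lives in memory.
            // XNode.ReadFrom leaves the reader positioned just after the element it consumed.
            XElement record = (XElement)XNode.ReadFrom(reader);
            // ... push record into a DB, write it out, keep it in a small XDocument, etc.
        }
        else
        {
            reader.Read();
        }
    }
}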
So, first job: get a stream from your zip file, which is quite easy to do with SharpZipLib (+1 to Rubens). Add a reference to the SharpZipLib DLL in your project. Here's some code that creates a stream from zipped data and then copies it into a memory stream (you might not want that last bit, but it shows how I use it to get back a byte[] of data; you just want the stream):
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;
using System.Diagnostics;
using System.Xml;

namespace Offroadcode.Compression
{
    /// <summary>
    /// A number of handy zip functions for compressing/decompressing zip data.
    /// </summary>
    public class Zip
    {
        /// <summary>
        /// Decompresses a byte array of data previously compressed by the Compress method, or by any zip program for that matter.
        /// </summary>
        /// <param name="bytes">Compressed data as a byte array</param>
        /// <returns>Byte array of uncompressed data</returns>
        public static byte[] Decompress( byte[] bytes )
        {
            Debug.Write( "Decompressing byte array of size: " + bytes.Length );
            using( ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream stream = new ICSharpCode.SharpZipLib.Zip.Compression.Streams.InflaterInputStream( new MemoryStream( bytes ) ) )
            {
                // Left this bit in to show you how I read from "stream" and save the data to another stream, "mem"
                using ( MemoryStream mem = new MemoryStream() )
                {
                    int size = 0;
                    while( true )
                    {
                        byte[] buffer = new byte[4096];
                        size = stream.Read( buffer, 0, buffer.Length );
                        if ( size > 0 )
                        {
                            mem.Write( buffer, 0, size );
                        }
                        else
                        {
                            break;
                        }
                    }
                    bytes = mem.ToArray();
                }
            }
            Debug.Write( "Complete, decompressed size: " + bytes.Length );
            return bytes;
        }
    }
}
Then, if you follow this Microsoft article: http://support.microsoft.com/kb/301228, you should be able to merge the two pieces of code and start reading your XML straight from a zip stream :)
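A rough sketch of what that merged result might look like. Instead of buffering into a byte[] as above, it uses SharpZipLib's ZipInputStream (a different class from the InflaterInputStream in the snippet, swapped in here because it reads zip archives directly) and feeds the decompression stream straight into an XmlTextReader, as in the KB article. The file path is a placeholder and the XML is assumed to be the first entry in the archive:

using System.IO;
using System.Xml;
using ICSharpCode.SharpZipLib.Zip;

using (FileStream file = File.OpenRead(@"c:\data\zipfile.zip"))
using (ZipInputStream zipStream = new ZipInputStream(file))
{
    // Position the stream on the first entry; reads now return its decompressed bytes.
    ZipEntry entry = zipStream.GetNextEntry();

    using (XmlTextReader reader = new XmlTextReader(zipStream))
    {
        while (reader.Read())
        {
            // same node-by-node handling as in the KB article
        }
    }
}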