Reading an XML file multithreaded

Developer https://www.devze.com 2023-03-15 15:22 Source: Internet
I've searched a lot but couldn't find a proper solution for my problem. I wrote an XML file containing all the episode information of a TV show. It's 38 KB and contains attributes and strings for about 680 variables. At first I simply read it with XmlTextReader, which worked fine on my quad-core PC. But my wife's five-year-old laptop took about 30 seconds to read it. So I thought about multithreading, but I get an exception because the file is already open.

The thread start looks like this:

while (reader.Read())
{
   ...
   else if (reader.NodeType == XmlNodeType.Element)
   {
       if (reader.Name.Equals("Season1"))
       {
           current.seasonNr = 0;
           current.currentSeason = season[0];
           current.reader = reader;
           seasonThread[0].Start(current);
       }
       else if (reader.Name.Equals("Season2"))
       {
           current.seasonNr = 1;
           current.currentSeason = season[1];
           current.reader = reader;
           seasonThread[1].Start(current);
       }

And the parsing method looks like this:

reader.Read();

for (episodeNr = 0; episodeNr < tmp.currentSeason.episode.Length; episodeNr++)
{
    reader.MoveToFirstAttribute();
    tmp.currentSeason.episode[episodeNr].id = reader.ReadContentAsInt();
    ...
}

But it doesn't work...

I pass the reader because I want the 'cursor' to be in the right position. But I also have no clue if this could work at all.

Please help!

EDIT: Guys, where did I write about IE? The program I wrote parses the file. I run it on my PC and on the laptop. No IE at all.

EDIT2: I did some stopwatch research and figured out that parsing the XML file only takes about 200 ms on my PC and 800 ms on my wife's laptop. Is it WPF being so slow? What can I do?


I agree with most of the comments. Reading a 38 KB file should not take so long. Do you have something else running on the machine (antivirus, etc.) that could be interfering with the processing?

The amount of time it would take you to create a thread will be far greater than the amount of time spent reading the file. If you could post the actual code used to read the file and the file itself, it might help analyze performance bottlenecks.


I don't think you can parse XML in multiple threads, at least not in a way that would bring performance benefits. To read from some point in the file, you need to know everything that comes before it, if only to know at what nesting level you are.

Your code, if it worked, would do something like this:

main   season1   season2
read
read
skip   read
skip   read
read
skip             read
skip             read

Note that to do “skip”, you need to fully parse the XML, which means you're doing the same amount of work as before on the main thread. The only difference is that you're doing some additional work on the background threads.
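For comparison, here is a minimal sketch of the single-threaded alternative: one reader, one forward pass, and each season filled in as it is encountered. The `Episode` element name and the `id` attribute are assumptions about your file (only `Season1`/`Season2` appear in your post), and `SinglePassParser` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SinglePassParser
{
    // Reads every <Episode id="..."> in document order and groups the ids
    // by the name of the enclosing season element (e.g. "Season1").
    // Element and attribute names are assumed, not taken from the real file.
    public static Dictionary<string, List<int>> ReadEpisodeIds(string xml)
    {
        var result = new Dictionary<string, List<int>>();
        string currentSeason = null;

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                {
                    continue;
                }

                if (reader.Name.StartsWith("Season", StringComparison.Ordinal))
                {
                    // A new season starts; later episodes belong to it.
                    currentSeason = reader.Name;
                    result[currentSeason] = new List<int>();
                }
                else if (reader.Name == "Episode" && currentSeason != null)
                {
                    string id = reader.GetAttribute("id");
                    if (id != null)
                    {
                        result[currentSeason].Add(XmlConvert.ToInt32(id));
                    }
                }
            }
        }

        return result;
    }
}
```

Because the reader never has to be shared, there is no "file already opened" problem and no cursor to hand between threads.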

Regarding the slowness, just parsing such a small XML file should be very fast. If it's slow, you're most likely doing something else that is slow, or you're parsing the file multiple times.
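To find out where the time actually goes, it may help to time the parse on its own, separately from everything else the program does at startup. The following is only a sketch: `ParseTiming`, `CountElements` and `TimeMs` are hypothetical names, and counting elements merely stands in for your real parsing work.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml;

public static class ParseTiming
{
    // Counts element nodes in one forward pass over the XML text.
    // This is a placeholder for the real parsing logic.
    public static int CountElements(string xml)
    {
        int count = 0;
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        return count;
    }

    // Runs the given action once and returns the elapsed wall-clock milliseconds.
    public static long TimeMs(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Wrapping the parse and then, separately, the code that builds the UI in `ParseTiming.TimeMs(...)` should show which of the two dominates on the slow machine.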


If I am understanding how your .xml file is being used, you have essentially created an .xml database.

If that is correct, I would recommend breaking your XML into several .xml files, plus an index .xml document. You could then use LINQ to XML to query a set of data from a specific .xml source.

Of course, this means you will still need to load an .xml file; however, you will be loading significantly smaller files, and you would be able to load the .xml document objects asynchronously, although that is highly discouraged.
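A rough sketch of the index lookup, assuming an index document whose `Season` elements carry `Number` and `File` attributes. That layout and the `SeasonIndex` name are made up for illustration and are not something your current file has.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SeasonIndex
{
    // Returns the file name registered for the given season number,
    // or null when the index has no matching entry.
    // Assumed index layout: <Index><Season Number="1" File="season1.xml"/>...</Index>
    public static string FindSeasonFile(string indexXml, int seasonNumber)
    {
        return XDocument.Parse(indexXml)
            .Descendants("Season")
            .Where(s => (int)s.Attribute("Number") == seasonNumber)
            .Select(s => (string)s.Attribute("File"))
            .FirstOrDefault();
    }
}
```

The returned name could then be passed to `XDocument.Load` so that only the one season you need is read from disk.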


Your XML schema doesn't lend itself to parallelism, since you seem to have node names (Season1, Season2) that contain the same kind of data but must be parsed individually. You could redesign your schema to use the same node name (e.g. Season) with attributes that express the differences in the data (e.g. Number to indicate the season number). Then you can parallelize, e.g. using LINQ to XML and PLINQ:

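using System.Linq;
using System.Xml.Linq;

// Load the whole document once, then project each <Season> element in parallel.
// Season is your own class with Number and Description properties.
XDocument doc = XDocument.Load(@"TVShowSeasons.xml");
var seasonData = doc.Descendants("Season")
                    .AsParallel()
                    .Select(x => new Season()
                    {
                        Number = (int)x.Attribute("Number"),
                        Description = x.Value
                    }).ToList();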
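Here is a self-contained variant of the same idea that you can try without a file on disk. It is only a sketch: the `Season` class, its `Number` and `Description` properties, and `SeasonLoader` are assumptions for illustration, and `AsOrdered()` is added so the result keeps document order.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical model class; the real one will have more fields.
public class Season
{
    public int Number { get; set; }
    public string Description { get; set; }
}

public static class SeasonLoader
{
    // Parses the XML text once, then projects each <Season> element in parallel.
    // Only the projection is parallel; parsing the document is still sequential.
    public static List<Season> Load(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants("Season")
                  .AsParallel()
                  .AsOrdered() // keep the seasons in document order
                  .Select(x => new Season
                  {
                      Number = (int)x.Attribute("Number"),
                      Description = x.Value
                  })
                  .ToList();
    }
}
```

For a 38 KB file the parallel overhead will most likely outweigh any gain, so treat this as an illustration of the schema change rather than as a speed-up.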