
Analysis of XML structured data

Developer (https://www.devze.com), 2023-03-19 16:29, Source: Internet

My work uses software to fill out records that are expressed as XML documents. I have the task of trawling through these XML files to pull statistics out of them. The files themselves adhere to no schema, and if a form field doesn't get filled out then the XML corresponding to that field is not created.

What's my best approach?

Example XML:

<Form>
    <Field>
        <id>Field ID</id>
        <value>Entered Value</value>
    </Field>
</Form>

I have been attempting to write software that I can use to query the files but have not been able to come up with anything even remotely useful.

Any thoughts appreciated.

EDIT: In terms of C#, what I would like (though I'm sure it isn't possible) is a Dictionary that has a string as the key and whose value can be EITHER a string or another Dictionary.
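The nested dictionary from the EDIT can be approximated with Dictionary<string, object>, where each value is either a string or another Dictionary<string, object>. The sketch below assumes .NET with LINQ to XML (System.Xml.Linq) and C# 9 top-level statements. ToNested and the "#2", "#3" suffix for repeated sibling elements are hypothetical conventions, not a library API, and attributes are ignored. A field that was never filled in simply has no key, which matches how the files are written.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

var form = XElement.Parse(
    "<Form><Field><id>People1</id><value>C Sharp</value></Field>" +
    "<Field><id>People2</id></Field></Form>");

var nested = (Dictionary<string, object>)ToNested(form);
var second = (Dictionary<string, object>)nested["Field#2"];
Console.WriteLine(second.ContainsKey("value")); // False: that <value> was never written

// Hypothetical helper: a leaf element becomes its text (string); an element with
// children becomes a Dictionary<string, object> whose values are strings or
// further dictionaries. Attributes are ignored in this sketch.
static object ToNested(XElement element)
{
    if (!element.HasElements)
        return element.Value;

    var dict = new Dictionary<string, object>();
    foreach (var child in element.Elements())
    {
        // Repeated siblings such as <Field> would share a key, so later ones are
        // stored as "Field#2", "Field#3", ... (an assumed naming convention).
        string key = child.Name.LocalName;
        for (int n = 2; dict.ContainsKey(key); n++)
            key = child.Name.LocalName + "#" + n;
        dict[key] = ToNested(child);
    }
    return dict;
}
```

Callers then check each value with `is string` or `is Dictionary<string, object>` before using it.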


Is it something like this?

XML:

<?xml version="1.0" encoding="utf-8" ?>
<Form>
  <Field>
    <id>People1</id>
    <value>C Sharp</value>
  </Field>
  <Field>
    <id>People2</id>
    <value>C Sharp</value>
  </Field>
  <Field>
    <id>People3</id>
    <value>C</value>
  </Field>
</Form>

Source:

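using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main(string[] args)
    {
        var doc = XDocument.Load("test.xml");

        // (string)element returns null when the element is missing, whereas
        // element.Value would throw NullReferenceException for an unfilled field.
        var result = from p in doc.Descendants("Form").Descendants("Field")
                     select new { ID = (string)p.Element("id"), VALUE = (string)p.Element("value") };

        foreach (var x in result)
            Console.WriteLine(x);

        // Count how many fields hold each value (unfilled fields group under null).
        var gr = from p in result
                 group p by p.VALUE into g
                 select new { Language = g.Key, Count = g.Count() };

        foreach (var x in gr)
            Console.WriteLine(string.Format("Language:{0} Count:{1}", x.Language, x.Count));

        Console.Read();
    }
}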
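Because an unfilled field leaves no element behind, one useful statistic is how many forms actually contain a value for each field. A minimal sketch, assuming the same Form/Field/id/value layout; FilledCounts is a hypothetical helper, and the two in-memory documents stand in for files you would load with XDocument.Load.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Two small forms standing in for real files; in practice something like
// Directory.EnumerateFiles(folder, "*.xml").Select(path => XDocument.Load(path)).
var forms = new[]
{
    XDocument.Parse("<Form><Field><id>Name</id><value>Ann</value></Field>" +
                    "<Field><id>Phone</id><value>555-0100</value></Field></Form>"),
    XDocument.Parse("<Form><Field><id>Name</id><value>Bob</value></Field></Form>"),
};

var counts = FilledCounts(forms);
foreach (var (fieldId, filled) in counts)
    Console.WriteLine($"{fieldId}: filled in {filled} of {forms.Length} forms");

// Hypothetical helper: for each field id, counts the <Field> elements that have a
// non-blank <value>. Casting a missing element to string gives null instead of throwing.
static Dictionary<string, int> FilledCounts(IEnumerable<XDocument> docs) =>
    docs.SelectMany(doc => doc.Descendants("Field"))
        .Select(f => new { Id = (string)f.Element("id"), Value = (string)f.Element("value") })
        .Where(f => f.Id != null && !string.IsNullOrWhiteSpace(f.Value))
        .GroupBy(f => f.Id)
        .ToDictionary(g => g.Key, g => g.Count());
```

Dividing each count by the number of forms gives a per-field fill rate.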


If the file is not too big, I would suggest Perl and the XML::Simple module. It loads the whole XML document into nested Perl hashes and arrays, and then you can simply loop through it like any other data structure. Something like:

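use strict;
use warnings;
use XML::Simple;

# XMLin drops the root <Form>, so the fields are found under $xml->{Field}.
# ForceArray keeps a single <Field> as a one-element array, and KeyAttr => []
# stops XML::Simple from folding the <Field>s into a hash keyed on <id>.
my $xml = XMLin( 'file.xml', ForceArray => [ 'Field' ], KeyAttr => [] );
my %fld_counts;
# Any start record processing...
foreach my $fld ( @{ $xml->{Field} || [] } )
{
    my $id  = $fld->{id};
    my $val = $fld->{value};    # undef when the field was never filled in
    next unless defined $id;
    # Do something with id/value... like...
    $fld_counts{$id}++;
}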

Then just adjust that structure to gather whichever statistics you need.


For parsing XML I prefer plain XmlReader. Granted, it's more verbose, but it is efficient (it reads the document forward-only as a stream instead of loading it all into memory) and transparent, at least for me. For example:

using System.Xml;

static class FormParser
{
    // path can be a file name or URI; XmlReader.Create also accepts a Stream or TextReader
    public static void Parse(string path)
    {
        using (var xr = XmlReader.Create(path))
            while (xr.Read())
                if (xr.NodeType == XmlNodeType.Element)
                    switch (xr.Name)
                    {
                        case "Field":
                            // do something here with this element,
                            // like possibly reading the whole subtree...
                            using (var xrr = xr.ReadSubtree())
                                while (xrr.Read())
                                {
                                    // working here...
                                }
                            break;
                        case "Form":
                            // that is another element, e.g. the start of a new record
                            break;
                        case "some-other-element":
                            // and so on
                            break;
                    } // switch
    }
}
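For large files, the two approaches can also be combined: stream with XmlReader, but materialize one Field at a time as an XElement via XNode.ReadFrom, so memory stays flat while the null-safe (string) casts still work. This hybrid is a swapped-in technique, not the answer's own code; CountValues and the "(not filled)" label are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

var xml = "<Form><Field><id>Lang</id><value>C</value></Field>" +
          "<Field><id>Lang</id><value>C</value></Field>" +
          "<Field><id>Lang</id></Field></Form>";

using var reader = XmlReader.Create(new StringReader(xml));
var result = CountValues(reader);
foreach (var (fieldId, values) in result)
    foreach (var (val, cnt) in values)
        Console.WriteLine($"{fieldId} = {val}: {cnt}");

// Hypothetical helper: field id -> (value -> number of occurrences).
// A <Field> without a <value> element is counted under "(not filled)".
static Dictionary<string, Dictionary<string, int>> CountValues(XmlReader xr)
{
    var stats = new Dictionary<string, Dictionary<string, int>>();
    xr.MoveToContent();
    while (!xr.EOF)
    {
        if (xr.NodeType == XmlNodeType.Element && xr.Name == "Field")
        {
            // ReadFrom loads just this <Field> and leaves the reader after it,
            // so this branch must not call Read() again.
            var field = (XElement)XNode.ReadFrom(xr);
            string id = (string)field.Element("id");
            if (id == null)
                continue;
            string value = (string)field.Element("value") ?? "(not filled)";

            if (!stats.TryGetValue(id, out var perValue))
                stats[id] = perValue = new Dictionary<string, int>();
            perValue[value] = perValue.TryGetValue(value, out var n) ? n + 1 : 1;
        }
        else
        {
            xr.Read();
        }
    }
    return stats;
}
```

The `while (!xr.EOF)` loop with an explicit `else xr.Read()` avoids the common bug where calling Read() right after ReadFrom skips an adjacent Field.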
