
.Net, XML, & Regex - How to match a specific collection item?

Developer https://www.devze.com 2022-12-30 15:24 Source: Internet

I have an XML file with the following simplified contents:

<CollectionItems>
    <CollectionItem>
        <Element1>Value1</Element1>
        <Element2>
            <SubElement1>SubValue1</SubElement1>
            <SubElement2>SubValue2</SubElement2>
            <SubElement3>SubValue3</SubElement3>
        </Element2>
        <Element3>Value3</Element3>
    </CollectionItem>
    <CollectionItem>
        <Element1>Value1</Element1>
        <Element2>
            <SubElement1>SubValue1</SubElement1>
            <SubElement2 />
            <SubElement3>SubValue3</SubElement3>
        </Element2>
        <Element3>Value3</Element3>
    </CollectionItem>
    <CollectionItem>
        <Element1>Value1</Element1>
        <Element2>
            <SubElement1>SubValue1</SubElement1>
            <SubElement2>SubValue2</SubElement2>
            <SubElement3>SubValue3</SubElement3>
        </Element2>
        <Element3>Value3</Element3>
    </CollectionItem>
</CollectionItems>

I am attempting to write a regex in .Net which matches any CollectionItem where SubElement2 is empty (the middle CollectionItem in this example).

I have the following regex so far (SingleLine mode enabled):

<CollectionItem>.+?<SubElement2 />.+?</CollectionItem>

The problem is that it is matching the opening of the first CollectionItem through the close of the second CollectionItem. I understand why it's doing this, but I don't know how to modify the regex to make it match only the center CollectionItem.

Edit: As to why regex as opposed to something else:

  1. I was attempting to modify the file in a text editor for simplicity.
  2. After I couldn't figure out how to do it in regex, I wanted to know if it could be done (and how) for the sake of learning.

Thanks!


Why are you trying to use a regular expression? You've got a perfectly good domain model (XML) - why not search that instead? So for example in LINQ to XML:

// Find each empty <SubElement2>, then walk up to its enclosing <CollectionItem>.
var collectionsWithEmptySubElement2 =
       document.Descendants("SubElement2")
               .Where(x => x.IsEmpty)
               .Select(x => x.Ancestors("CollectionItem").FirstOrDefault());

or

// Keep each <CollectionItem> that has at least one empty <SubElement2> descendant.
var collectionsWithEmptySubElement2 =
       document.Descendants("CollectionItem")
               .Where(x => x.Descendants("SubElement2").Any(sub => sub.IsEmpty));
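For anyone who wants to try the LINQ to XML approach end to end, here is a minimal sketch. The class name `EmptySubElementFinder` and the method `FindItems` are made up for illustration, and it assumes the XML is available as a string. One detail worth knowing: `XElement.IsEmpty` is true only for the self-closing form `<SubElement2 />`, so this sketch checks for "no child nodes" instead, which also catches `<SubElement2></SubElement2>`. If only the self-closing form should count, `IsEmpty` as used above is the right test.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class EmptySubElementFinder
{
    // Returns every <CollectionItem> containing a <SubElement2> with no child nodes.
    public static List<XElement> FindItems(string xml)
    {
        XDocument document = XDocument.Parse(xml);

        return document.Descendants("CollectionItem")
                       // !sub.Nodes().Any() is true for <SubElement2 /> and for
                       // <SubElement2></SubElement2>; IsEmpty is true only for the former.
                       .Where(item => item.Descendants("SubElement2")
                                          .Any(sub => !sub.Nodes().Any()))
                       .ToList();
    }
}
```

Calling `EmptySubElementFinder.FindItems(File.ReadAllText(path))` on the file from the question should give back just the middle `CollectionItem`, which can then be modified or removed through the usual `XElement` methods.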


This is XML - why are you trying to do this with Regex? Wouldn't XPath make more sense?
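A rough sketch of what the XPath route could look like in .NET, using the `XPathSelectElements` extension method from `System.Xml.XPath`. The class and method names are hypothetical, and the expression assumes "empty" means "has no child nodes at all":

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper, not part of any library.
public static class XPathFinder
{
    // Selects every <CollectionItem> that has a descendant <SubElement2>
    // with no child nodes (no text, no elements).
    public static List<XElement> FindItems(string xml)
    {
        XDocument document = XDocument.Parse(xml);

        return document
            .XPathSelectElements("//CollectionItem[.//SubElement2[not(node())]]")
            .ToList();
    }
}
```

The predicate `[not(node())]` is what expresses "empty" here; the rest of the expression just walks from each `CollectionItem` down to its `SubElement2`.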


You could use

<CollectionItem>((?!<CollectionItem>).)+?<SubElement2 />.+?</CollectionItem>

This ensures that no further <CollectionItem> comes between the starting tag and the <SubElement2 /> tag.
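To check the pattern from code rather than in a text editor, something like the following sketch should work. The class name `CollectionItemRegex` and the method `FindItems` are invented for the example; the pattern is the one given above, run with `RegexOptions.Singleline` as in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class CollectionItemRegex
{
    // (?!<CollectionItem>). consumes one character at a time, but refuses to
    // step onto the start of another <CollectionItem> opening tag.
    private const string Pattern =
        @"<CollectionItem>((?!<CollectionItem>).)+?<SubElement2 />.+?</CollectionItem>";

    // Returns the text of every <CollectionItem> block that contains <SubElement2 />.
    public static List<string> FindItems(string xml)
    {
        return Regex.Matches(xml, Pattern, RegexOptions.Singleline)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

Note that the trailing `.+?` needs at least one character between `<SubElement2 />` and `</CollectionItem>`. That holds for the XML in the question, but if the empty tag could ever sit directly before the closing tag, `.*?` would be the safer choice.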
