I have an XML file with the following (simplified) contents:
<CollectionItems>
<CollectionItem>
<Element1>Value1</Element1>
<Element2>
<SubElement1>SubValue1</SubElement1>
<SubElement2>SubValue2</SubElement2>
<SubElement3>SubValue3</SubElement3>
</Element2>
<Element3>Value3</Element3>
</CollectionItem>
<CollectionItem>
<Element1>Value1</Element1>
<Element2>
<SubElement1>SubValue1</SubElement1>
<SubElement2 />
<SubElement3>SubValue3</SubElement3>
</Element2>
<Element3>Value3</Element3>
</CollectionItem>
<CollectionItem>
<Element1>Value1</Element1>
<Element2>
<SubElement1>SubValue1</SubElement1>
<SubElement2>SubValue2</SubElement2>
<SubElement3>SubValue3</SubElement3>
</Element2>
<Element3>Value3</Element3>
</CollectionItem>
</CollectionItems>
I am attempting to write a regex in .NET that matches any CollectionItem where SubElement2 is empty (the middle CollectionItem in this example).
I have the following regex so far (RegexOptions.Singleline enabled):
<CollectionItem>.+?<SubElement2 />.+?</CollectionItem>
The problem is that it matches from the opening tag of the first CollectionItem through the closing tag of the second CollectionItem. I understand why it's doing this, but I don't know how to modify the regex so that it matches only the middle CollectionItem.
Edit: As to why regex as opposed to something else:
- I was attempting to modify the file in a text editor for simplicity.
- After I couldn't figure out how to do it in regex, I wanted to know if it could be done (and how) for the sake of learning.
Thanks!
Why are you trying to use a regular expression? You've got a perfectly good domain model (XML) - why not search that instead? So for example in LINQ to XML:
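// IsEmpty is true only for the self-closing form <SubElement2 />,
// not for <SubElement2></SubElement2>.
var collectionsWithEmptySubElement2 =
    document.Descendants("SubElement2")
            .Where(x => x.IsEmpty)
            .Select(x => x.Ancestors("CollectionItem").FirstOrDefault());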
or
var collectionsWithEmptySubElement2 =
    document.Descendants("CollectionItem")
            .Where(x => x.Descendants("SubElement2").Any(sub => sub.IsEmpty));
This is XML - why are you trying to do this with Regex? Wouldn't XPath make more sense?
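As a rough sketch of what the XPath route could look like in .NET (assuming C# 9 or later for top-level statements, and with Element1 values changed from the question's sample purely so the result is identifiable), something like this should select only the item whose SubElement2 has no content:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Abbreviated version of the question's sample; the Element1 values are
// made up here so the matching item can be told apart from the others.
string xml = @"<CollectionItems>
  <CollectionItem>
    <Element1>first</Element1>
    <Element2><SubElement2>SubValue2</SubElement2></Element2>
  </CollectionItem>
  <CollectionItem>
    <Element1>second</Element1>
    <Element2><SubElement2 /></Element2>
  </CollectionItem>
  <CollectionItem>
    <Element1>third</Element1>
    <Element2><SubElement2>SubValue2</SubElement2></Element2>
  </CollectionItem>
</CollectionItems>";

XDocument document = XDocument.Parse(xml);

// Select every CollectionItem that has a descendant SubElement2
// with no child nodes at all (no text, no elements).
var emptyItems = document
    .XPathSelectElements("//CollectionItem[.//SubElement2[not(node())]]")
    .ToList();

foreach (XElement item in emptyItems)
{
    Console.WriteLine(item.Element("Element1")?.Value);
}
```

The not(node()) predicate tests for the absence of child nodes rather than for a particular spelling of the tag, so it does not depend on whether the file writes a space before the slash.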
You could use
<CollectionItem>((?!<CollectionItem>).)+?<SubElement2 />.+?</CollectionItem>
This ensures that no further <CollectionItem> comes between the starting tag and the <SubElement2 /> tag.
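To see the difference between the two patterns, here is a small sketch (assuming C# 9 or later for top-level statements, and using an abbreviated version of the question's sample as input) that runs both with RegexOptions.Singleline and counts how many opening tags each match spans:

```csharp
using System;
using System.Text.RegularExpressions;

// Abbreviated sample: three items, only the middle one has an empty SubElement2.
string xml = @"<CollectionItems>
<CollectionItem>
<Element2>
<SubElement2>SubValue2</SubElement2>
</Element2>
</CollectionItem>
<CollectionItem>
<Element2>
<SubElement2 />
</Element2>
</CollectionItem>
<CollectionItem>
<Element2>
<SubElement2>SubValue2</SubElement2>
</Element2>
</CollectionItem>
</CollectionItems>";

// The pattern from the question: the first .+? is free to run past
// the start of the next CollectionItem.
string naive = "<CollectionItem>.+?<SubElement2 />.+?</CollectionItem>";

// The pattern from this answer: each character is consumed only if
// a new <CollectionItem> tag does not start at that position.
string tempered = "<CollectionItem>((?!<CollectionItem>).)+?<SubElement2 />.+?</CollectionItem>";

Match naiveMatch = Regex.Match(xml, naive, RegexOptions.Singleline);
Match temperedMatch = Regex.Match(xml, tempered, RegexOptions.Singleline);

// Count how many opening tags ended up inside each match.
int naiveOpenTags = Regex.Matches(naiveMatch.Value, "<CollectionItem>").Count;
int temperedOpenTags = Regex.Matches(temperedMatch.Value, "<CollectionItem>").Count;

Console.WriteLine($"naive match spans {naiveOpenTags} opening tag(s)");
Console.WriteLine($"tempered match spans {temperedOpenTags} opening tag(s)");
```

Note that both patterns look for the literal text <SubElement2 />, so an element written as <SubElement2/> or <SubElement2></SubElement2> would not be found without adjusting the pattern.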