I cannot find a specific question like this so I'm posting. Hopefully, this will be of general use.
I have a file that includes XML tags of the form "<w:t> data data.....</w:t>". There is a lot of other stuff too. I need to capture everything within (and including) the <w:t></w:t> tags.
I'd appreciate hearing suggestions on how to proceed.
Thanks in advance.
David
You should really use an XML parser like SimpleXML:
$string = '<?xml version="1.0"?>
<root xmlns:w="http://example.com/">
<w:t>some data...</w:t>
<not-captured>data data</not-captured>
<w:t>more data...</w:t>
</root>';
$doc = simplexml_load_string($string);
foreach ($doc->xpath('//w:t') as $elem) {
var_dump($elem->asXML());
}
If the namespace for the w prefix is not declared where you run the query, register it with SimpleXMLElement::registerXPathNamespace before calling xpath():
$doc->registerXPathNamespace('w', 'http://example.com/');
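To make the registerXPathNamespace call above concrete, here is a minimal sketch. It assumes a variant of the sample document in which the w prefix is declared on each <w:t> element rather than on the root, so the prefix is not in scope at the node the query starts from. The http://example.com/ URI is the same placeholder as in the sample, and $captured is a name made up for this illustration.

```php
<?php
// Assumed variant of the sample: the w prefix is declared on the <w:t>
// elements themselves, not on <root>.
$string = '<?xml version="1.0"?>
<root>
<w:t xmlns:w="http://example.com/">some data...</w:t>
<not-captured>data data</not-captured>
<w:t xmlns:w="http://example.com/">more data...</w:t>
</root>';

$doc = simplexml_load_string($string);

// Bind the prefix used in the XPath expression to the namespace URI.
// The URI must match the one in the document; the prefix is only a label.
$doc->registerXPathNamespace('w', 'http://example.com/');

// Collect the text content of every matching element.
$captured = [];
foreach ($doc->xpath('//w:t') as $elem) {
    $captured[] = (string) $elem;
}

print_r($captured);
```

Use $elem->asXML() instead of the (string) cast if you need the tags as well as the text.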
Adding to the previous answer, I would include a lower-case 's' after the 'i' at the end so that the dot also matches line breaks.
Good point by Mr. Gumbo below. Yes, also add an upper-case 'U' after the 's' to make the expression ungreedy; otherwise it won't work as expected,
e.g.
preg_match_all('/<w\:t>.*<\/w\:t>/isU', $string, $matches);
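A small sketch of why the 's' and 'U' modifiers matter together. The input string and the variable names $greedy, $ungreedy and $lazy are made up for this illustration; the first <w:t> element deliberately contains a line break.

```php
<?php
// Illustrative input: the first <w:t> element spans two lines.
$string = "<root>\n<w:t>some\ndata...</w:t>\n<x>skip</x>\n<w:t>more data...</w:t>\n</root>";

// 's' lets the dot match newlines, but the greedy .* then runs from the
// first <w:t> all the way to the last </w:t>.
preg_match_all('/<w:t>.*<\/w:t>/is', $string, $greedy);

// 'U' inverts greediness, so each element is matched on its own.
preg_match_all('/<w:t>.*<\/w:t>/isU', $string, $ungreedy);

// A lazy quantifier (.*?) without 'U' behaves the same way here.
preg_match_all('/<w:t>.*?<\/w:t>/is', $string, $lazy);

echo count($greedy[0]), "\n";   // 1
echo count($ungreedy[0]), "\n"; // 2
echo count($lazy[0]), "\n";     // 2
```

Do not combine 'U' with .*? in the same pattern: under 'U' the trailing ? makes the quantifier greedy again.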
Using DomXml is the preferred option, since it does not restrict you if you later need to search for other tags or data.
But regular expressions take far less code, so I would go for preg_match_all if those tags are the only thing you need.
$string = '<?xml version="1.0"?>
<root>
<w:t>some data...</w:t>
<not-captured>data data</not-captured>
<w:t>more data...</w:t>
</root>';
preg_match_all('/<w\:t>.*?<\/w\:t>/is', $string, $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(23) "<w:t>some data...</w:t>"
[1]=>
string(23) "<w:t>more data...</w:t>"
}
}
Edit: /is modifier added to regex
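If you also want the text inside the tags, a capture group can return both forms in one pass. This is only a sketch under assumptions: the xml:space="preserve" attribute on the second element is added here for illustration (it is not in the sample above), and the (?:\s[^>]*)? part is my own addition so that opening tags with attributes still match.

```php
<?php
// Illustrative input; the attribute on the second <w:t> is an assumption.
$string = '<root>
<w:t>some data...</w:t>
<not-captured>data data</not-captured>
<w:t xml:space="preserve">more data...</w:t>
</root>';

// (?:\s[^>]*)?  optionally matches attributes in the opening tag
// (.*?)         lazily captures the text between the tags
preg_match_all('/<w:t(?:\s[^>]*)?>(.*?)<\/w:t>/is', $string, $matches);

print_r($matches[0]); // whole elements, tags included
print_r($matches[1]); // inner text only
```

$matches[0] holds what the question asks for (tags included), and $matches[1] holds just the data.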