From this question: What regex pattern do I need for this? I've been using the following code:
function process($node, $replaceRules) {
if($node->hasChildNodes()) {
foreach ($node->childNodes as $childNode) {
if ($childNode instanceof DOMText) {
$text = preg_replace(
array_keys($replaceRules),
array_values($replaceRules),
$childNode->wholeText
);
$node->replaceChi开发者_运维技巧ld(new DOMText($text),$childNode);
} else {
process($childNode, $replaceRules);
}
}
}
}
$replaceRules = array(
'/\b(c|C)olor\b/' => '$1olour',
'/\b(kilom|Kilom|M|m)eter/' => '$1etre',
);
$htmlString = "<p><span style='color:red'>The color of the sky is: gray</p>";
$doc = new DOMDocument();
$doc->loadHtml($htmlString);
process($doc, $replaceRules);
$string = $doc->saveHTML();
echo mb_substr($string,119,-15);
It works fine, but it fails (as the child node is replaced on the first instance) if the html has text and HTML. So it works on
<div>The distance is four kilometers</div>
but not
<div>The distance is four kilometers<br>1000 meters to a kilometer</div>
or
<div>The distance is four kilometers<div class="guide">1000 meters to a kilometer</div></div>
Any ideas of a method that would work on such examples?
Calling $node->replaceChild
will confuse the $node->childNodes
iterator. You can get the child nodes first, and then process them:
function process($node, $replaceRules) {
if($node->hasChildNodes()) {
$nodes = array();
foreach ($node->childNodes as $childNode) {
$nodes[] = $childNode;
}
foreach ($nodes as $childNode) {
if ($childNode instanceof DOMText) {
$text = preg_replace(
array_keys($replaceRules),
array_values($replaceRules),
$childNode->wholeText);
$node->replaceChild(new DOMText($text),$childNode);
}
else {
process($childNode, $replaceRules);
}
}
}
}
精彩评论