I'm using HTML Purifier to create a text-only version of my site. I now need to rewrite every a href so that it points at the text-only URL, i.e. 'www.example.com/aboutus' becomes 'www.example.com/text/aboutus'.
Initially I tried a simple str_replace on the domain (I use a global variable for the domain), but links to files also get replaced, i.e. 'www.example.com/document.pdf' becomes 'www.example.com/text/document.pdf', which then fails.
Is there a regular expression that replaces domain with domain/text only where the URL does not include a given string?
Thanks for any pointers you might be able to give me :)
Use a negative lookahead:
$output = preg_replace(
    '#www\.example\.com(?!/text/)#', // dots escaped so they match literally
    'www.example.com/text',
    $input
);
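To see what the lookahead does and does not cover, here is a minimal sketch. The helper name addTextPrefix and the sample URLs are made up for illustration. Note that this pattern on its own still rewrites the PDF link, which is the case the question wants left alone.

```php
<?php
// Hypothetical helper: insert "/text" after the domain unless "/text/" already follows it.
function addTextPrefix(string $url): string
{
    return preg_replace(
        '#www\.example\.com(?!/text/)#',
        'www.example.com/text',
        $url
    );
}

echo addTextPrefix('www.example.com/aboutus'), PHP_EOL;      // www.example.com/text/aboutus
echo addTextPrefix('www.example.com/text/aboutus'), PHP_EOL; // unchanged: lookahead blocks the match
echo addTextPrefix('www.example.com/document.pdf'), PHP_EOL; // www.example.com/text/document.pdf (still rewritten)
```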
Better yet, use DOM with it:
$html = '<a href="www.example.com/something">foo</a>
<p>hello</p>
<a href="www.example.com/text/documents">bar</a>';
libxml_use_internal_errors(true); // suppresses DOM errors
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('//a/@href');
foreach ($hrefs as $href) {
    // Skip hrefs that already contain /text/ or that end in .pdf
    $href->value = preg_replace(
        '#^www\.example\.com(?!/text/)(.*?)(?<!\.pdf)$#',
        'www.example.com/text\\1',
        $href->value
    );
}
After the loop, the output of $dom->saveHTML() should contain:
<a href="www.example.com/text/something">foo</a>
<p>hello</p>
<a href="www.example.com/text/documents">bar</a>
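If more than one file type has to be skipped, the same idea can be wrapped in a function. This is only a sketch under stated assumptions: the function name rewriteTextLinks, its parameters, and the extension list are hypothetical, it swaps the lookbehind for a second negative lookahead so several extensions can be listed, and it assumes PHP 7.4+ (arrow functions) with the DOM extension enabled.

```php
<?php
// Hypothetical helper: rewrite every <a href> on $domain to its /text variant,
// skipping links that already point at /text/ or that end in a listed extension.
function rewriteTextLinks(string $html, string $domain, array $extensions = ['pdf']): string
{
    $quotedDomain = preg_quote($domain, '#');
    $quotedExts = implode('|', array_map(
        fn ($ext) => preg_quote($ext, '#'),
        $extensions
    ));
    // (?!/text/) skips links already rewritten; the second lookahead skips file links.
    $pattern = '#^' . $quotedDomain . '(?!/text/)(?!.*\.(?:' . $quotedExts . ')$)#i';

    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//a/@href') as $href) {
        $href->value = preg_replace($pattern, $domain . '/text', $href->value);
    }

    return $dom->saveHTML();
}

// Sample input, made up for illustration.
$html = '<a href="www.example.com/aboutus">about</a>'
      . '<a href="www.example.com/document.pdf">doc</a>';
echo rewriteTextLinks($html, 'www.example.com', ['pdf', 'doc']);
```

Because loadHTML() parses a whole document, saveHTML() wraps the fragment in the html and body elements the parser adds, so the result is not the bare fragment that went in.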