I want to turn words from a list (stored in a file or database) into links on an HTML website. I used str_replace, but I have a problem with words that are already inside link anchors.
E.g., I have HTML like this:
Lorem ipsum donor et simet <a>lorem ipsum</a> eta raoa talkesa z uta.
And I want to turn every "ipsum" into a link, but skip the ipsum in <a>lorem ipsum</a>. I don't know, maybe preg_replace?
So my understanding is that you have a list of words that need to be linked within a body of HTML. str_replace() handles that, but it also replaces words that are already inside anchors?
You wish to ignore matching words if they are within the anchor tags?
PHP's regex engine (PCRE) does not support variable-width negative lookbehind, so there is no straightforward way to say "don't match where an opening anchor tag precedes the word", because the opening anchor tag has a variable length.
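Because that lookbehind is not available, one single-pass alternative (a different technique from the one this answer uses) is PCRE's `(*SKIP)(*FAIL)` verb pair: the first branch matches a whole `<a ...>...</a>` element and then deliberately fails, telling the engine to resume scanning after it, so only words outside anchors are ever matched. A minimal sketch, assuming anchors are well-formed and not nested, and that the words never occur inside attribute values of other tags; the function name `linkWordOutsideAnchors` is hypothetical.

```php
<?php
// Hypothetical helper; assumes well-formed, non-nested <a> elements.
// Branch 1: match an entire existing anchor, then (*SKIP)(*FAIL) throws that
// match away and restarts the scan after it, so its contents are never touched.
// Branch 2: the word itself, as a whole word (\b), case-insensitively (i flag).
function linkWordOutsideAnchors(string $html, string $word, string $url): string
{
    $pattern = '#<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)|\b' . preg_quote($word, '#') . '\b#is';
    $open = '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($open) {
            // $m[0] keeps the original capitalisation of the matched word
            return $open . $m[0] . '</a>';
        },
        $html
    );
}

$html = 'Lorem ipsum donor et simet <a>lorem ipsum</a> eta raoa talkesa z uta.';
echo linkWordOutsideAnchors($html, 'ipsum', 'www.bbc.co.uk'), "\n";
```

Only the first "ipsum" becomes a link; the one inside `<a>lorem ipsum</a>` is skipped. For a list of words you would call it once per word, or build a single alternation of all the quoted words.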
The way I handle this sort of issue is to replace all of them, then undo the changes that should not have been made.
<?php
// Set up data: each word maps to the URL it should link to
$words = array('lorem' => 'www.google.com',
               'ipsum' => 'www.bbc.co.uk',
               'test'  => 'www.amazon.co.uk');
$textBody = '<p>This is a short test of <a href="www.slashdot.org">lorem ipsum</a> automatic anchoring. Let us see if it works, any incidences of lorem or ipsum, should be caught.</p>';

// Make the basic replacements, but use a different tag than <a>
// so the new links can be told apart from anchors that already existed.
// I am using the <argh> tag.
$wordExpressions = array();
$wordReplacements = array();
foreach ($words as $cWord => $cLink) {
    // \b limits matches to whole words; '#' is passed to preg_quote()
    // because it is the pattern delimiter
    $wordExpressions[] = '#\b' . preg_quote($cWord, '#') . '\b#';
    $wordReplacements[] = '<argh href="' . $cLink . '">' . $cWord . '</argh>';
}
$replacedText = preg_replace($wordExpressions, $wordReplacements, $textBody);

// At the moment, there are nested anchors
echo $replacedText, "\n";

// Use a fairly horrific recursive regex with a callback to delete any
// <argh> tags inside <a> tags (an anonymous function replaces
// create_function(), which was deprecated in PHP 7.2 and removed in PHP 8)
$replacedText = preg_replace_callback(
    '#(<a [^>]*>)((?:[^<]|<(?!/?a>)|(?R))+)(</a>)#',
    function ($a) {
        // Keep the anchor pair, unwrap any <argh> tags between them
        return $a[1] . preg_replace('#<argh[^>]*>(.*?)</argh>#', '$1', $a[2]) . $a[3];
    },
    $replacedText
);

// No nested anchors now
echo $replacedText, "\n";

// Finally, replace the <argh> tags with plain <a> tags
$replacedText = preg_replace(array('#<argh #', '#</argh>#'), array('<a ', '</a>'), $replacedText);

// The output should now be correct
echo $replacedText, "\n";
?>
This looks a bit worse than it is, especially the recursive regex callback. All it does is match paired anchor tags and pass each match to a function that returns the anchor pair unchanged, with the new tags stripped from the content between them. There is a good discussion of recursive matching in "Mastering Regular Expressions" by Jeffrey Friedl.
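If `(?R)` is unfamiliar: it re-enters the whole pattern at that point, which is what lets a single regex match nested structures. The classic illustration is balanced parentheses; the pattern and test strings below are illustrative only and are not taken from the answer, whose anchor regex uses the same `(?R)` construct.

```php
<?php
// \(            an opening parenthesis
// [^()]++       a run of non-parenthesis characters (possessive, no backtracking)
// (?R)          or a whole nested balanced group, matched by recursing into the pattern
// \)            the matching closing parenthesis
$balanced = '#\((?:[^()]++|(?R))*\)#';

foreach (array('(a(b)c)', 'x (1 (2) (3)) y', '(a(b c)') as $subject) {
    if (preg_match($balanced, $subject, $m)) {
        echo $subject, ' => ', $m[0], "\n";
    } else {
        echo $subject, " => no match\n";
    }
}
```

In the last string the outer parenthesis is never closed, so the search falls through to the first balanced group it can find, `(b c)`.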
The placeholder <argh> tag could be anything; I used that name because it is highly unlikely to exist in HTML and seemed appropriate to the problem at hand. :-)
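If you would rather not rely on the placeholder merely being unlikely, a small guard can confirm it is actually absent before the replace-then-undo pass. `pickPlaceholderTag` is a hypothetical helper, sketched under the assumption that a case-insensitive substring check for `<tag` and `</tag` is good enough; the three steps in the answer would then need to build their patterns from the returned name instead of the hard-coded `argh`.

```php
<?php
// Hypothetical helper: return a placeholder tag name that does not already
// occur in $html, starting from $base and appending 1, 2, ... as needed.
// stripos() is used because HTML tag names are case-insensitive.
function pickPlaceholderTag(string $html, string $base = 'argh'): string
{
    $tag = $base;
    $n = 0;
    while (stripos($html, '<' . $tag) !== false || stripos($html, '</' . $tag) !== false) {
        $n++;
        $tag = $base . $n;
    }
    return $tag;
}

echo pickPlaceholderTag('<p>plain text</p>'), "\n";                 // argh
echo pickPlaceholderTag('<p><argh>already here</argh></p>'), "\n";  // argh1
```

The check is deliberately conservative: `<arghx` would also count as a clash, which only costs an extra suffix.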
Does something like that work for you?