Wrap bold/strong tags around first occurrence of keyword inside content fragment?

开发者 https://www.devze.com 2023-02-07 07:04 Source: Internet
I'm looking for the simplest way to wrap bold tags around the first appearance of a predefined keyword phrase, provided that appearance is not inside a heading tag or an HTML attribute value. Once the first match is found and wrapped, the routine should exit.

For example, if the keyword is "blue widgets", and the content was:

blue widgets and accessories for blue widgets can be found here

Then after the routine filters the content, it would return:

<b>blue widgets</b> and accessories for blue widgets can be found here

However, if the first occurrence of the phrase "blue widgets" were in an attribute or a heading tag, it would skip over those and go to the next one. For example,

<img src="foo.png" title="A site about blue widgets" alt="blue-widget" />
<h2>This is a site about blue widgets</h2>
<p>We've got lots of blue widgets and blue widget accessories...

In the above content, only the appearance of the keyword in the sentence "We've got lots of blue widgets and blue widget accessories..." would be bolded.

Can someone give me an example of how this can be done?


If you're still thinking about using a regex, check this out:

$source = <<<EOS
<img src="foo.png" title="A site about blue widgets" alt="blue-widget" />
<h2>This is a site about blue widgets</h2>
<p>We've got lots of blue widgets and blue widget accessories...
EOS;

$term = 'blue widgets';

// convert search term to valid regex
$term0 = preg_replace(array('~\A\b~', '~\b\z~', '~\s+~'), 
                      array('\b', '\b', '\s+'),
                      preg_quote(trim($term), '~'));

$regex = <<<EOR
~\A   # anchoring at string start ensures only one match can occur
(?>
   <(h[1-6])[^>]*>.*?</\\1>   # a complete h<n> element
 | </?\w+[^>]*+>              # any other tag
 | (?:(?!<|{$term0}).)*+      # anything else, but stop before '<' or the search term
)*+
\K    # pretend the match really started here; only the next part gets replaced
{$term0}
~isx
EOR;

echo preg_replace($regex, "<strong>$0</strong>", $source);

I wasn't even sure it was possible to do this with a regex, which is why I went to the trouble of working it out. Hideous as this solution is, it's about as simple as I could make it. And to do that I had to ignore many factors that can break it: things like CDATA sections, SGML comments, <script> elements, and angle brackets in attribute values, to name a few. And that's just in valid HTML.
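
To make a few of those failure modes concrete, here is a small sketch that runs the same pattern (written on one line, without the x flag) against three tiny inputs. The function name wrapFirstWithRegex() and the sample strings are made up for illustration; they are not part of the code above.

```php
<?php
// Hypothetical wrapper around the same pattern as in the answer,
// written on one line (no x flag) so it can be reused on several inputs.
function wrapFirstWithRegex(string $source, string $term): string
{
    // Convert the search term to a regex fragment, exactly as in the answer.
    $term0 = preg_replace(array('~\A\b~', '~\b\z~', '~\s+~'),
                          array('\b', '\b', '\s+'),
                          preg_quote(trim($term), '~'));

    $regex = '~\A(?><(h[1-6])[^>]*>.*?</\1>|</?\w+[^>]*+>|(?:(?!<|'
           . $term0 . ').)*+)*+\K' . $term0 . '~is';

    return preg_replace($regex, '<strong>$0</strong>', $source);
}

// 1. A comment is neither a heading nor a "<name ...>" tag, so the scanner
//    stalls at "<!--" and nothing gets wrapped at all.
$comment    = '<!-- blue widgets --><p>blue widgets</p>';
$commentOut = wrapFirstWithRegex($comment, 'blue widgets');

// 2. <script> is treated like any other tag, so its contents are scanned as text.
$script    = '<script>var s = "blue widgets";</script><p>blue widgets</p>';
$scriptOut = wrapFirstWithRegex($script, 'blue widgets');

// 3. A ">" inside an attribute value ends the "tag" too early.
$attr    = '<img alt="a > blue widgets"><p>blue widgets</p>';
$attrOut = wrapFirstWithRegex($attr, 'blue widgets');

echo $commentOut, "\n", $scriptOut, "\n", $attrOut, "\n";
```

In the first case the input comes back unchanged; in the other two the keyword is wrapped in the wrong place (inside the script and inside the attribute value), while the occurrence in the paragraph is left alone.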

Fun as this was, I hope it persuades you once and for all to forget about regexes and use a dedicated tool, as the other responders advised.
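
For comparison, here is a minimal sketch of what the "dedicated tool" route might look like with PHP's bundled DOM extension. It is not taken from the other answers. The helper name boldFirstKeyword() and the wrapper id kw-root are assumptions made up for this example, and character-encoding handling for non-ASCII input is left out. Attribute values and comments are not text nodes, so only headings, <script> and <style> need to be excluded explicitly.

```php
<?php
// Hypothetical helper: wraps the first occurrence of $term found in a text
// node that is not inside a heading, <script>, or <style> element.
function boldFirstKeyword(string $html, string $term): string
{
    // Same term-to-pattern conversion as in the regex answer.
    $term0 = preg_replace(array('~\A\b~', '~\b\z~', '~\s+~'),
                          array('\b', '\b', '\s+'),
                          preg_quote(trim($term), '~'));

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // silence warnings on sloppy markup
    // Assumption: a marker <div> lets us find the fragment again after parsing.
    $doc->loadHTML('<div id="kw-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root  = $xpath->query('//div[@id="kw-root"]')->item(0);
    if ($root === null) {
        return $html;  // parsing failed; give the input back untouched
    }

    $skip  = 'ancestor::h1 or ancestor::h2 or ancestor::h3 or ancestor::h4 or '
           . 'ancestor::h5 or ancestor::h6 or ancestor::script or ancestor::style';
    $nodes = $xpath->query('.//text()[not(' . $skip . ')]', $root);

    foreach ($nodes as $node) {
        $text = $node->nodeValue;
        if (!preg_match('~' . $term0 . '~i', $text, $m, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        // preg_match reports byte offsets, which is what substr() expects.
        $match  = $m[0][0];
        $offset = $m[0][1];
        $strong = $doc->createElement('strong');
        $strong->appendChild($doc->createTextNode($match));

        // Replace the text node with: text before, <strong>match</strong>, text after.
        $parent = $node->parentNode;
        $parent->insertBefore($doc->createTextNode(substr($text, 0, $offset)), $node);
        $parent->insertBefore($strong, $node);
        $parent->insertBefore($doc->createTextNode(substr($text, $offset + strlen($match))), $node);
        $parent->removeChild($node);
        break;  // only the first match is wrapped
    }

    // Serialize only the children of the marker <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$source = <<<EOS
<img src="foo.png" title="A site about blue widgets" alt="blue-widget" />
<h2>This is a site about blue widgets</h2>
<p>We've got lots of blue widgets and blue widget accessories...
EOS;

$out = boldFirstKeyword($source, 'blue widgets');
echo $out;
```

Note that the serializer normalizes the markup on the way out (the unclosed <p> gets a closing tag, and <img ... /> comes back as <img ...>), so the result is not a byte-for-byte copy of the input with one tag pair added.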
