
How to replace text over multiple lines using preg_replace

开发者 (Developer) https://www.devze.com 2022-12-17 18:44 Source: Internet

I have the following content in an HTML page that spans multiple lines:

<div class="c-fc c-bc" id="content">
                <span class="content-heading c-hc">Heading 1 </span><br />
                The Home Page must provide a introduction to the services provided.<br />
                <br />
                <span class="c-sc">Sub Heading</span><br />
                The Home Page must provide a introduction to the services provided.<br />
                <br />
                <span class="c-sc">Sub Heading</span><br /> 
                The Home Page must provide a introduction to the services provided.<br />
            </div>

I need to replace everything between <div class="c-fc c-bc" id="content"> and </div> with custom text.

I use the following code to do this, but it does not work when the content spans multiple lines; it only works if everything is on one line:

$body = file_get_contents('../../templates/'.$val['url']);

$body = preg_replace('/<div class=\"c\-fc c\-bc\" id=\"content\">(.*)<\/div>/','<div class="c-fc c-bc" id="content">abc</div>',$body);

Am I missing something?


If this weren't HTML, I'd tell you to use the DOTALL modifier (s) to change the meaning of . from 'match everything except newline' to 'match everything':

preg_replace('/<div class="c-fc c-bc" id="content">(.*)<\/div>/s','<div class="c-fc c-bc" id="content">abc</div>',$body);
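A minimal sketch of the difference the s flag makes, using the question's pattern on a shortened, hypothetical copy of the HTML. The variable names $without and $with are only for illustration: without s nothing is replaced, with s the whole block is.

```php
<?php
// Shortened sample of the question's HTML (nowdoc, so no interpolation).
$body = <<<'HTML'
<div class="c-fc c-bc" id="content">
    <span class="content-heading c-hc">Heading 1 </span><br />
    The Home Page must provide a introduction to the services provided.<br />
</div>
HTML;

$pattern     = '/<div class="c-fc c-bc" id="content">(.*)<\/div>/';
$replacement = '<div class="c-fc c-bc" id="content">abc</div>';

// Without /s, "." cannot cross the newlines, so the pattern never matches.
$without = preg_replace($pattern, $replacement, $body);

// With /s (DOTALL), "." also matches "\n", so the whole block is replaced.
$with = preg_replace($pattern . 's', $replacement, $body);

var_dump($without === $body); // bool(true)
echo $with, "\n";             // <div class="c-fc c-bc" id="content">abc</div>
```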

But this is HTML, so use an HTML parser instead.
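For the HTML-parser route, here is a minimal sketch using PHP's DOMDocument class. It assumes the dom extension is installed, and the markup is a shortened, hypothetical copy of the question's HTML. It empties the element with id="content" and puts plain text in its place, so it does not depend on line breaks at all.

```php
<?php
// Shortened sample of the question's HTML.
$body = <<<'HTML'
<div class="c-fc c-bc" id="content">
    <span class="content-heading c-hc">Heading 1 </span><br />
    The Home Page must provide a introduction to the services provided.<br />
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
// NOIMPLIED/NODEFDTD stop libxml from wrapping the fragment in <html>/<body> and a doctype.
$doc->loadHTML($body, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$content = $doc->getElementById('content');
if ($content !== null) {
    // Remove every existing child node (elements, text and <br>s).
    while ($content->firstChild !== null) {
        $content->removeChild($content->firstChild);
    }
    // Add the replacement as a text node, so it is escaped rather than parsed as HTML.
    $content->appendChild($doc->createTextNode('abc'));
}

$result = $doc->saveHTML();
echo $result;
```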


It is the "s" flag: it lets . match newlines as well.


You can also use [\s\S] instead of . combined with the DOTALL flag s to match everything, because [\s\S] means exactly the same thing: \s matches every whitespace character (including newlines) and \S matches everything that is not whitespace (i.e. everything else). Some regular expression implementations have no DOTALL flag at all (JavaScript before ES2018, for example), and there [\s\S] is the portable alternative.
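A small check of the [\s\S] trick in PHP, with a made-up two-line string: the pattern using . without the s flag finds nothing, while the [\s\S]* version matches across the newline.

```php
<?php
$text = "<div id=\"content\">line one\nline two</div>";

// "." without /s stops at "\n", so this pattern cannot reach </div>.
$dot = preg_match('/<div id="content">(.*)<\/div>/', $text, $m1);

// [\s\S] matches any character, including "\n", with no flag needed.
$charClass = preg_match('/<div id="content">([\s\S]*)<\/div>/', $text, $m2);

var_dump($dot);       // int(0)
var_dump($charClass); // int(1)
echo $m2[1], "\n";    // "line one", then "line two" on the next line
```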

Caution: .* with the DOTALL flag and [\s\S]* are both greedy: they match as much as they can, so the match runs to the last </div> in the string rather than the first. If you want them to stop at a certain position (e.g. the first </div>), put the non-greedy (lazy) modifier ? after the quantifier, e.g. .*?
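To see the greedy/lazy difference, here is a sketch with two hypothetical <div> elements: the greedy (.*) runs to the last </div>, the lazy (.*?) stops at the first.

```php
<?php
$html = "<div>first</div>\n<div>second</div>";

// Greedy: takes as much as possible, then backtracks to the LAST </div>.
preg_match('/<div>(.*)<\/div>/s', $html, $greedy);

// Lazy: takes as little as possible, so it stops at the FIRST </div>.
preg_match('/<div>(.*?)<\/div>/s', $html, $lazy);

echo $greedy[1], "\n"; // first</div> (newline) <div>second
echo $lazy[1], "\n";   // first
```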


How can I replace text between nested tags, like this:

$sExample2 = "Test [DIV]again[/DIV]
d[COLOR=rgb(184, 49, 47)][SIZE=26px][B][U]o[/U][/B][/SIZE][/COLOR]ssed

This is not [DIV]true[/DIV] !

Yes it is [DIV]true [DIV]but[/DIV] just [/DIV] in that case!.

Why not [DIV]now

?[/DIV] Right here.

Because it is [DIV]down
[DIV]to the [/DIV][/DIV] botton.

I know but i want to [DIV]fly
[DIV]far[/DIV]
[/DIV] away.

";

I want to replace each [DIV]...[/DIV] pair, including any pairs nested inside it, with ** Test **, so that the result looks like this:

Test ** Test **
d[COLOR=rgb(184, 49, 47)][SIZE=26px][B][U]o[/U][/B][/SIZE][/COLOR]ssed

This is not ** Test ** !

Yes it is ** Test ** in that case!.

Why not ** Test ** Right here.

Because it is ** Test ** botton.

I know but i want to ** Test ** away.

I tried different replacements, but never got this result.

print_r(preg_replace(
            '#\[' . preg_quote('DIV', '#') . '](.*?)\[\/' . preg_quote('DIV', '#') . '\]#si',
            '*** Test ***',
            $sExample2
        ));

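This one came closest, but it is not what I need: the lazy (.*?) stops at the first [/DIV], so a nested pair such as [DIV]true [DIV]but[/DIV] just [/DIV] is cut off early, leaving " just [/DIV]" behind.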
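One possible approach, not taken from this thread: PCRE, the engine behind PHP's preg_* functions, supports recursive patterns with (?R), so a single pattern can match a [DIV]...[/DIV] pair together with any pairs nested inside it. This is a sketch that assumes every [DIV] has a matching [/DIV]; the input is a shortened version of the sample text.

```php
<?php
// Matches one [DIV]...[/DIV] pair; inside it, each step consumes one of:
//   [^\[]++           a run of characters that are not "[" (newlines included)
//   \[(?!/?DIV\])     a "[" that does not start [DIV] or [/DIV], e.g. [B]
//   (?R)              a complete nested [DIV]...[/DIV] pair (recursion)
// The i flag makes [div] and [/div] match too.
$pattern = '~\[DIV\](?:[^\[]++|\[(?!/?DIV\])|(?R))*\[/DIV\]~i';

$input = "Test [DIV]again[/DIV]\n"
       . "Yes it is [DIV]true [DIV]but[/DIV] just [/DIV] in that case!.\n"
       . "Because it is [DIV]down\n[DIV]to the [/DIV][/DIV] botton.\n"
       . "d[B][U]o[/U][/B]ssed";

$result = preg_replace($pattern, '** Test **', $input);
echo $result, "\n";
```

Because every match starts at a [DIV] and the recursion consumes each nested pair whole, one match covers exactly one outermost pair, and the newlines inside it are matched by [^\[] without needing the s flag.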


It is possible to use a regex to strip out chunks of HTML, but you need to wrap that HTML in custom tags, which browsers ignore. For example:

<?php
$html='
<div>This will be shown</div>
<custom650 rel="nofollow">
  <p class="subformedit">
    <a href="#" class="mylink">Link</a>
    <div class="morestuff">
      ... more html in here ...
    </div>
  </p>
</custom650>
<div>This will also be shown</div>
';

To strip the tags with the rel="nofollow" attributes, you can use the following regex:

$newhtml = preg_replace('/<([^\s]+)[^>]*rel="nofollow"[^>]*>.*?<\/\1>/si', '', $html);

From experience, start the custom tags on a new line. It is undoubtedly a hack, but it might help someone.
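Put together, the two fragments above can run as one script. This is a shortened, self-contained version of the answer's example, with comments describing what each part of the regex does.

```php
<?php
$html = '
<div>This will be shown</div>
<custom650 rel="nofollow">
  <p class="subformedit">
    <a href="#" class="mylink">Link</a>
  </p>
</custom650>
<div>This will also be shown</div>
';

// <([^\s]+)          captures the tag name (here "custom650")
// [^>]*rel="nofollow"[^>]*>   requires rel="nofollow" inside that opening tag
// .*?<\/\1>          lazily runs to the first closing tag with the SAME name (\1)
// s lets "." cross newlines; i makes the match case-insensitive.
$newhtml = preg_replace('/<([^\s]+)[^>]*rel="nofollow"[^>]*>.*?<\/\1>/si', '', $html);

echo $newhtml;
```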
