PHP regular expression to remove <h1> tags (and their content)

Developer https://www.devze.com 2023-01-01 06:16 Source: Internet
Hey, I can't seem to find any regular expressions online to remove <h1></h1> tags (and their content).

Can anyone lend a hand with this?


Don't use a regex; use a tool like PHP Simple HTML DOM.

// Requires the PHP Simple HTML DOM library (simple_html_dom.php) to be included first

// Construct dom from string
$dom = str_get_html($html);

// ...or construct dom from file/url
$dom = file_get_html($path);

// strip h1 tags (and their content)
foreach ($dom->find('h1') as $node) {
    $node->outertext = '';
}


// preg_replace() returns the new string, so assign the result back
$htmlsource = preg_replace('@<h1[^>]*>.*?</h1>@si', '', $htmlsource);


You can also use PHP's DOM extension module:

$domDocument = new DOMDocument;
$domDocument->loadHTMLFile('http://example.com');
$domNodeList = $domDocument->getElementsByTagName('h1');
$domElemsToRemove = array();
foreach ($domNodeList as $domElement) {
    $domElemsToRemove[] = $domElement;
}
foreach($domElemsToRemove as $domElement) {
    $domElement->parentNode->removeChild($domElement);
}
var_dump($domDocument->saveHTML());
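
If your markup is already in a string rather than at a URL, the same idea can be sketched with loadHTML(). This is only a rough sketch: the function name removeH1 and the sample markup are made up for illustration, and note that saveHTML() will wrap a fragment in a doctype and html/body tags.

```php
<?php
// Hypothetical helper: strip every <h1> element (and its content) from an HTML string.
function removeH1(string $html): string
{
    // Collect parse warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into a plain array first, because removing
    // nodes while iterating a live DOMNodeList skips elements.
    $headings = iterator_to_array($doc->getElementsByTagName('h1'));

    foreach ($headings as $heading) {
        if ($heading->parentNode !== null) {
            $heading->parentNode->removeChild($heading);
        }
    }

    return $doc->saveHTML();
}

// Sample input is an assumption, not taken from the question.
$cleaned = removeH1('<h1 class="title">Title</h1><p>Body</p>');
echo $cleaned;
```

The copy into an array plays the same role as $domElemsToRemove in the answer above.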


You cannot find one, because there is none.

Regular expressions are not a good fit for this task, since the <h1> tags may be nested arbitrarily deep. (Edit: Tomalak pointed out that nesting them is not allowed, but reality is evil.) Try an HTML parser instead.

Turbod's expression will work if you can be sure that a construct like <h1>Foo <h1> Bar</h1></h1> appears nowhere in your document.

Edit: Depending on your scenario, a CSS rule like h1 { display: none !important; } might do the trick.
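
To make the nesting problem concrete, here is a small sketch of what a non-greedy pattern does with the nested construct mentioned above. The pattern is the one from the regex answer in this thread; the trailing paragraph in the sample string is made up.

```php
<?php
// Sample input: the nested (invalid, but it happens) heading plus a paragraph.
$html = '<h1>Foo <h1> Bar</h1></h1><p>Body</p>';

// The non-greedy match ends at the FIRST closing tag it reaches,
// which belongs to the inner heading, not the outer one.
$result = preg_replace('@<h1[^>]*>.*?</h1>@si', '', $html);

var_dump($result);
// A stray closing tag is left behind: "</h1><p>Body</p>"
```

A parser-based approach removes the outer element as a whole, so no orphaned tag remains.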


Why not use strip_tags?
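
One caveat worth checking before going this route: strip_tags() removes the tags themselves but keeps the text between them, so the heading content stays in the output. A quick sketch with made-up sample markup:

```php
<?php
// Sample input is an assumption for illustration.
$html = '<h1>Title</h1><p>Body</p>';

// Strip all tags: the heading text survives.
$allStripped = strip_tags($html);

// Allow <p> only: the <h1> tags go away, but "Title" is still there.
$onlyP = strip_tags($html, '<p>');

var_dump($allStripped); // "TitleBody"
var_dump($onlyP);       // "Title<p>Body</p>"
```

So this does not cover the "and their content" part of the question.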


If you want to use a regexp, this works for me:

$str = preg_replace("/<h1>.*?<\/h1>/si", '', $str);

The question mark makes the match between the tags non-greedy. That is necessary when you have multiple h1 tags: each match then covers only the content of one heading, instead of removing everything between the first opening <h1> and the last closing </h1>.

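The 'i' modifier makes the match case-insensitive, and 's' lets the dot match newlines too, so headings that span several lines are also matched. Note that the pattern only matches a bare <h1> without attributes.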
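
Here is a small sketch of what the question mark and the two modifiers change. The sample string is made up; the first pattern is the one from this answer and the other two are variations for comparison.

```php
<?php
// Two headings, one in upper case and spanning two lines.
$str = "<h1>One</h1>\n<p>keep</p>\n<H1>Two\nlines</H1>";

// Non-greedy with 's' and 'i': each heading is removed separately.
$lazy = preg_replace('/<h1>.*?<\/h1>/si', '', $str);

// Greedy: one match from the first <h1> to the last </H1>,
// so the paragraph in between is lost as well.
$greedy = preg_replace('/<h1>.*<\/h1>/si', '', $str);

// Without 's' the dot stops at newlines, so the two-line heading survives.
$withoutS = preg_replace('/<h1>.*?<\/h1>/i', '', $str);

var_dump($lazy);     // "\n<p>keep</p>\n"
var_dump($greedy);   // ""
var_dump($withoutS); // "\n<p>keep</p>\n<H1>Two\nlines</H1>"
```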
