Hi, I can't seem to find any regular expressions online to remove <h1></h1> tags (and their content).
Can anyone lend a hand with this?
Don't use a regex; use a tool like PHP Simple HTML DOM.
// Construct dom from string
$dom = str_get_html($html);
// ...or construct dom from file/url
$dom = file_get_html($path);
// strip h1 tags (and their content)
foreach ($dom->find('h1') as $node) {
    $node->outertext = '';
}
$htmlsource = preg_replace('@<h1[^>]*?>.*?<\/h1>@si', '', $htmlsource);
You can also use PHP's DOM extension module:
$domDocument = new DOMDocument;
$domDocument->loadHTMLFile('http://example.com');
$domNodeList = $domDocument->getElementsByTagName('h1');
// The node list is live, so collect the elements first and remove them afterwards
$domElemsToRemove = array();
foreach ($domNodeList as $domElement) {
    $domElemsToRemove[] = $domElement;
}
foreach ($domElemsToRemove as $domElement) {
    $domElement->parentNode->removeChild($domElement);
}
var_dump($domDocument->saveHTML());
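If the markup is already in a string rather than at a URL, the same idea can be sketched with loadHTML(). This is only a minimal sketch: the stripH1() helper name and the sample markup are made up for illustration, and saveHTML() will return a full document (doctype, html and body wrappers included) rather than just the fragment.

```php
<?php
// Hypothetical helper (not from the thread): removes every <h1> from an HTML string.
function stripH1(string $html): string
{
    // Collect libxml warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list into an array before removing anything.
    foreach (iterator_to_array($doc->getElementsByTagName('h1')) as $h1) {
        $h1->parentNode->removeChild($h1);
    }

    return $doc->saveHTML();
}

// Sample input, assumed for illustration only.
$clean = stripH1('<h1>Title</h1><p>Body</p>');
echo $clean;
```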
You cannot find one, because there is none.
Regular expressions are not a good fit for this task, since the <h1> tags may be nested arbitrarily deep. (Edit: Tomalak pointed out that they are not allowed to be nested, but reality is evil.) Try an HTML parser instead.
Turbod's expression will work if you can be sure that nowhere in your document is there a construct like <h1>Foo <h1> Bar</h1></h1>.
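To see what goes wrong with nested tags, here is a small sketch that runs a non-greedy h1 pattern over such a construct. The sample string is made up for illustration.

```php
<?php
// Hypothetical input with (invalid, but possible) nested h1 tags.
$html = '<p>a</p><h1>Foo <h1> Bar</h1></h1><p>b</p>';

// The non-greedy match stops at the FIRST closing tag it finds.
$result = preg_replace('@<h1[^>]*?>.*?</h1>@si', '', $html);

echo $result, "\n"; // prints: <p>a</p></h1><p>b</p>
```

The match ends at the inner closing tag, so the outer closing tag is left behind as a stray </h1>.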
Edit: Depending on your scenario, a CSS style like h1 { display: none !important; } might do the trick.
Why not use strip_tags?
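It is worth checking this against the requirement first: strip_tags() removes the tags themselves but keeps the text between them. A quick sketch, with a sample string assumed for illustration:

```php
<?php
// Sample input, assumed for illustration only.
$html = '<h1>Title</h1><p>Body</p>';

// strip_tags() drops the markup but leaves the text content in place.
$text = strip_tags($html);

echo $text, "\n"; // prints: TitleBody
```

So the heading text survives, which may or may not be acceptable when the content of the h1 has to go as well.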
If you want to use a regexp, this works for me:
$str = preg_replace("/<h1>.*?<\/h1>/si", '', $str);
The question mark makes the match between the tags non-greedy. That is necessary when you have multiple h1 tags, so that each match covers only the content between one <h1> and its own </h1>, instead of removing everything between the first opening <h1> and the last closing </h1>.
The 'i' modifier ignores uppercase/lowercase differences, and 's' lets the dot match newlines as well, so a match can span multiple lines.
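A short sketch of the difference, comparing the non-greedy pattern with a greedy variant. The sample string is made up for illustration and deliberately mixes upper and lower case and a line break inside the second heading.

```php
<?php
// Sample input, assumed for illustration: two headings with a paragraph in between.
$str = "<h1>One</h1>\n<p>keep</p>\n<H1>Two\nlines</H1>";

// Non-greedy: each heading is matched and removed on its own.
$lazy = preg_replace('/<h1>.*?<\/h1>/si', '', $str);

// Greedy: one match from the first <h1> to the last </H1>.
$greedy = preg_replace('/<h1>.*<\/h1>/si', '', $str);

var_dump($lazy);   // string(14) "\n<p>keep</p>\n" (with real newlines)
var_dump($greedy); // string(0) ""
```

With the greedy variant the paragraph between the two headings is lost as well.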