PHP: Removing only the first few empty <p> tags

Developer https://www.devze.com 2023-01-29 08:58 Source: Internet

I have a custom-developed CMS where users can enter some content into a rich text field (CKEditor).

Users simply copy-paste data from another document. Sometimes the data has empty <p> tags at the beginning. Here's a sample of the data:

<p></p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>
<p></p>
<p>Data data data data</p>
<p>Data data data data</p>
<p></p>

I don't want to remove all the empty <p> tags, only the ones before the actual data (the first three <p> tags in this case).

How can I do that?

Edit: To clarify, I need a PHP solution. JavaScript won't do.

Is there a way I can gather all <p> tags in an array, then iterate and delete until I encounter one with data?


Please don't use regular expressions on irregular strings like HTML: it stirs the sleeping god. Instead, use DOMDocument with XPath:

function strip_opening_lines($html) {
  $dom = new DOMDocument();
  // The property is preserveWhiteSpace (capital S) and must be set before loading.
  $dom->preserveWhiteSpace = FALSE;
  $dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  // XPath results are a snapshot, so removing nodes inside the loop is safe.
  $nodes = $xpath->query("//p");

  foreach ($nodes as $node) {
    // Remove non-significant whitespace.
    $trimmed_value = trim($node->nodeValue);

    // Check to see if the node is empty (i.e. <p></p>).
    // If so, remove it from the stack. Compare against '' instead of
    // using empty(), which would also treat <p>0</p> as empty.
    if ($trimmed_value === '') {
      $node->parentNode->removeChild($node);
    }
    // If we found a non-empty node, we're done. Break out.
    else {
      break;
    }
  }
  $parsed_html = $dom->saveHTML();

  // DOMDocument::saveHTML adds a DOCTYPE, <html>, and <body>
  // tags to the parsed HTML. Since this is regular data,
  // we can use regular expressions.
  if (!preg_match('#<body>(.*?)<\/body>#is', $parsed_html, $matches)) {
    return '';
  }
  return $matches[1];
}
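A variation on the same DOM idea, sketched under a few assumptions of my own: the input is a UTF-8 fragment, a paragraph counts as empty when it holds only whitespace, &nbsp; or <br>, and only nodes at the very top of the content are considered. Querying //p visits every paragraph in document order, so content that opens with, say, a heading followed by an empty <p> would lose that <p>; walking the body's first children instead stops at the first node that is not a blank paragraph. It also serializes the body's children with saveHTML($node) rather than a regex. The names strip_leading_empty_paragraphs and is_blank_paragraph are hypothetical.

```php
<?php
// Sketch (assumptions: UTF-8 fragment; "blank" = only whitespace,
// U+00A0 / &nbsp;, or <br> inside the <p>). Function names are hypothetical.

function is_blank_paragraph(DOMNode $node): bool
{
    if (!($node instanceof DOMElement) || strtolower($node->nodeName) !== 'p') {
        return false;
    }
    foreach ($node->childNodes as $child) {
        // Any element other than <br> (e.g. <img>) counts as real content.
        if ($child instanceof DOMElement && strtolower($child->nodeName) !== 'br') {
            return false;
        }
    }
    // textContent is UTF-8; remove ASCII whitespace and U+00A0 (&nbsp;).
    $text = preg_replace('/[\s\x{00A0}]+/u', '', $node->textContent);
    return $text === '';
}

function strip_leading_empty_paragraphs(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate messy pasted markup
    // The meta tag tells libxml to read the fragment as UTF-8, not ISO-8859-1.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }

    // Drop blank paragraphs and whitespace-only text nodes from the front,
    // stopping at the first node that is neither.
    while ($body->firstChild !== null) {
        $first = $body->firstChild;
        $isBlankText = $first instanceof DOMText && trim($first->nodeValue) === '';
        if ($isBlankText || is_blank_paragraph($first)) {
            $body->removeChild($first);
        } else {
            break;
        }
    }

    // Serialize only the body's children: no DOCTYPE, <html> or <body> wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Run on the sample from the question, this should return the markup starting at the first <p>Data data data data</p>, with the later empty paragraphs left in place.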

Reasons why all the regex solutions presented are bad:

  • Won't match empty paragraph elements with attributes (e.g. <p class="foo"></p>)
  • Won't match empty paragraph elements that are not literally empty (e.g. <p> </p>)


Normally I would advise against using a regular expression to parse HTML, but this one seems harmless:

$html = preg_replace('!^(<p></p>\s*)+!', '', $html);
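If you do stay with a regex, here is a sketch of a somewhat more forgiving pattern that also covers the two cases listed in the previous answer (attributes, and whitespace-only content). Assumptions: "empty" means only whitespace, &nbsp;, &#160; or a literal U+00A0 between the tags, the input is valid UTF-8 (the u modifier requires it), and nested markup such as <p><br></p> is deliberately left alone. \A anchors at the start of the string only, whatever other modifiers are used. strip_leading_empty_p is a hypothetical name.

```php
<?php
// Sketch: remove leading "empty" <p> elements with a single anchored regex.
// Assumption: empty = whitespace, &nbsp;, &#160; or U+00A0 only; attributes allowed.
function strip_leading_empty_p(string $html): string
{
    // \A  : start of string only
    // <p\b : a <p> tag (the \b keeps <pre> from matching)
    // [^>]*: any attributes, e.g. class="foo"
    $pattern = '~\A(?:\s*<p\b[^>]*>(?:\s|&nbsp;|&#160;|\x{00A0})*</p>)+\s*~iu';
    $result = preg_replace($pattern, '', $html);
    // preg_replace returns null on error (e.g. invalid UTF-8 with the u flag).
    return $result ?? $html;
}
```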


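Use a pattern anchored to the start of the string:

// No m modifier: ^ must match only at the start of the string, not at every line.
$html = preg_replace("~^(<p><\/p>\s*)*~i", "", $html);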
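The m modifier matters for this task. With m, ^ matches at the start of every line, not only at the start of the string, so an empty <p></p> that begins any line gets removed, including ones after the real content. A small check on data shaped like the sample in the question (the variable names are just for illustration):

```php
<?php
// Compare the same anchored pattern with and without the m modifier.
$sample = "<p></p>\n<p></p>\n<p>Data</p>\n<p></p>\n<p>More</p>";

$withM    = preg_replace('~^(<p></p>\s*)+~m', '', $sample);
$withoutM = preg_replace('~^(<p></p>\s*)+~', '', $sample);

// $withM    is "<p>Data</p>\n<p>More</p>": the later empty paragraph is gone too.
// $withoutM is "<p>Data</p>\n<p></p>\n<p>More</p>": only the leading ones are removed.
echo $withM, PHP_EOL, '---', PHP_EOL, $withoutM, PHP_EOL;
```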


You can do it in JavaScript: as soon as the user performs a paste operation, strip off the unwanted tags using a regular expression.

Your code would look like this:

document.getElementById("id of rich text field").onkeyup = stripData; 
document.getElementById("id of rich text field").onmouseup = stripData; 

function stripData(){
    document.getElementById("id of rich text field").value = document.getElementById("id of rich text field").value.replace(/\<p\>\<\/p\>/g,"");
}

Edit: To remove only the initial empty <p></p> tags:

function stripData(){
    var field = document.getElementById("id of rich text field");
    var dataStr = field.value;
    // Allow whitespace (e.g. newlines) before each tag; otherwise only the
    // very first <p></p> is removed when the tags sit on separate lines.
    while (/^\s*<p><\/p>/.test(dataStr)) {
        dataStr = dataStr.replace(/^\s*<p><\/p>/, "");
    }
    field.value = dataStr;
}