Regex replace characters in data

Developer https://www.devze.com 2023-04-05 02:52 Source: Internet
I am trying to clean special characters out of some junked-up data (allowing a few), but some still get through. I found a regex snippet earlier, but it does not remove some characters, such as asterisks.

  $clean_body = $raw_text;

  $clean_title = preg_replace("/[^!&\/A-Za-z0-9_ ]/","", $clean_body);
  $clean_title = substr($clean_title, 0, 64);

  $clean_body = nl2br($clean_body);  

  if ($nid) {
    $node = node_load($nid);
    unset($node->field_category);
  } else {
    $node = new stdClass();
    $node->type = 'article';
    node_object_prepare($node); 
  }

  $split_title = str_split($clean_title);

  foreach ($split_title as $key => $character) {
    if ($key > 15) {
      if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
        $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
      }
    }
  }

The first part attempts to strip anything from the raw text that isn't normal punctuation or alphanumeric. Then I split the title into an array of characters and look for a space. What I want is a title that is at least 15 characters long and is truncated on a space (leaving whole words intact), without ending on a punctuation character. This is the part I am having trouble with.

Some titles still come out as ***************** or ** HOW TO MAKE $$$$$$ BLOGGING **, when the first title should not even have *'s, and the second should be HOW TO MAKE..., for example.


What about "/[^!&\/\w\s]/ui"? It works fine on my machine.
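
For reference, here is a minimal sketch of that pattern applied to the sample string from the question. The whitespace-collapsing step at the end is my own addition, not part of the suggestion, and the variable names simply mirror the question. One thing to keep in mind: with the u modifier, preg_replace returns NULL if the input is not valid UTF-8, so real data may need a check for that.

```php
<?php
// Sample input taken from the question.
$raw_text = '** HOW TO MAKE $$$$$$ BLOGGING **';

// Keep only !, &, /, word characters and whitespace.
// Single quotes pass the backslashes through to PCRE unchanged.
$clean_title = preg_replace('/[^!&\/\w\s]/ui', '', $raw_text);

// Assumption (not in the suggestion above): collapse the runs of spaces
// left behind by the removed characters and trim both ends.
$clean_title = trim(preg_replace('/\s+/u', ' ', $clean_title));

echo $clean_title, "\n"; // HOW TO MAKE BLOGGING
```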


Your problem (or, one of them anyhow) is this logic:

if ($key > 15) {
  if ($character == ' ' && !preg_match("/[^!&\/,.-]/", $split_title[$key - 1])) {
    $node->title = html_entity_decode(substr(strip_tags($clean_title), 0, $key - 1)) . '...';
  }
}

You're only setting $node->title if these conditions match when iterating the characters in the $split_title array.

What happens when they don't match? $node->title doesn't get set (or overwritten? You didn't give much context, so I can't tell).

Using this as a test:

$clean_body = '** HOW TO MAKE $$$$$$ BLOGGING **';

You can see that these conditions do not match, so $node->title does not get set (or overwritten).
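
To make that concrete, here is a rough sketch that replays the loop condition on the test string and then shows one possible way to always end up with a title. The helper truncate_title and its $min_length parameter are hypothetical names of mine, and the Drupal $node handling is left out, so treat this as an illustration under those assumptions rather than a drop-in fix. Note also that !preg_match() on a negated character class is only true when the previous character is one of the listed punctuation marks, which looks like the opposite of what was intended.

```php
<?php
// Replay the question's cleanup and loop condition on the test string.
$clean_title = preg_replace('/[^!&\/A-Za-z0-9_ ]/', '', '** HOW TO MAKE $$$$$$ BLOGGING **');
$split_title = str_split($clean_title);

$matched = false;
foreach ($split_title as $key => $character) {
    // Same condition as in the question: true only for a space that
    // directly follows one of ! & / , . -
    if ($key > 15 && $character == ' '
        && !preg_match('/[^!&\/,.-]/', $split_title[$key - 1])) {
        $matched = true;
    }
}
echo $matched ? "title set\n" : "title never set\n"; // title never set

// Hypothetical helper: cut at the first space at or after $min_length,
// strip dangling punctuation, and always return a usable string.
function truncate_title(string $title, int $min_length = 15): string
{
    // Collapse the double spaces left behind by the cleanup.
    $title = trim(preg_replace('/\s+/', ' ', $title));

    // Fallback: short titles are returned unchanged.
    if (strlen($title) <= $min_length) {
        return $title;
    }

    // Fallback: no space after the minimum length, so keep the whole title.
    $cut = strpos($title, ' ', $min_length);
    if ($cut === false) {
        return $title;
    }

    // Remove punctuation left at the end of the truncated part.
    return rtrim(substr($title, 0, $cut), ' !&/,.-') . '...';
}

echo truncate_title($clean_title), "\n";
echo truncate_title('How to clean up junk characters in imported data'), "\n";
```

With a minimum of 15 characters the test title has no later space to cut on, so the sketch returns it whole; the longer second string is cut at the first word boundary past the minimum. Assigning the return value to $node->title unconditionally avoids the case where the title is never set.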
