php/regex: "linkify" blog titles

Developer https://www.devze.com 2023-01-06 21:47 Source: Internet
I'm trying to write a simple PHP function that can take a string like

Topic: Some stuff, Maybe some more, it's my stuff?

and return

topic-some-stuff-maybe-some-more-its-my-stuff

As such:

  • lowercase
  • remove all non-alphanumeric non-space characters
  • replace all spaces (or groups of spaces) with hyphens

Can I do this with a single regex?


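function Slug($string)
{
    // Encode accented letters as entities, then keep only their base letters (&ntilde; -> n, &aelig; -> ae).
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    $string = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string);
    // Decode what is left, turn each run of non-alphanumerics into one hyphen, trim hyphens, lowercase.
    return strtolower(trim(preg_replace('~[^0-9a-z]+~i', '-', html_entity_decode($string, ENT_QUOTES, 'UTF-8')), '-'));
}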

$topic = 'Iñtërnâtiônàlizætiøn';
echo Slug($topic); // internationalizaetion

$topic = 'Topic: Some stuff, Maybe some more, it\'s my stuff?';
echo Slug($topic); // topic-some-stuff-maybe-some-more-it-s-my-stuff

$topic = 'here عربي‎ Arabi';
echo Slug($topic); // here-arabi

$topic = 'here 日本語 Japanese';
echo Slug($topic); // here-japanese
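
Note that Slug() turns the apostrophe into a hyphen (it-s), while the question asked for its. A minimal sketch of one way to get the requested output, assuming it is acceptable to delete apostrophes before the remaining punctuation is collapsed. The name slug_without_apostrophes is hypothetical, and the accent folding from Slug() is left out to keep the example short:

```php
<?php
// Hypothetical ASCII-only variant: apostrophes are deleted instead of hyphenated.
function slug_without_apostrophes($string)
{
    // it's -> its
    $string = str_replace("'", '', $string);
    // Every run of other non-alphanumeric characters becomes a single hyphen.
    $string = preg_replace('~[^0-9a-z]+~i', '-', $string);
    // Drop leading/trailing hyphens and lowercase the result.
    return strtolower(trim($string, '-'));
}

echo slug_without_apostrophes("Topic: Some stuff, Maybe some more, it's my stuff?");
// topic-some-stuff-maybe-some-more-its-my-stuff
```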


Many frameworks provide functions for this:

CodeIgniter: http://bitbucket.org/ellislab/codeigniter/src/c39315f13a76/system/helpers/url_helper.php#cl-472

WordPress (has many more in the code): http://core.trac.wordpress.org/browser/trunk/wp-includes/formatting.php#L814


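You can do it with one preg_replace (on PHP versions before 7.0 only, since it relies on the /e modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0):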

preg_replace(array("/[A-Z]/e", "/\\p{P}/", "/\\s+/"),
    array('strtolower("$0")', '', '-'), $str);

Technically, you could do it with one regex, but this is simpler.

Preemptive response: yes, it unnecessarily uses regular expressions (though very simple ones), makes an unnecessarily large number of calls to strtolower, and doesn't consider non-English characters (the OP doesn't even give an encoding); I'm just satisfying the OP's requirements.
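
On PHP 7 and later the /e modifier is no longer available, so the call above will not run there. A sketch of the same idea for current versions, assuming it is fine to lowercase the whole string once and let a single preg_replace handle the punctuation and whitespace patterns. The function name linkify_title is hypothetical:

```php
<?php
// Hypothetical helper: one strtolower() call, one preg_replace() call, no /e.
function linkify_title($str)
{
    return preg_replace(
        array('/\p{P}/', '/\s+/'), // 1) punctuation, 2) runs of whitespace
        array('', '-'),            // 1) removed,     2) replaced by one hyphen
        strtolower($str)
    );
}

echo linkify_title("Topic: Some stuff, Maybe some more, it's my stuff?");
// topic-some-stuff-maybe-some-more-its-my-stuff
```

Like the original, this only strips punctuation (\p{P}); symbols such as $ or + would pass through unchanged.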


Why are regular expressions considered the universal panacea to all life's problems (just because a lowly backtrace in a preg_match has discovered the cure for cancer)? Here's a solution without recourse to regexp:

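$str = "Topic: Some stuff, Maybe some more, it's my stuff?";
// Strip apostrophes first: str_word_count() treats them as part of a word, so "it's" would survive as-is.
$str = implode('-', str_word_count(strtolower(str_replace("'", '', $str)), 2));
echo $str; // topic-some-stuff-maybe-some-more-its-my-stuff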
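
For readers unfamiliar with str_word_count(): with format 2 it returns an array keyed by the byte offset at which each word starts, and it counts apostrophes and hyphens inside a word as part of that word. A small sketch of what it returns for a fragment of the title (the variable name $words is only for illustration):

```php
<?php
// Format 2: keys are the starting offsets of the words in the input string.
$words = str_word_count("some more, it's", 2);

print_r($words);
// Array
// (
//     [0] => some
//     [5] => more
//     [11] => it's
// )

// implode() ignores the keys, so only the words themselves end up in the slug.
echo implode('-', $words); // some-more-it's
```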

Without going the whole UTF-8 route:

$str = "Topic: Some stuff, Maybe some more, it's my Iñtërnâtiônàlizætiøn stuff?";
$str = implode('-',str_word_count(strtolower(str_replace("'","",$str)),2,'Þßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ'));
echo $str;

gives

topic-some-stuff-maybe-some-more-its-my-iñtërnâtiônàlizætiøn-stuff
