Break up a string by words and punctuation

开发者 https://www.devze.com 2023-02-23 09:16 Source: Internet
To split up a string, I came up with...

<?php
    preg_match_all('/(\w)|(,.!?;)/', "I'm a little teapot, short and stout.", $matches);
    print_r($matches[0]);

I thought this would separate each word (\w) and the specified punctuation (,.!?;). For example: ["I'm", "a", "little", "teapot", ",", "short", "and", "stout", "."]

Instead I get:

Array
(
    [0] => I
    [1] => m
    [2] => a
    [3] => l
    [4] => i
    [5] => t
    [6] => t
    [7] => l
    [8] => e
    [9] => t
    [10] => e
    [11] => a
    [12] => p
    [13] => o

etc...

What am I doing wrong here?

Thanks in advance.


There are two mistakes:

  1. The \w matches only a single character. To match a whole word, use \w+. Furthermore, \w matches only letters, digits, and the underscore. To match other characters such as ', include them in a character class: [\w'].
  2. The (,.!?;) is a group, not a set of alternatives: it matches a comma, then any one character (.), then an optional ! (!?), then a semicolon, all in sequence. To match any one of these characters, use a character class: [,.!?;].
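To make the second point concrete, here is a small sketch of what the original punctuation group actually matches. The test strings and variable names are made up for illustration.

```php
<?php
// The group from the question: comma, any character, optional "!", semicolon.
$pattern = '/(,.!?;)/';

// A comma followed by a space is not enough: the pattern still needs a ";".
$matchesSentence = preg_match($pattern, 'teapot, short');

// Comma + any character + semicolon does match.
$matchesOdd = preg_match($pattern, ',x;');

// Even the literal text ",.!?;" fails, because "?" is a quantifier here,
// not a character to match.
$matchesLiteral = preg_match($pattern, ',.!?;');

var_dump($matchesSentence, $matchesOdd, $matchesLiteral);
```

So in the original sentence the second alternative never fires, and only the single-character \w branch produces matches.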

The correct regex is:

'/[\w\']+|[,.!?;]/'
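
As a quick check, the pattern can be dropped into the code from the question. This is only a sketch; the $tokens variable is introduced here for readability.

```php
<?php
// Words: one or more word characters or apostrophes.
// Punctuation: exactly one of , . ! ? ;
preg_match_all('/[\w\']+|[,.!?;]/', "I'm a little teapot, short and stout.", $matches);

// $matches[0] holds the full text of every match, in order.
$tokens = $matches[0];
print_r($tokens);
// Tokens: I'm, a, little, teapot, ",", short, and, stout, "."
```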

If you want to be more permissive, you should use Unicode character classes instead (this allows letters, numbers, combining marks, dash characters and the apostrophe for words, and any punctuation for punctuation). Note that two-letter properties such as Pd must be written with braces:

'/[\pL\pN\pM\p{Pd}\']+|\pP/u'
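
A minimal sketch of the Unicode variant follows, assuming the source file and the input are UTF-8. It uses the braced form \p{Pd}, because PCRE's short \pX form accepts only a single-letter property. The sample sentence is invented for illustration.

```php
<?php
// Words: letters, numbers, combining marks, dashes, apostrophes.
// Punctuation: any single Unicode punctuation character.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.
$pattern = '/[\pL\pN\pM\p{Pd}\']+|\pP/u';

preg_match_all($pattern, "Café-au-lait, s'il vous plaît!", $matches);
$tokens = $matches[0];
print_r($tokens);
// Tokens: Café-au-lait, ",", s'il, vous, plaît, "!"
```

Because the dash is part of the word class, hyphenated words stay together as one token; remove \p{Pd} from the first class if dashes should be split off as punctuation.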


Try this; it should work as you want:

([\w]+)|[,.!?;]+

I would also like to share a very useful service: an online regex tester.


You may want to try something like:

/([^,.!?; ]+)|([,.!?;])/