
Wikipedia links regex in PHP

开发者 (Developer) https://www.devze.com 2023-01-25 10:00 Source: web

How can I extract only the words inside [[words]] into an array?

[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など

I tried \[\[.*]] but it didn't work. Maybe that's because .* only works with English strings?


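// (.+?) is non-greedy, so each match stops at the first ]]; /u treats the pattern and subject as UTF-8
preg_match_all('/\[\[(.+?)\]\]/u', $str, $matches);
var_dump($matches); // $matches[1] holds the text captured inside each [[...]]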
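For comparison, here is a rough Python equivalent of the `preg_match_all` call above. It is a sketch assuming Python 3, where `str` is already Unicode, so no counterpart to PHP's `u` modifier is needed. The names `WIKILINK` and `extract_wikilinks` are hypothetical.

```python
import re

# Non-greedy (.+?) stops at the first "]]" after each "[["
WIKILINK = re.compile(r'\[\[(.+?)\]\]')

def extract_wikilinks(text):
    """Return the text inside each [[...]] pair, like $matches[1] in PHP."""
    return WIKILINK.findall(text)

sample = '[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など'
links = extract_wikilinks(sample)
print(links)  # ['旭川市|旭川', 'アイヌ', '旭川市旭山動物園|旭山動物園']

# Optional, assuming only the displayed text of a piped link is wanted
labels = [link.split('|')[-1] for link in links]
print(labels)  # ['旭川', 'アイヌ', '旭山動物園']
```

Each captured string keeps any `target|label` pipe; splitting on `|` and keeping the last part is only needed if you want the displayed text rather than the link target.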


You can convert the Unicode characters to HTML numeric entities first:

[旭川市旭山動物園|旭山動物園]]な&#12393l]


You need to escape the brackets on both sides; all the square brackets need a backslash.

This worked in Python; it may need modification for PHP:


>>> import re
>>> text = '[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など'
>>> r = re.compile(r'\[\[(.*?)\]\]')
>>> r.search(text)
<re.Match object; span=(0, 10), match='[[旭川市|旭川]]'>
>>> r.findall(text)
['旭川市|旭川', 'アイヌ', '旭川市旭山動物園|旭山動物園']

Hmm, maybe I was wrong about having to escape the right square brackets; it turned out not to be necessary in Python.
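That matches how both Python's `re` and PCRE (the engine behind PHP's `preg_*` functions) treat a lone `]`: outside a character class it is an ordinary literal, so only the opening brackets strictly need a backslash. A quick Python check on the question's sample string follows; the second half sketches letting `re.escape` build the escaped delimiters instead of writing them by hand (PHP's counterpart is `preg_quote`).

```python
import re

text = '[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など'

# Every bracket escaped vs. closing brackets left bare
escaped = re.findall(r'\[\[(.*?)\]\]', text)
unescaped = re.findall(r'\[\[(.*?)]]', text)
print(escaped == unescaped)  # True

# re.escape backslashes regex metacharacters, including [ and ]
built = re.compile(re.escape('[[') + r'(.*?)' + re.escape(']]'))
print(built.pattern)  # \[\[(.*?)\]\]
print(built.findall(text) == escaped)  # True
```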


One problem is that you're using the greedy wildcard: \[\[.*]] will match from the first [[ to the last ]], including any intervening ]].

Most regex engines also support a non-greedy (lazy) quantifier, typically *?, so \[\[.*?]] matches just one wikilink at a time.
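To see the difference on the question's own string, here is a small Python comparison of the greedy and lazy forms. The third pattern, a negated character class, is a common alternative swapped in here for illustration; it assumes the link text never contains a `]`. Note that `.` matches the Japanese characters without trouble in every case, so greediness, not the script of the text, is what broke the original pattern.

```python
import re

text = '[[旭川市|旭川]](文化) - [[アイヌ]]文化、[[旭川市旭山動物園|旭山動物園]]など'

greedy = re.findall(r'\[\[(.*)]]', text)       # .* runs on to the LAST "]]"
lazy = re.findall(r'\[\[(.*?)]]', text)        # .*? stops at the FIRST "]]"
negated = re.findall(r'\[\[([^\]]*)]]', text)  # no "]" allowed inside the link

print(greedy)   # a single match spanning all three links
print(lazy)     # ['旭川市|旭川', 'アイヌ', '旭川市旭山動物園|旭山動物園']
print(negated)  # same three links as lazy
```

The lazy and negated-class versions agree here; they differ only when a link contains a single `]`, which the negated class refuses to match.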

