
How can I use RegEx to determine the largest chunk between delimiters?

开发者 https://www.devze.com 2023-03-04 04:40 Source: Internet

RegEx to determine the longest "part" of a phrase, with specified delimiters?

News stories almost always have this sort of structure, where the <title> is actually the title plus a bunch of garbage. Is there a way to RegEx out all the garbage and keep the longest part of the title? Obviously this would require using delimiters such as |, -, :, etc.

Here are some examples

eBand | Jornalismo | Saúde | Alimentos em conserva podem causar botulismo; saiba como evitar a doença

Obama calls for wide-range immigration reform in El Paso - San Jose Mercury News

CL + Suspensa produção de mortadela com toucinho, suspeita de contaminação

BBC News - John Kerry to travel to Pakistan amid strained ties


Not with the regex alone, I think. But you can split the title on the "garbage" characters and then sort the remaining parts by length.

$parts = preg_split('#\s*[-|:+]+\s*#', $title);
$parts = array_combine($parts, array_map("strlen", $parts));
arsort($parts);
$longest = current(array_keys($parts));

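Instead of specific delimiters, you could also split on non-word characters with \W (or [^\pL] with the /u Unicode flag). Both of those also match whitespace, so exclude it (for example [^\pL\s] with /u) if the parts should stay whole phrases rather than single words.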
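
One thing to watch with a pattern that allows zero whitespace around the delimiter (\s*): the hyphen in "wide-range" counts as a delimiter too, so the Obama headline gets cut in the middle of the title. Below is a minimal sketch of a stricter variant, assuming delimiters in these headlines always have whitespace on both sides. The names longest_part and char_length are hypothetical helpers, not part of the answer above, and mb_strlen is only used when the mbstring extension happens to be loaded.

```php
<?php
// Hypothetical helper: count characters when mbstring is available,
// otherwise fall back to counting bytes.
function char_length(string $s): int
{
    return function_exists('mb_strlen') ? mb_strlen($s, 'UTF-8') : strlen($s);
}

// Hypothetical helper: return the longest part of a headline.
function longest_part(string $title): string
{
    // Assumption: a delimiter only counts when it has whitespace on both
    // sides, so "wide-range" stays in one piece. /u enables UTF-8 mode.
    $parts = preg_split('/\s+[-|:+]+\s+/u', $title, -1, PREG_SPLIT_NO_EMPTY) ?: [$title];

    $longest = '';
    foreach ($parts as $part) {
        $part = trim($part);
        if (char_length($part) > char_length($longest)) {
            $longest = $part;
        }
    }
    return $longest;
}

echo longest_part('Obama calls for wide-range immigration reform in El Paso - San Jose Mercury News'), "\n";
// prints: Obama calls for wide-range immigration reform in El Paso
```

The trade-off is that a title written as "Section: headline" (colon with no space before it) is no longer split, so the pattern should be loosened if such titles show up.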


I don't think it can be done in pure regular expressions but you can use preg_split and iterate over the results:

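// "-" goes last in the class so it is a literal, not a range.
// The third argument is the limit (-1 = no limit); flags are the fourth.
$pieces = preg_split('/[|:-]/', $headline, -1, PREG_SPLIT_NO_EMPTY);
$max_len = 0;
$result = '';
foreach ($pieces as $piece) {
    $len = strlen($piece);
    if ($len > $max_len) {
        $max_len = $len;
        $result = $piece;
    }
}
// Pieces keep the spaces that surrounded the delimiter, so trim the winner.
$result = trim($result);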

Or use array_reduce

function longest($v, $w) {
    if (strlen($w) > strlen($v)) {
        return $w;
    }
    return $v;
}

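// "-" goes last in the class so it is a literal; flags are the fourth argument.
$pieces = preg_split('/[|:-]/', $headline, -1, PREG_SPLIT_NO_EMPTY);
// Start from '' so the first call to longest() gets a string, not null.
$result = array_reduce($pieces, 'longest', '');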
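
For a quick end-to-end check, here is a sketch of the same array_reduce idea run on one of the sample headlines from the question. It assumes PHP 7.4 or later for the arrow function, and it folds the surrounding whitespace into the split pattern so the pieces come back already trimmed. Note that preg_split takes the limit as its third parameter and the flags as its fourth.

```php
<?php
// Sample headline taken from the question.
$headline = 'BBC News - John Kerry to travel to Pakistan amid strained ties';

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY drops empty pieces.
// The hyphen is last in the character class, so it is a literal.
$pieces = preg_split('/\s*[|:-]\s*/', $headline, -1, PREG_SPLIT_NO_EMPTY);

// Carry the longer of the two strings forward; '' is the starting value.
$result = array_reduce(
    $pieces,
    fn(string $carry, string $piece): string =>
        strlen($piece) > strlen($carry) ? $piece : $carry,
    ''
);

echo $result, "\n";
// prints: John Kerry to travel to Pakistan amid strained ties
```

strlen counts bytes, which is fine for this ASCII headline; for the Portuguese examples a character-based count may be the better measure.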