How can I count the number of words between two words?

Developer https://www.devze.com 2023-01-08 09:10 Source: Internet

   $txt = "tükörfúrógép banana orange lime, tükörfúrógép cherry árvíztűrő orange lyon
    cat lime mac tükörfúrógép cat orange lime cat árvíztűrő
    tükörfúrógép banana orange lime
    orange lime cat árvíztűrő";

The two words: 'árvíztűrő' and 'tükörfúrógép'

I need this result:

tükörfúrógép cherry árvíztűrő

tükörfúrógép cat orange lime cat árvíztűrő

tükörfúrógép banana orange lime orange lime cat árvíztűrő

Now I have this regular expression:

preg_match_all('@((tükörfúrógép(.*)?árvíztűrő)(árvíztűrő(.*)?tükörfúrógép))@sui',$txt,$m);


I have several things to point out:

  1. You can't do it in one regex. A regex matches forward only, so the reversed match order requires a second regex.
  2. You use (.*)?, but you mean (.*?).
  3. To acquire correct matches, you must ensure that the left boundary of your expression cannot occur in the middle of a match.
  4. You should denote word boundaries (\b) around your delimiter words to ensure whole-word matches. EDIT: While this is correct in theory, it does not work for Unicode input in PHP.
  5. You should switch the PHP locale to Hungarian (it is Hungarian, right?) before calling preg_match_all(), because the locale has an influence on what's considered a word boundary in PHP. EDIT: The meaning of \b does in fact not change with the selected locale.
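
Point 2 is easy to check in isolation. The following is a minimal sketch with a made-up sample string (not the text from the question): (.*)? is a greedy group that has merely been made optional, while (.*?) is a lazy group that stops at the first possible end.

```php
<?php
// Hypothetical sample text, chosen only to show the difference.
$sample = 'A x B y A z B';

// (.*)? : greedy group made optional, so it still grabs as much as possible.
preg_match('/A(.*)?B/', $sample, $greedy);

// (.*?) : lazy group, stops at the first B that lets the match succeed.
preg_match('/A(.*?)B/', $sample, $lazy);

echo '[' . $greedy[1] . "]\n"; // [ x B y A z ]
echo '[' . $lazy[1] . "]\n";   // [ x ]
```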

That being said, regex #1 is:

(\btükörfúrógép\b)((?:(?!\1).)*?)\bárvíztűrő\b

and regex #2 is analogous, just with reversed delimiter words.

Regex explanation:

(               # match group 1:
  \b            #   a word boundary
  tükörfúrógép  #   your first delimiter word
  \b            #   a word boundary
)               # end match group 1
(               # match group 2:
  (?:           #   non-capturing group:
    (?!         #     look-ahead:
      \1        #       must not be followed by delimiter word 1
    )           #     end look-ahead
    .           #     match any next char (includes \n with the "s" switch)
  )*?           #   end non-capturing group, repeat as often as necessary
)               # end match group 2 (this is the one you look for)
\b              # a word boundary
árvíztűrő       # your second delimiter word
\b              # a word boundary

UPDATE: With PHP's poor Unicode string support, you will be forced to use expressions like these as replacements for \b:

$before = '(?<=^|[^\p{L}])';
$after  = '(?=[^\p{L}]|$)';

This suggestion has been taken from another question.
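
Putting the pieces together, here is a rough sketch of how the pattern and the two look-around replacements for \b could be combined. The function name words_between() is made up for this example, the sample text is the question's text flattened to one line, and the source file is assumed to be saved as UTF-8. Swapping the two word arguments gives the reversed search (regex #2).

```php
<?php
// Hypothetical helper: returns the text found between $left and $right,
// treating both as whole words under Unicode rules.
function words_between(string $txt, string $left, string $right): array
{
    $before = '(?<=^|[^\p{L}])'; // no letter directly before the word
    $after  = '(?=[^\p{L}]|$)';  // no letter directly after the word

    // preg_quote() keeps regex metacharacters in the words literal.
    $l = $before . preg_quote($left, '/') . $after;
    $r = $before . preg_quote($right, '/') . $after;

    // Tempered dot: consume a character only if the left word does not start here.
    // "s" lets . match newlines, "u" switches the engine to UTF-8 mode.
    $pattern = '/' . $l . '((?:(?!' . $l . ').)*?)' . $r . '/su';

    preg_match_all($pattern, $txt, $m);
    return array_map('trim', $m[1]);
}

$txt = "tükörfúrógép banana orange lime, tükörfúrógép cherry árvíztűrő orange lyon "
     . "cat lime mac tükörfúrógép cat orange lime cat árvíztűrő "
     . "tükörfúrógép banana orange lime orange lime cat árvíztűrő";

print_r(words_between($txt, 'tükörfúrógép', 'árvíztűrő'));
// Expected entries: "cherry", "cat orange lime cat",
// "banana orange lime orange lime cat"
```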


To count words between two words you can easily use:

count(explode(" ", "lime orange banana")); // explode() instead of split(), which was removed in PHP 7
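
Splitting on a single space miscounts as soon as the text contains commas, line breaks or doubled spaces, and it reports one word for an empty string. A possible alternative, sketched here with a made-up helper name count_words(), is to split on runs of anything that is not a letter or digit and drop empty pieces:

```php
<?php
// Hypothetical helper: counts runs of letters/digits, so punctuation and
// repeated whitespace do not produce empty "words".
function count_words(string $text): int
{
    // "u" enables UTF-8 mode so \p{L} and \p{N} see whole characters.
    $parts = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($parts);
}

echo count_words("lime orange banana"), "\n";         // 3
echo count_words("banana  orange lime,\n cat"), "\n"; // 4
echo count_words(""), "\n";                           // 0
```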

And a function that returns an array with matches and counts will be:

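function count_between_words($text, $first, $second, $case_sensitive = false)
{
    // Escape the delimiter words so regex metacharacters in them are taken literally.
    $first  = preg_quote($first, '/');
    $second = preg_quote($second, '/');

    // "s" lets . match newlines, "u" enables UTF-8 mode, "i" ignores case.
    $pattern = '/(' . $first . ')((?:(?!\\1).)*?)' . $second . '/su' . ($case_sensitive ? '' : 'i');

    // Collapse whitespace runs so the words can later be split on single spaces.
    if (!preg_match_all($pattern, preg_replace('/\\s+/u', ' ', $text), $results, PREG_SET_ORDER))
        return array();

    $data = array();

    foreach ($results as $result)
    {
        $words = trim($result[2]);
        // explode() replaces split(), which was removed in PHP 7; an empty span counts as 0 words.
        $count = $words === '' ? 0 : count(explode(' ', $words));
        $data[] = array("match" => $result[0], "words" => $words, "count" => $count);
    }

    return $data;
}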

$result = count_between_words($txt, "tükörfúrógép", "árvíztűrő");

echo "<pre>" . print_r($result, true) . "</pre>";

Result will be:

Array
(
    [0] => Array
    (
        [match] => tükörfúrógép cherry árvíztűrő
        [words] => cherry
        [count] => 1
    )

    [1] => Array
    (
        [match] => tükörfúrógép cat orange lime cat árvíztűrő
        [words] => cat orange lime cat
        [count] => 4
    )

    [2] => Array
    (
        [match] => tükörfúrógép banana orange lime orange lime cat árvíztűrő
        [words] => banana orange lime orange lime cat
        [count] => 6
    )
)


Instead of a huge, confusing regexp, why not write a few lines using various string functions?

Example:

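// Assumes both words occur in $txt; strpos() returns false otherwise.
$first = 'árvíztűrő';
$start = strpos($txt, $first) + strlen($first); // byte offset after 'árvíztűrő' (13 bytes in UTF-8, not 9)
$end   = strpos($txt, 'tükörfúrógép', $start);
$inner = substr($txt, $start, $end - $start);
$words = preg_split("/[\s,]+/", $inner, -1, PREG_SPLIT_NO_EMPTY); // drop empty pieces at the edges
$num   = count($words);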

Of course, this will eat up memory if you have some gigantic input string...
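
For completeness, the same idea can be extended to collect every span rather than only the first. This is a sketch under a few assumptions: the helper name spans_between() is invented here, it works on bytes (which is safe for UTF-8 as long as strlen() is used for the offsets), and it does not check whole-word boundaries the way the regex answers do.

```php
<?php
// Hypothetical helper: for each occurrence of $right, take the text after
// the nearest preceding $left, using byte-based string functions only.
function spans_between(string $txt, string $left, string $right): array
{
    $spans  = [];
    $offset = 0;
    while (($end = strpos($txt, $right, $offset)) !== false) {
        $chunk = substr($txt, $offset, $end - $offset); // text before this $right
        $start = strrpos($chunk, $left);                 // nearest $left before it
        if ($start !== false) {
            $spans[] = trim(substr($chunk, $start + strlen($left)));
        }
        $offset = $end + strlen($right);                 // continue after this $right
    }
    return $spans;
}

$txt = "tükörfúrógép banana orange lime, tükörfúrógép cherry árvíztűrő orange lyon "
     . "cat lime mac tükörfúrógép cat orange lime cat árvíztűrő "
     . "tükörfúrógép banana orange lime orange lime cat árvíztűrő";

print_r(spans_between($txt, 'tükörfúrógép', 'árvíztűrő'));
// Expected entries: "cherry", "cat orange lime cat",
// "banana orange lime orange lime cat"
```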
