Regex to pull out quoted text

Developer (https://www.devze.com) 2023-01-24 11:06, source: web


Related topics: php regex

I would like to use a regex to find quoted text in a string, including the words between the quotes. I would also like it to handle both double quotes and single quotes.

For example, if I had the string:

The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock.

Then it would identify the following:

cat and the hat
mouse ran

What would the regex be?

(["']).*?\1

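This works for me, assuming a quoted string never contains its own quote character (an escaped \" inside "...", say). A quote of the other type is fine, because \1 only accepts the character that opened the match. Note that the whole match includes the quote marks; group 1 holds just the quote character, so wrap .*? in a second group if you want only the text between them.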
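A minimal Python sketch of the same pattern, assuming the standard `re` module, whose backreference `\1` and lazy `.*?` behave like the PCRE version above. The second capturing group around `.*?` is an addition so the text between the quotes can be read out directly, and `re.DOTALL` (an assumption) lets a quoted phrase span newlines. `quoted_phrases` is a hypothetical helper name.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close it.
# Group 2 (added for this sketch) captures the text between the quotes.
QUOTED = re.compile(r"""(["'])(.*?)\1""", re.DOTALL)

def quoted_phrases(text):
    """Return the contents of every "..." or '...' span, without the quotes."""
    # findall returns (quote, contents) tuples because the pattern has two groups
    return [contents for _quote, contents in QUOTED.findall(text)]

sample = "The \"cat and the hat\" sat on a rat.  The 'mouse ran' up the clock."
print(quoted_phrases(sample))  # ['cat and the hat', 'mouse ran']
```

An unpaired apostrophe outside any quotes (as in `don't`) still opens a bogus match, which is the limitation the caveat above is warning about.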

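#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # require Perl 5.10+ (named captures) and enable say

my $quoted_rx = qr{
    (?<quote> ['"] )              # opening quote: apostrophe or double quote
    (?<guts>
       (?: (?! \k<quote> ) . ) *  # any run of characters that are not that quote
    )
    \k<quote>                     # the same quote character closes the match
}sx;

my $string = <<'END_OF_STRING';
The "cat and the hat" sat on a rat.  The 'mouse ran' up the clock.
END_OF_STRING

while ($string =~ /$quoted_rx/g) {
    say $+{guts};
}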

After each successful match, the quote character is in $+{quote} and the text between the quotes is in $+{guts}.
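For comparison, a hedged translation of the Perl pattern into Python, assuming the standard `re` module. Python spells named groups `(?P<name>...)` and named backreferences `(?P=name)` where Perl uses `(?<name>...)` and `\k<name>`; `re.VERBOSE` and `re.DOTALL` stand in for Perl's `/x` and `/s`. The `(?: (?! quote ) . )*` construct consumes characters only while none of them is the opening quote, so it stops at the first matching closing quote without needing a lazy quantifier.

```python
import re

QUOTED = re.compile(
    r"""
    (?P<quote> ['"] )               # opening quote: apostrophe or double quote
    (?P<guts>
        (?: (?! (?P=quote) ) . )*   # any character that is not the opening quote
    )
    (?P=quote)                      # the same quote character closes the span
    """,
    re.VERBOSE | re.DOTALL,
)

text = "The \"cat and the hat\" sat on a rat.  The 'mouse ran' up the clock."
for m in QUOTED.finditer(text):
    # prints: " cat and the hat
    #         ' mouse ran
    print(m.group("quote"), m.group("guts"))
```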

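That only works for U+0027 (APOSTROPHE) and U+0022 (QUOTATION MARK). If you want it to work for curly quotes such as ‘this’ and “this”, you'll have to be fancier, because the opening and closing marks are different characters and a backreference to the opening mark will never match the closing one. There is a \p{Quotation_Mark} property for any sort of quotation mark, and \p{Pi} for initial punctuation and \p{Pf} for final punctuation.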
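Python's built-in `re` module does not support `\p{...}` properties, so this sketch takes a different route from the one described above: it lists each opening/closing pair explicitly and builds one alternative per pair. The `PAIRS` table and `quoted_phrases` function are hypothetical names chosen for illustration, and the table covers only ASCII and English-style curly quotes.

```python
import re

# Hypothetical table of opening -> closing marks; extend it as needed.
PAIRS = {'"': '"', "'": "'", "“": "”", "‘": "’"}

# One alternative per pair, each with its own capturing group,
# e.g. “([^”]*)” for curly double quotes.
QUOTED = re.compile(
    "|".join(
        f"{re.escape(o)}([^{re.escape(c)}]*){re.escape(c)}" for o, c in PAIRS.items()
    )
)

def quoted_phrases(text):
    """Return the text inside every recognised quote pair."""
    # Only the alternative that matched has a participating group;
    # the other groups are None, so take the first one that is not.
    return [
        next(g for g in m.groups() if g is not None) for m in QUOTED.finditer(text)
    ]

print(quoted_phrases('He said “cat and the hat” and ‘mouse ran’, then "x".'))
# ['cat and the hat', 'mouse ran', 'x']
```

One caveat of this approach: ’ (U+2019) is also the usual typographic apostrophe, so a phrase like ‘the cat’s hat’ ends early, at cat’.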

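<?php
$s = 'The "cat and the hat" sat on a rat.  The \'mouse ran\' up the clock.';
// Group 1 captures the opening quote; group 2 captures the text up to the matching closing quote (\1).
preg_match_all('~([\'"])(.*?)\1~s', $s, $result);
print_r($result[2]);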

Output (as seen on ideone):

Array
(
    [0] => cat and the hat
    [1] => mouse ran
)

preg_match_all stores all the match results in an array of arrays. You can change the layout with the flags argument (for example PREG_SET_ORDER, which groups results per match), but by default (PREG_PATTERN_ORDER) the first array contains the overall matches ($0 or $&), the second array contains every capture of the first group ($1), the third every capture of the second group ($2), and so on.

In this case $result[0] holds the complete quoted strings from all of the matches (quotes included), $result[1] holds the quote characters, and $result[2] holds whatever was between the quotes.
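To make the two result layouts concrete, here is a sketch in Python that rebuilds them from `re.finditer` results. `pattern_order` and `set_order` are hypothetical helpers named after PHP's `PREG_PATTERN_ORDER` (the default) and `PREG_SET_ORDER` flags; they are not PHP functions, and they return `None` rather than PHP's empty string for a group that did not participate.

```python
import re

def pattern_order(regex, subject):
    """Mimic PREG_PATTERN_ORDER: result[0] is every full match,
    result[1] is group 1 from every match, and so on."""
    matches = list(regex.finditer(subject))
    return [[m.group(i) for m in matches] for i in range(regex.groups + 1)]

def set_order(regex, subject):
    """Mimic PREG_SET_ORDER: one list per match, [full match, group 1, group 2, ...]."""
    return [[m.group(0), *m.groups()] for m in regex.finditer(subject)]

QUOTED = re.compile(r"""(['"])(.*?)\1""", re.DOTALL)
s = "The \"cat and the hat\" sat on a rat.  The 'mouse ran' up the clock."

result = pattern_order(QUOTED, s)
print(result[2])                # ['cat and the hat', 'mouse ran']
print(set_order(QUOTED, s)[1])  # ["'mouse ran'", "'", 'mouse ran']
```

The two layouts hold the same data; one is the transpose of the other, which is why `$result[2]` in the PHP answer lists the contents of every match at once.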