The problem:
I need to extract strings that are between $ characters from a block of text, but I'm a total n00b when it comes to regular expressions.
For instance from this text:
Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth.
I would like to get an array consisting of:
{'es membres', 'separat existentie es un'}
A little snippet in Python would be great.
Import the re module, and use findall():
>>> import re
>>> p = re.compile(r'\$(.*?)\$')
>>> s = "apple $banana$ coconut $delicious ethereal$ funkytown"
>>> p.findall(s)
['banana', 'delicious ethereal']
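The pattern p represents a dollar sign (\$), then a non-greedy capturing group ((.*?)) which matches any characters (.) of which there must be zero or more (*), followed by another dollar sign (\$). Because the pattern contains a group, findall() returns only the captured text, without the surrounding dollar signs.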
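To see why the non-greedy ? matters, here is a small sketch comparing the two forms on the same sample string. The variable names non_greedy and greedy are just for illustration.

```python
import re

s = "apple $banana$ coconut $delicious ethereal$ funkytown"

# Non-greedy: each match stops at the nearest closing dollar sign.
non_greedy = re.findall(r'\$(.*?)\$', s)

# Greedy: .* runs to the LAST dollar sign, swallowing everything in between.
greedy = re.findall(r'\$(.*)\$', s)

print(non_greedy)  # ['banana', 'delicious ethereal']
print(greedy)      # ['banana$ coconut $delicious ethereal']
```

Without the ?, the whole stretch from the first to the last dollar sign comes back as a single match.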
You can use re.findall (here s holds the text from the question):
>>> re.findall(r'\$(.*?)\$', s)
['es membres', 'separat existentie es un']
The regex below captures everything between the $ characters non-greedily:
\$(.*?)\$
import re
m = re.findall(r'\$([^$]*)\$', 'Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth')
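The [^$]* form and the .*? form give the same result on the sample text, but they can differ when the text between the dollar signs spans several lines, because . does not match a newline unless re.DOTALL is set. A minimal sketch, using a made-up two-line string:

```python
import re

# Hypothetical input where the first delimited string contains a newline.
text = "first $one\ntwo$ second $three$"

# '.' stops at the newline, so the pairing of dollar signs goes wrong.
dot_default = re.findall(r'\$(.*?)\$', text)

# re.DOTALL lets '.' match newlines as well.
dot_all = re.findall(r'\$(.*?)\$', text, re.DOTALL)

# A negated character class matches anything except '$', newlines included.
negated = re.findall(r'\$([^$]*)\$', text)

print(dot_default)  # [' second ']
print(dot_all)      # ['one\ntwo', 'three']
print(negated)      # ['one\ntwo', 'three']
```

If the block of text can contain line breaks inside the delimiters, the negated class or re.DOTALL is probably the safer choice.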
Alternative without regexes which works for this simple case:
>>> s="Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$"
>>> s.split("$")[1::2]
['es membres', 'separat existentie es un']
Just split the string on '$' (this gives you a Python list) and then take every second element of that list, starting at index 1.
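One caveat with the split approach: if a closing $ is missing, the slice still returns the trailing piece as if it had been delimited. A rough sketch of a guard, where between_dollars is a hypothetical helper name and raising ValueError is just one possible way to handle it:

```python
def between_dollars(text):
    """Return the strings found between pairs of '$' in text (hypothetical helper)."""
    parts = text.split("$")
    # n dollar signs produce n + 1 parts, so an even number of parts
    # means an odd number of '$', i.e. one delimiter is left unclosed.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced '$' delimiters")
    return parts[1::2]


print(between_dollars("Lor $separat existentie es un$ myth. $es membres$"))
# ['separat existentie es un', 'es membres']
```

The regex versions simply skip an unpaired $, so which behaviour you want depends on how strict you need to be about malformed input.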
Valid regex demo in Perl:
my $a = 'Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth.';
my @res;
while ($a =~ /\$([^\$]+)\$/gos)
{
    push(@res, $1);
}
foreach my $item (@res)
{
    print "item: $item\n";
}
Flags: s - treat all input text as a single line, g - global (find every match), o - compile the pattern only once.