Regex for getting content between $ chars from a text

Developer https://www.devze.com 2022-12-15 07:22 Source: Internet

The problem:

I need to extract the strings that sit between $ characters in a block of text, but I'm a total n00b when it comes to regular expressions.

For instance from this text:

Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth.

I would like to get an array consisting of:

{'es membres', 'separat existentie es un'}

A little snippet in Python would be great.


Import the re module, and use findall():

>>> import re
>>> p = re.compile(r'\$(.*?)\$')
>>> s = "apple $banana$ coconut $delicious ethereal$ funkytown"
>>> p.findall(s)
['banana', 'delicious ethereal']

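The pattern p matches a literal dollar sign (\$), then a capturing group ((.*?)) that matches any character (.) zero or more times (*), as few times as possible (the trailing ? makes the quantifier non-greedy), followed by another literal dollar sign (\$). findall() returns only the text captured by the group, so the dollar signs themselves are not included.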
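To see why the non-greedy ? matters, here is a small sketch comparing the pattern from this answer with a greedy variant. The variable names lazy and greedy are only illustrative.

```python
import re

text = "apple $banana$ coconut $delicious ethereal$ funkytown"

# Non-greedy: each match stops at the nearest closing dollar sign.
lazy = re.findall(r'\$(.*?)\$', text)

# Greedy: the single match runs from the first dollar sign to the last one.
greedy = re.findall(r'\$(.*)\$', text)

print(lazy)    # ['banana', 'delicious ethereal']
print(greedy)  # ['banana$ coconut $delicious ethereal']
```

With the greedy form, everything between the first and the last $ ends up in one match, which is rarely what you want here.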


You can use re.findall (here s holds the text from the question):

>>> re.findall(r'\$(.*?)\$', s)
['es membres', 'separat existentie es un']


The regex below captures everything between the $ characters non-greedily:

\$(.*?)\$


import re
# A raw string avoids invalid-escape warnings; [^$]* matches any run of non-dollar characters.
m = re.findall(r'\$([^$]*)\$', 'Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth')
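The negated character class [^$]* and the non-greedy .*? give the same result on the sample text, but they differ when the delimited text spans several lines, because . does not match a newline unless re.DOTALL is set. A minimal sketch, using a made-up multi-line input:

```python
import re

# Hypothetical input where the delimited text contains a newline.
multiline = "start $first\nsecond$ end"

# By default "." stops at newlines, so the lazy pattern finds nothing.
dot_default = re.findall(r'\$(.*?)\$', multiline)

# With re.DOTALL, "." also matches newlines.
dot_all = re.findall(r'\$(.*?)\$', multiline, re.DOTALL)

# A negated class matches any character except "$", newlines included.
negated = re.findall(r'\$([^$]*)\$', multiline)

print(dot_default)  # []
print(dot_all)      # ['first\nsecond']
print(negated)      # ['first\nsecond']
```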


Alternative without regexes which works for this simple case:

>>> s="Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$"
>>> s.split("$")[1::2]
['es membres', 'separat existentie es un']

Just split the string on '$' (this gives you a Python list) and then take every second element of that list, starting at index 1. This assumes the $ characters come in pairs.
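If the text might contain an unpaired $, the plain slice also returns whatever follows the last $. One possible guard is sketched below. The helper name between_dollars is hypothetical, and dropping the trailing piece is just one way to handle the unbalanced case.

```python
def between_dollars(text):
    """Return the pieces of text enclosed by pairs of "$" characters."""
    parts = text.split("$")
    # An even number of parts means an odd number of "$" characters,
    # so the text after the last "$" has no closing delimiter: drop it.
    if len(parts) % 2 == 0:
        parts = parts[:-1]
    # Odd indices are the pieces that sit between an opening and a closing "$".
    return parts[1::2]


balanced = between_dollars("Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$")
unbalanced = between_dollars("a $b$ c $d")

print(balanced)    # ['es membres', 'separat existentie es un']
print(unbalanced)  # ['b']
```

For comparison, "a $b$ c $d".split("$")[1::2] would return ['b', 'd'], even though d is never closed.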


Valid regex demo in Perl:

my $a = 'Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth.';
my @res;
while ($a =~ /\$([^\$]+)\$/gos)
{
 push(@res, $1);
}

foreach my $item (@res)
{
 print "item: $item\n";
}

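Flags: g - global (find every match), o - compile the pattern only once, s - treat the input as a single line (lets . match newlines; it has no effect here because the pattern contains no dot).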
