How to distinguish between saved segment and alternative?

开发者 https://www.devze.com 2023-04-02 03:05 Source: web

From the following text...

Acme Inc.<SPACE>12345<SPACE or TAB>bla bla<CRLF>

... I need to extract company name + zip code + rest of the line.

Since either a TAB or a SPACE character can separate the second token from the third, I tried using the following regex:

FIND:^(.+) (\d{5})(\t| )(.+)$
REPLACE:\1\t\2\t\3

However, the contents of the alternative part are put in the \3 part, so the result is this:

Acme Inc.<TAB>12345<TAB><TAB or SPACE here>$

How can I tell the (Perl) regex engine that (\t| ) is an alternative instead of a token to be saved in RAM?

Thank you.


You want:

^(.+?) (\d{5})[\t ](.+)$

Since you are matching one character or the other, you can use a character class instead. Also, I made your first quantifier non-greedy (+? instead of +) to reduce the amount of backtracking the engine has to do to find the match.
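
As a rough illustration of the character-class version, here is a minimal sketch in Python. It assumes Python's re module handles these constructs the same way as the Perl-style engine in the question; the variable names are made up for the example, and the sample lines are the one from the question with each of the two possible separators.

```python
import re

# Sample line from the question, once with a TAB and once with a SPACE
# between the zip code and the rest of the line.
line_tab = "Acme Inc. 12345\tbla bla"
line_space = "Acme Inc. 12345 bla bla"

# [\t ] matches exactly one TAB or one SPACE and does not create a group,
# so the rest of the line stays in group 3.
pattern = re.compile(r"^(.+?) (\d{5})[\t ](.+)$")

# Same replacement as in the question: the three parts joined by TABs.
result_tab = pattern.sub(r"\1\t\2\t\3", line_tab)
result_space = pattern.sub(r"\1\t\2\t\3", line_space)

print(repr(result_tab))
print(repr(result_space))
```

Both inputs should come out as the company name, the zip code, and the rest of the line separated by TABs.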

In general, if you want a group to match without capturing anything, you can add ?: right after its opening parenthesis, like:

^(.+?) (\d{5})(?:\t| )(.+)$
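
To see how ?: changes the group numbering, the following sketch compares the two patterns side by side. It is written in Python on the assumption that its re module numbers groups the same way as the Perl-style engine in the question; the variable names are illustrative only.

```python
import re

line = "Acme Inc. 12345 bla bla"

# Capturing alternative: the separator itself becomes group 3,
# which pushes the rest of the line to group 4.
capturing = re.match(r"^(.+?) (\d{5})(\t| )(.+)$", line)

# Non-capturing alternative: the separator is matched but not numbered,
# so the rest of the line is group 3.
non_capturing = re.match(r"^(.+?) (\d{5})(?:\t| )(.+)$", line)

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()

print(capturing_groups)
print(non_capturing_groups)
```

This is why the original replacement \1\t\2\t\3 inserted the separator character instead of the rest of the line.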


Use non-capturing parentheses:

^(.+) (\d{5})(?:\t| )(.+)$


One way is to use \s instead of ( |\t); it matches any whitespace character, not only a space or a tab.

See Backslash-sequences for how Perl defines "whitespace".
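
One thing to keep in mind is that \s is broader than the original alternative. The sketch below, in Python and assuming its re module agrees with Perl on these particular characters, checks a few single characters against both forms; the names are hypothetical.

```python
import re

separator_class = re.compile(r"[\t ]")  # only TAB or SPACE
any_whitespace = re.compile(r"\s")      # any whitespace character

# For each character, record (matches [\t ], matches \s).
results = {}
for ch in ["\t", " ", "\n", "\r", "x"]:
    results[ch] = (
        bool(separator_class.fullmatch(ch)),
        bool(any_whitespace.fullmatch(ch)),
    )
    print(repr(ch), results[ch])
```

If the separator must never be a line break, the character class [\t ] is the stricter choice.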
