How to distinguish between saved segment and alternative?

Developer https://www.devze.com 2023-04-02 03:05 Source: Internet

Related topics: regex

From the following text...

Acme Inc.<SPACE>12345<SPACE or TAB>bla bla<CRLF>

... I need to extract company name + zip code + rest of the line.

Since either a TAB or a SPACE character can separate the second token from the third, I tried using the following regex:

FIND:^(.+) (\d{5})(\t| )(.+)$
REPLACE:\1\t\2\t\3

However, the contents of the alternative part are put in the \3 part, so the result is this:

Acme Inc.<TAB>12345<TAB><TAB or SPACE here>$

How can I tell the (Perl) regex engine that (\t| ) is an alternative instead of a token to be saved in RAM?

Thank you.

You want:

^(.+?) (\d{5})[\t ](.+)$

Since you are matching one character or the other, you can use a character class instead. Also, I made your first quantifier non-greedy (+? instead of +) to reduce the amount of backtracking the engine has to do to find the match.
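
As a rough illustration of the difference, here is a minimal sketch using Python's `re` module as a stand-in for the Perl-style engine in the question (an assumption; the asker's actual tool is not named). The sample line and the variable names are made up for the example.

```python
import re

# Sample line from the question: name, SPACE, 5-digit zip, TAB, rest of line.
line = "Acme Inc. 12345\tbla bla"

# Original pattern: (\t| ) is a capturing group, so the separator becomes group 3.
original = re.compile(r"^(.+) (\d{5})(\t| )(.+)$")

# Character-class version: [\t ] matches the separator without capturing it,
# so group 3 is now the rest of the line.
fixed = re.compile(r"^(.+?) (\d{5})[\t ](.+)$")

replacement = r"\1\t\2\t\3"
broken_result = original.sub(replacement, line)
fixed_result = fixed.sub(replacement, line)

print(repr(broken_result))  # the separator ends up where the rest of the line should be
print(repr(fixed_result))   # name, zip, and rest of line, tab-separated
```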

In general, if you want a group that does not capture anything, add ?: right after its opening parenthesis, like:

^(.+?) (\d{5})(?:\t| )(.+)$
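
To see how the group numbering changes, the following sketch compares the captured groups of both patterns. It again assumes Python's `re` module in place of the Perl-style engine, and the sample line is hypothetical.

```python
import re

# Same kind of line as in the question, this time with a SPACE as separator.
line = "Acme Inc. 12345 bla bla"

# Capturing alternation: the separator occupies group 3.
capturing = re.match(r"^(.+?) (\d{5})(\t| )(.+)$", line)

# Non-capturing alternation: (?:...) groups without saving,
# so the rest of the line moves up to group 3.
non_capturing = re.match(r"^(.+?) (\d{5})(?:\t| )(.+)$", line)

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()

print(capturing_groups)
print(non_capturing_groups)
```

With the non-capturing form, the original replacement string \1\t\2\t\3 can stay unchanged.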

Use non-capturing parentheses:

^(.+) (\d{5})(?:\t| )(.+)$

Another option is to use \s instead of ( |\t); it matches any whitespace character, not just space and tab.

See Backslash-sequences for how Perl defines "whitespace".
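
A small sketch of what that broader match means in practice, assuming Python's `re` module, whose definition of whitespace is close to but not necessarily identical to Perl's. The form feed separator is only there to show that \s accepts more than the two characters the question asks for.

```python
import re

# \s stands in for the separator; no group is needed around it.
pattern = re.compile(r"^(.+?) (\d{5})\s(.+)$")

results = {}
for sep in ("\t", " ", "\f"):  # TAB, SPACE, and a form feed
    m = pattern.match(f"Acme Inc. 12345{sep}bla bla")
    results[sep] = m.groups() if m else None
    print(repr(sep), results[sep])
```

If the input must be separated by exactly a TAB or a SPACE, the character class [\t ] is the stricter choice.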
