Regular Expression Longest Possible Matching

Developer https://www.devze.com 2022-12-25 18:05 Source: web

I have an input string which is a directory address:

Example: ProgramFiles/Micro/Telephone

And I want to match it against a list of words very strictly:

Example: Tel|Tele|Telephone

I want to match against Telephone and not Tel. Right now my regex looks like this:

my( $output ) = ( $input =~ m/($list)/o );

The regex above will match against Tel. What can I do to fix it?


If you want a whole word match:

\b(Tel|Tele|Telephone)\b

\b is a zero-width word boundary: it matches at a position between a word character and a non-word character, or at the start or end of the string next to a word character. A word character (\w) is [0-9a-zA-Z_] for ASCII input.
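A minimal Perl sketch of the word-boundary version, using the sample path from the question as input ($input and $output follow the question's naming):

```perl
use strict;
use warnings;

# Sample input taken from the question.
my $input = 'ProgramFiles/Micro/Telephone';

# \b on both sides forces the word to end at a word boundary, so "Tel" and
# "Tele" fail here (each is followed by another word character) and the
# engine backtracks into the alternation until "Telephone" fits.
my ($output) = $input =~ m/\b(Tel|Tele|Telephone)\b/;

print "$output\n";    # Telephone
```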

If you just want the longest word in a partial-word match, put the longest alternative first. For example:

\b(Telephone|Tele|Tel)

or

(Telephone|Tele|Tel)
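Since the question builds the pattern from $list, the longest-first order can also be produced programmatically. A sketch, assuming the words start out in an array (the name @words is hypothetical); quotemeta escapes any regex metacharacters in the words:

```perl
use strict;
use warnings;

# Hypothetical word list that the question joins into $list.
my @words = qw(Tel Tele Telephone);

# Sort longest first and escape each word, so the first alternative
# that can match is also the longest one.
my $list = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @words;

my $input = 'ProgramFiles/Micro/Telephone';
my ($output) = $input =~ m/($list)/;

print "$list\n";      # Telephone|Tele|Tel
print "$output\n";    # Telephone
```

The question's /o modifier can be kept: it only tells Perl to compile the interpolated pattern once, which is safe as long as $list does not change.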


Change the order from Tel|Tele|Telephone to Telephone|Tele|Tel. The regex engine tries the alternatives from left to right and stops at the first one that lets the overall match succeed; alternation is not greedy and does not look for the longest alternative. For example, /a|ab|abc/ applied to "abc" matches "a" rather than the longest possible "abc".
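A short Perl check of the /a|ab|abc/ example; a capturing group is added around the alternation so that the match returns the matched text:

```perl
use strict;
use warnings;

my $text = 'abc';

# The alternation takes the first branch that lets the pattern match,
# not the longest one, so branch order decides the result.
my ($short) = $text =~ /(a|ab|abc)/;
my ($long)  = $text =~ /(abc|ab|a)/;

print "$short\n";    # a
print "$long\n";     # abc
```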

Or make the longer parts optional with nested groups:

Tel(?:e(?:phone)?)?
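The same pattern in a small Perl loop, run against three hypothetical paths that end in each of the words. Because ? is greedy, the engine tries the longer continuation first and only drops it when it cannot match:

```perl
use strict;
use warnings;

# Each ? is greedy: the engine first tries to consume "e", then "phone",
# and gives up an optional part only if it cannot be matched.
my $re = qr/Tel(?:e(?:phone)?)?/;

# Hypothetical inputs ending in each of the three words.
my @inputs = qw(ProgramFiles/Micro/Telephone ProgramFiles/Micro/Tele ProgramFiles/Micro/Tel);

my @found;
for my $input (@inputs) {
    my ($output) = $input =~ /($re)/;
    push @found, $output;
}

print "@found\n";    # Telephone Tele Tel
```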


How about matching a shorter word only when the longest word does not occur anywhere in the input? Something like:

Find telephone, OR find tel or tele where telephone does not appear anywhere in the input. To start turning that into a regex:

(telephone) OR (characters without telephone, followed by (tel|tele), followed by characters without telephone)

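(telephone)|^(?!.*telephone).*?(tele|tel)

The exclusion has to be a negative lookahead such as (?!.*telephone) anchored at the start of the string; (telephone){0} only matches the empty string, so it cannot rule anything out. The word found ends up in $1 or $2, and tele is listed before tel for the same reason as in the other answers.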

Does that make any sense?
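A Perl sketch of this idea. It writes the exclusion as a negative lookahead (?!.*telephone) anchored at the start of the string (instead of (telephone){0}, which matches only the empty string), lists tele before tel, and adds the /i flag because the sample path has a capital T. These choices and the helper name find_word are assumptions, not part of the answer:

```perl
use strict;
use warnings;

# Branch 1: "telephone" wherever it occurs.
# Branch 2: only when the lookahead at ^ confirms "telephone" occurs nowhere
# in the string, take a shorter word, trying "tele" before "tel".
# The /i flag is an assumption so that the capitalised "Telephone" matches.
my $re = qr/(telephone)|^(?!.*telephone).*?(tele|tel)/i;

# Hypothetical helper: returns whichever group captured, or undef.
sub find_word {
    my ($input) = @_;
    return unless $input =~ $re;
    return defined $1 ? $1 : $2;
}

my $long  = find_word('ProgramFiles/Micro/Telephone');
my $short = find_word('ProgramFiles/Micro/Tele');

print "$long\n";     # Telephone
print "$short\n";    # Tele
```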
