What is the following regular expression saying?

Developer https://www.devze.com 2023-03-28 09:43 Source: Internet
I just came across the following regular expression:

Regex.Match(feed.Element("description").Value, @"^.{1,180}\b(?<!\s)").Value

I know it says that, starting from the beginning, the string should contain a minimum of 1 and a maximum of 180 characters, and that \b stands for a word boundary, but I don't understand what \b is doing here. And then there is (?<!\s). What is that expression doing? ?<! stands for a negative lookbehind, which doesn't consume any of the string. My guess is that it says to look behind and that the match shouldn't end with a space, but I am not sure. Can anybody clear up these doubts?


See your expression on Regexr, a useful tool for testing regular expressions.

I reduced the maximum length to 10 for testing, so it looks like this:

^.{1,10}\b(?<!\s)

(?<!\s) is a negative lookbehind, a zero-length assertion. That means it checks that the character immediately before the current position (on the left) is not whitespace.
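As a rough illustration of the lookbehind on its own, here is a minimal sketch. It uses Python's re module rather than the .NET Regex class from the question, on the assumption that (?<!\s) behaves the same way in both for this simple input. The sample string "a b" is made up for the demonstration.

```python
import re

# (?<!\s) consumes no characters. It succeeds at any position whose
# preceding character is not whitespace, including the very start of
# the string, where there is no preceding character at all.
lookbehind = re.compile(r"(?<!\s)")

text = "a b"  # hypothetical sample input

# Collect every position in the string where the assertion holds.
positions = [m.start() for m in lookbehind.finditer(text)]
print(positions)  # [0, 1, 3]
```

Position 2 is missing from the result because the character to its left is the space.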

So ^.{1,10}\b(?<!\s) matches from the start of the string up to the last word boundary within the first 10 characters, but only at a boundary where the character to its left is not whitespace. This does not match only at "left word boundaries" (I think tripleee means the right side of the word), because a word boundary does not necessarily involve whitespace.

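A word boundary \b matches the empty string between a word character (defined by the class \w) and a non-word character (\W), in either order. It also matches at the start or end of the string when the adjacent character is a word character.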

That means \b(?<!\s) will match, for example, between the two characters of "A$", "A ", "(A" or ".A". Each of them has a word boundary in the middle, and the character to the left of it is not whitespace.
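The four examples can be checked with a short sketch. Again this is Python's re module standing in for .NET, which I assume agrees on \b and \s for these ASCII inputs. The helper name boundary_positions and the extra sample " A" are my own additions for contrast.

```python
import re

# Word boundary whose left-hand neighbour is not whitespace.
pattern = re.compile(r"\b(?<!\s)")

def boundary_positions(text):
    """Return every index in text where \\b(?<!\\s) matches (hypothetical helper)."""
    return [m.start() for m in pattern.finditer(text)]

# The four samples from the answer, plus " A" as a contrasting case
# where the boundary in the middle is preceded by a space.
samples = ["A$", "A ", "(A", ".A", " A"]
results = {s: boundary_positions(s) for s in samples}

for sample, positions in results.items():
    print(repr(sample), positions)
```

Index 1, the position between the two characters, is reported for the first four samples but not for " A", where the lookbehind rejects the boundary after the space.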


In your case, (?<!\s) makes sure that trailing whitespace is not included in the match.

This is easy to illustrate. Change 180 to 10 in your example, so that you do not need a really long test string:

^.{1,10}\b(?<!\s)

Now try to match the following string against it (note the two spaces between "two" and "three"):

one two  three four

Your regular expression match won't include the two spaces between "two" and "three". However, if you remove the last part of your regular expression, like this:

^.{1,10}\b

Then the two spaces after two will be included in the match.
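The comparison can be reproduced with a few lines. This is a sketch in Python's re module rather than .NET, assuming the two engines agree on these patterns for this input. The variable names are mine.

```python
import re

text = "one two  three four"  # two spaces between "two" and "three"

# With the lookbehind: the boundary just before "three" is rejected because
# a space precedes it, so the engine backtracks to the end of "two".
with_lookbehind = re.match(r"^.{1,10}\b(?<!\s)", text).group(0)

# Without the lookbehind: the boundary just before "three" is accepted,
# so both spaces end up in the match.
without_lookbehind = re.match(r"^.{1,10}\b", text).group(0)

print(repr(with_lookbehind))     # 'one two'
print(repr(without_lookbehind))  # 'one two  '
```

Printing with repr makes the trailing spaces visible in the second result.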


Basically, the "not whitespace" assertion forces the \b to match left word boundaries only. In other words, if the first 180 characters contain a word start anywhere after the first column, this matches. (The expression requires at least one arbitrary character before the match. Without context, it is hard to say whether this is really correct and what exactly it is supposed to accomplish.)
