Regex explanation

Developer https://www.devze.com 2022-12-10 18:58 Source: Internet

I am looking at the code in the tumblr bookmarklet and was curious what the code below did.

try{
    if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))
        throw(0);
    tstbklt();
}

Can anyone tell me what the if line is testing? I have tried to decode the regex but have been unable to do so.

Initially excluding the specifics of the regex, this code is:

if ( ! /.../.test(l.host) )

which reads as "if not regex.matches(l.host)" or "if l.host does not match this regex".

So the regex must match the contents of l.host for the condition to be false, which is what avoids throwing the error.
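To see that control flow in isolation, here is a minimal sketch. The check function and its return strings are invented for illustration, and the catch block is an assumption, since the snippet in the question does not show one.

```javascript
// Hypothetical wrapper around the line from the question.
// `check` and the returned labels are made up for this sketch.
function check(host) {
    try {
        if (!/^(.*\.)?tumblr[^.]*$/.test(host))
            throw 0;            // regex did NOT match: skip the rest of the try block
        return "matched";       // regex matched: this is where tstbklt() would run
    } catch (e) {
        return "threw " + e;    // assumed handler, not shown in the original snippet
    }
}

console.log(check("staff.tumblr")); // "matched"
console.log(check("example.com"));  // "threw 0"
```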

On to the regex itself:

^(.*\.)?tumblr[^.]*$

This checks that the text is tumblr followed only by non-. characters up to the end, optionally preceded by any string ending in a .:

^       # start of string (no multiline flag is set)
(       # begin capturing group 1
.*      # match any (non-newline) character, as many times as possible, but zero allowed
\.      # match a literal .
)       # end capturing group 1
?       # make whole preceding item optional
tumblr  # match literal text tumblr
[^.]*   # match any non . character, as many times as possible, but zero allowed
$       # match end of string
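A quick way to check the breakdown above is to run the regex against a few sample hosts. The host strings below are invented for illustration and are not taken from the bookmarklet.

```javascript
// The regex from the question, unchanged.
const re = /^(.*\.)?tumblr[^.]*$/;

// Sample inputs (made up for this sketch).
const hosts = [
    "tumblr",          // bare word, nothing before or after
    "staff.tumblr",    // optional "anything ending in ." prefix
    "a.b.tumblrfoo",   // prefix may contain dots; suffix may not
    "tumblr.com",      // the "." after tumblr is rejected by [^.]*
    "www.tumblr.com",
    "nottumblr"        // no "." before tumblr, so the prefix group cannot apply
];

const results = hosts.map(h => [h, re.test(h)]);
results.forEach(([h, ok]) => console.log(h, ok));
// tumblr true
// staff.tumblr true
// a.b.tumblrfoo true
// tumblr.com false
// www.tumblr.com false
// nottumblr false
```

Note that nothing containing a . after tumblr matches, so a host such as www.tumblr.com fails this test.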

I thought it was testing to see if the host was tumblr

Yeah, it looked like it might be intended to check that, but if so it's the wrong way to do it.
For that, the first bit should be something like ^(?:[\w-]+\.)? to match an alphanumeric subdomain (the ?: makes it a non-capturing group, the [\w-]+ is at least 1 alphanumeric, underscore or hyphen) and the last bit should be either \.(?:com|net|org)$ or perhaps something like (?:\.[a-zA-Z]+)+$ depending on how flexible the TLD section might need to be.
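Putting those two pieces together might look like the sketch below. The answer only gives the first and last bits, so the literal tumblr in the middle, the choice of the com|net|org variant, and the isTumblrHost name are all assumptions made for illustration.

```javascript
// Assumed assembly of the suggested pieces:
//   ^(?:[\w-]+\.)?      optional single subdomain label
//   tumblr              the literal domain name (assumed middle part)
//   \.(?:com|net|org)$  one of three fixed TLDs
const strict = /^(?:[\w-]+\.)?tumblr\.(?:com|net|org)$/;

// Hypothetical helper name, not from the bookmarklet.
function isTumblrHost(host) {
    return strict.test(host);
}

console.log(isTumblrHost("tumblr.com"));       // true
console.log(isTumblrHost("www.tumblr.com"));   // true
console.log(isTumblrHost("a.b.tumblr.com"));   // false (only one subdomain label allowed)
console.log(isTumblrHost("tumblr.evil.com"));  // false
console.log(isTumblrHost("nottumblr.com"));    // false
```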

My attempt to break it down. I'm no expert with regex however:

if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))

The leading ! isn't really part of the regex; it tells us to only execute the body of the if() when the test does not match.

if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))

The ^(.*\.)? part allows for any characters before the word tumblr as long as they are followed by a literal ., but it is all optional (see the ? at the end).

if(!/^(.*\.)?tumblr[^.]*$/.test(l.host))

Next, [^.] matches any character except the ., the * extends that to zero or more such characters (so it doesn't stop after one), and the $ requires the match to run to the end of the string.

Finally, the .test() looks to test it against the current hostname or whatever l.host contains (I'm not familiar with the tumblr bookmarklet)

So basically, it looks like that line throws the exception when the host does not look like part of tumblr.

Looking forward to see how wrong I am :)