Regular expression: excluding some strings but including others

Developer https://www.devze.com 2023-03-04 16:38 Source: Internet
I have this link:

http://anthropology.school.com/stuff/anthropology.999.ug.courses

What's the regular expression to exclude every link that contains /stuff/ but still include the ones that contain 999.ug.courses (even though /stuff/ is included)?

So, for example, the link above would be okay because it contains both 999.ug.courses and /stuff/.

I just don't want the ones that ONLY contain /stuff/ in the link.

Also, I'm writing this in a simple configuration text file for an open-source tool I'm using.

samples:

^http://([a-zA-Z0-9]*\.)*school.com/

^(file|ftp|mailto):

\.(gif|GIF|jpg|com|JPG|js|png|php|PNG|pp|ico|atom|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|py|exe|pdf|jpeg|JPEG|bmp|BMP)$

[?*!@=]

(/about|/giving|/admissions|/Admissions|/studyabroad|/summer|/spring.in.ny|/winter|/academics|/life|/research|/global|/footer|/content|/AZ|/registrar|/its|/shc|/999|/explore.school|/prehealth|/eve|/people|/events|/IAA|sca|/aboutus|/subfields|/specialprograms|/newsevents|/resources|/employment)

Thanks.


If those are the only things you need to match for, this regex should do (in Perl format):

/http:\/\/anthropology\.school\.com(?:(\/\w+\/(?<!\/stuff\/)\w*)|(\/stuff\/anthropology\.999\.ug\.courses))/

It first matches the beginning of the URL, then either a directory named anything but stuff or /stuff/anthropology.999.ug.courses.


Does it have to be a single regexp? Could you use !/\/stuff\// || /999\.ug\.courses/ instead?
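
A minimal sketch of that two-test idea, written in Python purely for illustration. The function name is_wanted is hypothetical, and it assumes the filtering can be done in code rather than in the configuration file from the question:

```python
import re

# Two simple patterns instead of one complicated one.
STUFF = re.compile(r"/stuff/")
COURSES = re.compile(r"999\.ug\.courses")  # dots escaped so they match literally


def is_wanted(url: str) -> bool:
    """Keep a URL unless it contains /stuff/ without 999.ug.courses."""
    return not STUFF.search(url) or bool(COURSES.search(url))


if __name__ == "__main__":
    for url in (
        "http://anthropology.school.com/stuff/anthropology.999.ug.courses",
        "http://anthropology.school.com/stuff/anthropology.888.ug.courses",
        "http://anthropology.school.com/people/",
    ):
        print(url, is_wanted(url))
```

Splitting the check this way keeps each pattern trivial to read, at the cost of needing a place to combine the two results.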


How about:

preg_match('#^.+?/stuff/(?!.*999\.ug\.courses).*$#', $url);

Your desired URLs are the ones which don't match the regex.
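
For a setting where only a single "accept" pattern can be supplied, the same negative lookahead can be inverted. The sketch below uses Python for illustration. The names REJECT and KEEP are hypothetical, and it assumes the regex engine supports lookaheads, which the tool in the question may or may not do:

```python
import re

# Same idea as the pattern above: matches URLs to REJECT, i.e. those where
# /stuff/ is not followed anywhere later by 999.ug.courses.
REJECT = re.compile(r"^.+?/stuff/(?!.*999\.ug\.courses).*$")

# Inverted form: matches URLs to KEEP. The outer negative lookahead fails
# only when some /stuff/ has no 999.ug.courses after it.
KEEP = re.compile(r"^(?!.*/stuff/(?!.*999\.ug\.courses))")


def is_rejected(url: str) -> bool:
    return REJECT.match(url) is not None


def is_kept(url: str) -> bool:
    return KEEP.match(url) is not None


if __name__ == "__main__":
    for url in (
        "http://anthropology.school.com/stuff/anthropology.999.ug.courses",
        "http://anthropology.school.com/stuff/anthropology.888.ug.courses",
        "http://anthropology.school.com/people/",
    ):
        print(url, "kept" if is_kept(url) else "rejected")
```

Note that both patterns only look for 999.ug.courses after /stuff/, which fits the example URL in the question.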


You are looking for conditional sub-pattern evaluation here. The following regex should work for you:

~.*?(999\.ug\.courses)(?(1).*?|(?<!/stuff/))~

Using PHP code:

preg_match('~.*?(999\.ug\.courses)(?(1).*?|(?<!/stuff/))~', $str, $m );
var_dump($m);

When I ran the above code with:

$str ="http://anthropology.school.com/stuff/anthropology.999.ug.courses";

I got:

array(2) {
  [0]=>
  string(64) "http://anthropology.school.com/stuff/anthropology.999.ug.courses"
  [1]=>
  string(14) "999.ug.courses"
}

But when I ran the above code with the following (no anthropology.999.ug.courses in the text):

$str ="http://anthropology.school.com/stuff/anthropology.888.ug.courses";

I got:

array(0) {
}

Here is a live demo of the above code.
