This is a JavaScript regex.
const regex = /(http:\/\/[^\s]*)/g;
const text = "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd and I like http://google.com a lot";
const matches = text.match(regex);
console.log(matches);
I get both URLs in the result. However, I want to eliminate all URLs ending with .dtd. How do I do that?
Note that only URLs ending with .dtd should be removed, so a URL like http://a.dtd.google.com should still pass.
The nicest way to do it is to use a negative lookbehind (in languages that support them):
/(?>http:\/\/[^\s]*)(?<!\.dtd)/g
The ?> in the first bracket makes it an atomic group, which stops the regex engine from backtracking into it: it will match the full URL as it does now, and if the next part fails it won't go back and try matching less.
The (?<!\.dtd) is a negative lookbehind, which only matches if \.dtd doesn't match ending at that position (i.e., the URL doesn't end in .dtd).
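JavaScript has no atomic groups, so the pattern above cannot be used there as written. On engines that do implement lookbehind (ES2018 and later), roughly the same idea can be sketched by replacing the atomic group with a (?!\S) check that pins the match to the end of the URL. This variant, the sample string, and the names noDtd, sample, and kept are illustrative assumptions, not part of the pattern above:

```javascript
// Assumption: the engine supports lookbehind (ES2018+).
// There is no atomic group here; (?!\S) forces the match to run to the end
// of the URL, so backtracking cannot yield a shortened match such as
// "http://example.com/schema.dt".
const noDtd = /http:\/\/\S*(?<!\.dtd)(?!\S)/g;

// Hypothetical sample input: one URL ending in .dtd, two that do not.
const sample = "http://example.com/schema.dtd http://a.dtd.google.com http://google.com";

const kept = sample.match(noDtd);
console.log(kept); // the URL ending in .dtd is skipped
```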
For languages that don't (such as JavaScript, which has no atomic groups and only gained lookbehind in ES2018), you can use a negative lookahead instead, which is a bit uglier and generally less efficient:
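/(http:\/\/(?![^\s]*\.dtd(?!\S))[^\s]*)/g
This will match http://, then scan ahead to make sure the URL doesn't end in .dtd, then backtrack and scan forward again to get the actual match. The (?!\S) after \.dtd requires whitespace or the end of the input to follow, so a .dtd in the middle of a URL, as in http://a.dtd.google.com, does not cause a rejection.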
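A quick check of the lookahead approach against the sample text from the question, with http://a.dtd.google.com appended to cover the case from the note. This is only a sketch: the end-of-URL check is written as (?!\S), meaning no non-whitespace character follows, so that a .dtd in the middle of a host name is not mistaken for the end of the URL.

```javascript
// Negative lookahead: right after "http://", peek ahead over the rest of the
// URL and refuse to match if ".dtd" is the last thing before whitespace or
// the end of the input.
const regex = /http:\/\/(?!\S*\.dtd(?!\S))\S*/g;

const text =
  "I have http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd " +
  "and I like http://google.com a lot and http://a.dtd.google.com too";

const matches = text.match(regex);
console.log(matches); // the hibernate .dtd URL is not in the result
```

If a URL in the text can be followed directly by punctuation (for example a trailing comma), the end-of-URL check would need to be adjusted accordingly.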
As always, http://www.regular-expressions.info/ is a good reference for more information.