Regex exclusion behavior

Developer https://www.devze.com 2023-01-09 00:56 Source: web

Ok, so I know this question has been asked in different forms several times, but I am having trouble with specific syntax. I have a large string which contains html snippets. I need to find every link tag that does not already have a target= attribute (so that I can add one as needed).

^((?!target).)* will give me the text leading up to 'target', and <a.+?>[\w\W]+?</a> will give me a link, but that's where I'm stuck. An example:

<a href="http://www.someSite.com>Link</a> (This should be a match)
<a href="SomeLink.whatever target="_blank">Link</a> (this should not be a match).  

Any suggestions? Using DOM or XPath is not really an option, since this snippet is not well-formed HTML.


You are being wilfully evil by trying to parse HTML with Regexes. Don't.

That said, you are being extra evil by trying to do everything in one regexp. There is no need for that; it makes your code regex-engine-dependent, unreadable, and quite possibly slow. Instead, simply match tags and then check your first-stage hits again with the trivial regex /target=/. Of course, that character string might occur elsewhere in an HTML tag, but see the first point: you have already thrown good practice out of the window, so why not at least make things un-obfuscated so everyone can see what you're doing?
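The two-stage idea above could be sketched roughly as follows. This is only an illustration in JavaScript: the helper name findLinksWithoutTarget and the sample string are made up here, and the first-stage pattern assumes that no attribute value contains a literal ">".

```javascript
// Hypothetical helper: returns the opening <a ...> tags that lack "target=".
function findLinksWithoutTarget(html) {
    // Stage 1: grab every opening anchor tag.
    // Assumption: no ">" appears inside an attribute value.
    var openTags = html.match(/<a\b[^>]*>/gi) || [];

    // Stage 2: keep only the tags that do not mention "target=".
    return openTags.filter(function (tag) {
        return !/target=/i.test(tag);
    });
}

// The two malformed examples from the question, joined by a newline.
var sample = '<a href="http://www.someSite.com>Link</a>\n' +
             '<a href="SomeLink.whatever target="_blank">Link</a>';

var hits = findLinksWithoutTarget(sample);
console.log(hits);
```

Keeping the two checks separate means each regex stays simple enough to read at a glance, at the cost of a second pass over the first-stage hits.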


If you insist on doing it with a regex, a pattern such as this should help:

<a(?![^>]*target=) [^>]*>.*?</a>

It's by no means 100% perfect. Technically speaking, a tag can contain a > in places other than the end, so it won't work for all HTML tags.

NB. I work with PHP, you may have to make slight syntax adjustments for Java.
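For the stated goal of adding the missing attribute, the same negative lookahead can be applied to the opening tag alone and combined with a replacement. The sketch below uses JavaScript syntax; the helper name addTarget, the "_blank" value, and the (well-formed) sample input are assumptions chosen for illustration, not part of the original question.

```javascript
// Opening <a ...> tags with no "target=" before the closing ">".
// The capture group keeps the existing attributes so they can be re-emitted.
var noTarget = /<a(?![^>]*target=)( [^>]*)>/g;

// Hypothetical helper; "_blank" is an arbitrary example value.
function addTarget(html) {
    return html.replace(noTarget, '<a$1 target="_blank">');
}

var input = '<a href="http://www.someSite.com">Link</a> ' +
            '<a href="SomeLink.whatever" target="_self">Link</a>';

var output = addTarget(input);
console.log(output);
```

Only the first link is rewritten; the second already carries a target attribute, so the lookahead rejects it and it is left untouched.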


You could try a negative lookahead like this: <a(?!.*?target.*?).*?>[\w\W]+?</a>
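One thing to watch with this variant: the lookahead uses . rather than [^>], so it is not confined to the current tag and can see the word target anywhere later on the same line. The comparison below is a small JavaScript sketch with a made-up one-line input; it contrasts this pattern with the [^>]* version from the earlier answer.

```javascript
// Pattern from the answer above: the lookahead scans to the end of the line.
var unbounded = /<a(?!.*?target.*?).*?>[\w\W]+?<\/a>/g;

// Pattern from the earlier answer: the lookahead stops at the first ">".
var bounded = /<a(?![^>]*target=) [^>]*>.*?<\/a>/g;

// Hypothetical input: two links on ONE line, only the second has a target.
var oneLine = '<a href="a.html">A</a> <a href="b.html" target="_blank">B</a>';

var unboundedHits = oneLine.match(unbounded) || [];
var boundedHits = oneLine.match(bounded) || [];

console.log(unboundedHits.length); // number of links found by the unbounded lookahead
console.log(boundedHits);          // links found by the tag-bounded lookahead
```

On this input the unbounded lookahead rejects the first link as well, because it finds target in the second tag, while the bounded version reports the first link only.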


I didn't test this and spent about a minute writing it, but for your specific example if you can do it on the client-side, try this via the DOM:

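// Collect every <a> element in the document.
var links = document.getElementsByTagName("a");

// Declare the loop counter with var so it does not leak as an implicit global.
for (var linkIndex = 0; linkIndex < links.length; linkIndex++) {
    var link = links[linkIndex];

    // Only touch real links (with an href) that have no target yet.
    if (link.href && !link.target) {
        link.target = "someTarget";
        // or link.setAttribute("target", "someTarget");
    }
}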