RegEx: Link Twitter-Name Mentions to Twitter in HTML

Developer https://www.devze.com 2022-12-11 07:35 Source: web
I want to do THIS, just a little bit more complicated:

Let's say I have an HTML input:

<a href="http://www.example.com" title="Bla @test blubb">Don't break!</a>
Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.
You can't reach me at blam4c@example.com.

Is there a good RegEx to replace the Twitter username mentions with links to Twitter, but leave @example (the email address at the bottom) AND @test (in the link title, i.e. inside HTML tags) alone?

It probably should also try to not add links inside existing links, i.e. not break this:

<a href="http://www.example.com">Hello @someone there!</a>

My current attempt is to add ">" at the beginning of the string, then use this RegEx:

Search:  '/>([^<]*\s)\@([a-z0-9_]+)([\s,.!?])/i'
Replace: '>\1<a href="http://twitter.com/\2">@\2</a>\3'

Then remove the ">" I added in step 1.

But that won't match anything but the "@blam4c". I know WHY it does so, that's not the problem.
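For reference, the behaviour can be reproduced with a small JavaScript port of the pattern. The port, the `g` flag and the variable names are assumptions made for illustration; the original uses PHP-style delimiters. The greedy `[^<]*` runs to the end of the text that follows a `>` and then backtracks to the last mention, and each `>` can start only one match:

```javascript
// The sample input from above, with ">" prepended as described in step 1.
const input = '>' + [
  '<a href="http://www.example.com" title="Bla @test blubb">Don\'t break!</a>',
  'Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.',
  "You can't reach me at blam4c@example.com.",
].join('\n');

// JavaScript port of the search pattern (hypothetical name: attempt).
const attempt = />([^<]*\s)\@([a-z0-9_]+)([\s,.!?])/gi;

// Collect the user name (group 2) of every match.
const linked = Array.from(input.matchAll(attempt), (m) => m[2]);

console.log(linked); // only "blam4c" is found
```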

I would like to find a solution that finds and replaces all twitter user name mentions without destroying the HTML. Maybe it might even be better to code this without RegEx?


First, keep the angle brackets out of your regexps.

Use an HTML parser and XPath to select the text nodes you are interested in processing, then consider a regexp for matching only @refs in those nodes.

I'll leave it to other people to try and give a specific answer to the regex part.
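A rough sketch of that idea in JavaScript, under stated assumptions: `linkMentions` is a hypothetical helper, and the tag-splitting regex is only a simplified stand-in for a real HTML parser (it would break on a `>` inside an attribute value, on comments, or on script blocks). It leaves tags and their attributes alone, skips text inside existing `<a>` elements, and applies a mention regex to the remaining text segments only:

```javascript
// Hypothetical helper, not taken from the thread.
function linkMentions(html) {
  // Split into text and tags. Because of the capturing group, the tags are
  // kept in the result at the odd indexes; text segments sit at even indexes.
  const parts = html.split(/(<[^>]*>)/);
  let insideLink = 0; // number of currently open <a> elements

  return parts.map((part, i) => {
    if (i % 2 === 1) {
      // A tag: track <a> nesting and pass the tag through untouched.
      if (/^<a[\s>]/i.test(part)) insideLink += 1;
      else if (/^<\/a\s*>/i.test(part)) insideLink = Math.max(0, insideLink - 1);
      return part;
    }
    if (insideLink > 0) return part; // never add a link inside a link
    // \B rejects an "@" that directly follows a word character,
    // which is the case in an email address such as blam4c@example.com.
    return part.replace(/\B@(\w+)/g, '<a href="http://twitter.com/$1">@$1</a>');
  }).join('');
}

const sample = [
  '<a href="http://www.example.com" title="Bla @test blubb">Don\'t break!</a>',
  'Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c.',
  "You can't reach me at blam4c@example.com.",
].join('\n');

console.log(linkMentions(sample));
```

In a browser, the same per-text-node replacement could be done on real DOM text nodes instead of on split strings, which avoids the limitations of the regex-based splitting.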


I agree with ddaa, there's almost no sane way to attack this without stripping the html links out first.

Presumably you'd be starting out with an actual Twitter message, which cannot by definition include any manually entered hyperlinks.

For example, here's how I found this question (the link resolves to this question so don't bother clicking it!)

Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1

In this case, it's easy:

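using System.Text.RegularExpressions;

var msg = "Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. http://bit.ly/2phvZ1";

// Verbatim string (@"...") so that \w reaches the regex engine as written.
var html = Regex.Replace(msg, @"(?<!\w)(@(\w+))",
    "<a href=\"http://twitter.com/$2\">$1</a>");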

(this might need some tweaking, I'd like to test it against a corpus, but it seems correct for the average Twitter message)

As for your more complicated cases (with HTML markup embedded in the tweets), I have no idea. Way too hard for me.


This regexp might work a bit better: /\B\@([\w\-]+)/gim

Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/4/
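As a quick check of what this pattern does, here is a small sketch (the variable names are made up for illustration). On plain text it picks up the four mentions and skips the email address, because `\B` does not match between a word character and `@`. Applied directly to markup, though, it would still rewrite the `@test` inside the `title` attribute, so it only covers the regex part of the problem:

```javascript
const mention = /\B\@([\w\-]+)/gim;

const text = 'Some Twitter Users: @codinghorror, @spolsky, @jarrod_dixon and @blam4c. ' +
  "You can't reach me at blam4c@example.com.";

// Names captured on plain text: the email address is not matched.
const names = Array.from(text.matchAll(mention), (m) => m[1]);
console.log(names);

// Applied to raw markup, the attribute value is rewritten as well.
const attribute = 'title="Bla @test blubb"';
const rewritten = attribute.replace(mention, '<a href="http://twitter.com/$1">@$1</a>');
console.log(rewritten);
```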
