Automatically hyperlink URLs and emails using C#, whilst leaving bespoke tags in place

Developer https://www.devze.com 2023-01-03 14:53 Source: web
I have a site that enables users to post messages to a forum.

At present, if a user types a web address or email address and posts it, it's treated the same as any other piece of text.

There are tools that enable the user to supply hyper-linked web and email addresses (via some bespoke tags/markup) - these are sometimes used, but not always. In addition, a bespoke 'Image' tag can also be used to reference images that are hosted on the web.

My objective is to cater both for those who use these existing tools to generate hyper-linked addresses and for those who simply type a web or email address in, which should then be converted automatically to a hyper-linked address for them (as soon as they submit their post).

I've found one or two regular expressions that convert a plain string web or email address. However, I obviously don't want to perform any manipulation on addresses that are already being handled via the site's bespoke tagging, and that's where I'm stuck: how to EXCLUDE any web or email addresses that are already catered for via the bespoke tagging. I want to leave them as is.

Here are some examples of bespoke tagging for the variations that I need to be left alone:

[URL=www.msn.com]www.msn.com[/URL]

[URL=http://www.msn.com]http://www.msn.com[/URL]

[EMAIL=bob@smith.com]bob@smith.com[/EMAIL]

[IMG]www.msn.com/images/test.jpg[/IMG]

[IMG]http://www.msn.com/images/test.jpg[/IMG]

The following examples would however ideally need to be automatically converted into web & email links respectively:

www.msn.com

http://www.msn.com

bob@smith.com

Ideally, the 'converted' links would just have the appropriate bespoke tags applied to them as per the initial examples earlier in this post, so rather than:

<a href="..." etc.

they'd become:

[URL=http://www.. etc.)

Unfortunately, we have a LOT of historic data stored with this bespoke tagging throughout, so for now, we'd like to retain that rather than implementing an entirely new way of storing our users' posts.

Any help would be much appreciated.

Thanks.


Here's the method I use. I don't have access right now to the full codebase so can't see how that fits in alongside the forum-code to stop double-linking, but try it out and see if it works for you...

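using System.Text.RegularExpressions;

/// <summary>
/// Turns any literal URL references in a block of text into anchor HTML elements.
/// </summary>
public static string ActivateLinksInText(string source)
{
    // Pad with spaces so a URL at the very start or end is still surrounded by whitespace.
    source = " " + source + " ";
    // Easier to convert <br> variants to something more neutral for now.
    source = Regex.Replace(source, "<br>|<br />|<br/>", "\n");
    // Wrap whitespace-delimited URLs in anchors. The lookahead leaves the trailing
    // whitespace unconsumed, so two links separated by a single space both match.
    source = Regex.Replace(source, @"(\s)(www\.\S+|https?://\S+)(?=\s)", "$1<a href=\"$2\" target=\"_blank\">$2</a>");
    // Give scheme-less www. links an explicit http:// prefix.
    source = Regex.Replace(source, @"href=""www\.", "href=\"http://www.");
    //source = Regex.Replace(source, "\n", "<br />");
    return source.Trim();
}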


You'll want to add negative lookaround assertions to your regular expressions. .NET supports this fully.

http://www.regular-expressions.info/lookaround.html

Negative lookahead asserts that your pattern is not followed by something. The syntax is (?!xxx), where xxx is a pattern defining what you don't want. You could use (?!\[\/URL\]) for links, for example.

Negative lookbehind looks like (?<!xxx). Here you'll need a pattern -- something like (?<!\[URL=.*?\]) -- but you could make this more robust, if needed.
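
As a rough illustration of the lookbehind idea, here is a minimal sketch. It assumes that tags are never nested and that no stray "[" appears between an opening tag and the address, so "preceded by [URL, [IMG or [EMAIL with no other [ in between" means "inside a tag". The class name, method name and the URL pattern itself are made up for the example and are not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Negative lookbehind: reject any position that has an opening
    // [URL, [IMG or [EMAIL tag before it with no other '[' in between.
    // .NET allows variable-length patterns such as [^\[]* inside a lookbehind.
    private static readonly Regex PlainUrl = new Regex(
        @"(?<!\[(?:URL|IMG|EMAIL)[^\[]*)\b(?:https?://|www\.)[^\s\[\]]+",
        RegexOptions.IgnoreCase);

    // Wraps every untagged URL in the forum's own [URL=...] markup.
    public static string TagPlainUrls(string text)
    {
        return PlainUrl.Replace(text, "[URL=$0]$0[/URL]");
    }

    public static void Main()
    {
        Console.WriteLine(TagPlainUrls(
            "See www.msn.com and [URL=http://www.msn.com]http://www.msn.com[/URL]"));
    }
}
```

With this lookbehind a separate lookahead for [/URL] is not needed, because the text between the tags is already excluded. Trailing punctuation directly after a link (a comma, say) would still be swallowed into the match, so a real pattern would probably want to trim that.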


Jay's right, though you could also take the plain-link matching regexes you have and add \b to the start and end, so they only match links that stand alone as whole words.

\b is a word boundary: a zero-width position between a word character (letter, digit or underscore) and a non-word character such as a space, period or comma. Bear in mind that [, ] and = are non-word characters too, so \b alone will not tell a plain link from one sitting inside forum-code tags.

I did the same thing for my forum software. I parsed the forum-code first, so it built anchor tags, and then I looked for plain links on their own using such a regex and converted those.
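
A variation on "forum-code first, plain links second" that keeps the stored markup is to do both in one pass: let the regex match complete tagged blocks first and hand them back untouched, and only tag whatever else matches. The sketch below is an assumption about how that could look, not the code from my forum software; the class, method and group names and both address patterns are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ForumAutoLinker
{
    // Alternatives are tried in order at each position, so an existing
    // tagged block is consumed whole before the plain patterns can see it.
    private static readonly Regex Token = new Regex(
        @"\[(?<tag>URL|EMAIL|IMG)(?:=[^\]]*)?\].*?\[/\k<tag>\]" +          // existing forum-code
        @"|(?<email>\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b)" +          // plain email address
        @"|(?<url>\b(?:https?://|www\.)[^\s\[\]]+(?<![.,;:!?)]))",         // plain URL, minus trailing punctuation
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string AutoTag(string post)
    {
        return Token.Replace(post, m =>
        {
            if (m.Groups["email"].Success)
                return "[EMAIL=" + m.Value + "]" + m.Value + "[/EMAIL]";
            if (m.Groups["url"].Success)
                return "[URL=" + m.Value + "]" + m.Value + "[/URL]";
            return m.Value; // already tagged: leave exactly as stored
        });
    }

    public static void Main()
    {
        Console.WriteLine(AutoTag(
            "Mail bob@smith.com or see www.msn.com, not [EMAIL=bob@smith.com]bob@smith.com[/EMAIL]"));
    }
}
```

Because the tagged blocks are matched rather than excluded with lookarounds, this also copes with addresses that have dots in the local part, and new tags can be protected by adding their names to the first alternative.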


The regex you are looking for is (?<!\[EMAIL=)(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)(?!\[\/EMAIL\]), used with RegexOptions.IgnoreCase because the pattern only lists upper-case letters. At least, this is what you need for the email tag. Your replacement would simply be [EMAIL=$1]$1[/EMAIL]. For the other tags, swap the centre group and the EMAIL tag names for whatever is appropriate.

Test Cases:

[EMAIL=bob@smith.com]bob@smith.com[/EMAIL] : FALSE
don@smith.com : TRUE

Evaluated under .NET Regex, as per your tag.
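
The two test cases can be reproduced with a short console program. This sketch assumes the lookbehind is written as the literal text [EMAIL= (escaped), the closing bracket in the lookahead is escaped, and RegexOptions.IgnoreCase is set; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailTagCheck
{
    // Lookbehind: not directly preceded by "[EMAIL=".
    // Lookahead: not directly followed by "[/EMAIL]".
    public static readonly Regex PlainEmail = new Regex(
        @"(?<!\[EMAIL=)(\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b)(?!\[/EMAIL\])",
        RegexOptions.IgnoreCase);

    // Wraps every untagged address in [EMAIL=...]...[/EMAIL].
    public static string TagPlainEmails(string text)
    {
        return PlainEmail.Replace(text, "[EMAIL=$1]$1[/EMAIL]");
    }

    public static void Main()
    {
        string[] samples =
        {
            "[EMAIL=bob@smith.com]bob@smith.com[/EMAIL]",
            "don@smith.com"
        };
        foreach (string sample in samples)
        {
            // Prints each sample followed by whether the pattern matched.
            Console.WriteLine("{0} : {1}", sample, PlainEmail.IsMatch(sample));
        }
        Console.WriteLine(TagPlainEmails("Contact don@smith.com today"));
    }
}
```

One limitation worth knowing: a tagged address with a dot in the local part, such as [EMAIL=bob.jones@smith.com], can still match from "jones@" onwards, because that position is a word boundary and is not directly preceded by [EMAIL=. Posts containing such addresses would need a stricter check.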
