I am trying to find links in user-entered text and convert them to hyperlinks automatically.
I am currently using the following Regex, which works well for finding hyperlinks in text.
Regex regexResolveUrl = new Regex("((http://|www\\.)([A-Z0-9.-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);
It has worked for almost all the links I have come across so far, but it fails to detect links that contain a hyphen.
For example, www.abc-xyz.com does not match with the above regex. Can anyone help me with this?
If you want - to mean a literal dash in a character class definition, you need to put it as the last (or first) character. So [abc-] is a character class containing 4 characters: a, b, c, -. On the other hand, [ab-c] only contains 3 characters, not including the -, because - is a range definition.
So, something like this (from your pattern):
[A-Z0-9.-:]
Defines 3 ranges, from A to Z, from 0 to 9, and from . (ASCII 46) to : (ASCII 58). You want instead:
[A-Z0-9.:-]
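To see the difference concretely, here is a minimal sketch. It uses Python's re module rather than .NET, purely as an assumption to keep the snippet easy to run; this particular character-class behavior is the same in both flavors. The names broken_class, fixed_class and matched_chars are illustrative only.

```python
import re

# Character classes taken from the pattern above.
broken_class = re.compile(r"[A-Z0-9.-:]", re.IGNORECASE)  # ".-:" is read as a range
fixed_class = re.compile(r"[A-Z0-9.:-]", re.IGNORECASE)   # trailing "-" is a literal

def matched_chars(pattern, candidates):
    """Return the characters from candidates that the class accepts."""
    return [ch for ch in candidates if pattern.fullmatch(ch)]

candidates = ["-", ".", "/", ":"]
print(matched_chars(broken_class, candidates))
print(matched_chars(fixed_class, candidates))
```

The range version accepts / (ASCII 47, which falls between . and :) and rejects -, while the version with the hyphen at the end accepts - and rejects /.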
References
- regular-expressions.info/Character Class
Note on repetition
I noticed that you used {1,} in your pattern to denote "one-or-more of".
.NET regex (like most other flavors) supports these shorthands:
- ? : "zero-or-one", same as {0,1}
- * : "zero-or-more", same as {0,}
- + : "one-or-more", same as {1,}
They may take some getting used to, but they're also pretty standard.
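As a quick check of these equivalences, the sketch below compares each shorthand with its explicit form on a few sample strings. It is written against Python's re module as an assumption (these quantifiers behave the same way in .NET); the names equivalents, samples and accepted are made up for the example.

```python
import re

# Each shorthand paired with its explicit-brace equivalent.
equivalents = [("a?", "a{0,1}"), ("a*", "a{0,}"), ("a+", "a{1,}")]
samples = ["", "a", "aa", "aaa", "b"]

def accepted(pattern, strings):
    """Return the strings that the pattern matches in full."""
    return [s for s in strings if re.fullmatch(pattern, s)]

for shorthand, explicit in equivalents:
    # Both spellings should accept exactly the same samples.
    same = accepted(shorthand, samples) == accepted(explicit, samples)
    print(shorthand, explicit, same)
```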
References
- regular-expressions.info/Repetition with Star and Plus
Related questions
- Using explicitly numbered repetition instead of question mark, star and plus
Note on C# @-quoted string literals
While doubling the backslashes in string literals for regex patterns is the norm in e.g. Java (out of necessity), in C# you actually have the option to use @-quoted string literals.
That is, these pairs of strings are identical:
"(http://|www\\.)"
@"(http://|www\.)"
"c:\\Docs\\Source\\a.txt"
@"c:\Docs\Source\a.txt"
Using @ can lead to more readable regex patterns because a literal backslash doesn't have to be doubled (although, on the other hand, a double quote must now be doubled in turn).
References
- MSDN / C# Programmer's Reference / string
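Escape the hyphen (in a regular C# string literal the backslash itself must be doubled, since "\-" is not a valid C# escape sequence):
Regex("((http://|www\\.)([A-Z0-9.\\-:]{1,})\\.[0-9A-Z?;~&#=\\-_\\./]{2,})", RegexOptions.Compiled | RegexOptions.IgnoreCase);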
Add the hyphen as the first or last character in the character class.
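The suggested fixes can be compared on the URL from the question with a rough sketch like the one below. It uses Python's re module instead of .NET's Regex class, which is an assumption made for runnability; the pattern text is the one from the question, and the helper names build and first_link are hypothetical.

```python
import re

# Tail shared by every variant; only the host character class differs.
TAIL = r"\.[0-9A-Z?;~&#=\-_\./]{2,})"

def build(host_class):
    """Assemble the URL pattern around the given host character class."""
    return re.compile(r"((http://|www\.)(" + host_class + r"{1,})" + TAIL, re.IGNORECASE)

original = build(r"[A-Z0-9.-:]")     # hyphen read as a range operator
hyphen_last = build(r"[A-Z0-9.:-]")  # hyphen moved to the end of the class
escaped = build(r"[A-Z0-9.\-:]")     # hyphen escaped in place

def first_link(pattern, text):
    """Return the first matched link in text, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

text = "visit www.abc-xyz.com today"
for name, pattern in [("original", original), ("hyphen_last", hyphen_last), ("escaped", escaped)]:
    print(name, first_link(pattern, text))
```

With this input the original pattern finds nothing, while both corrected variants return www.abc-xyz.com.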