I'm trying to match and replace anchor tags using a regex. What I have so far is this:
"(<a href=['\"]?([\\w_\\.]*)['\"]?)"
The problem with this approach is that it fails to capture hrefs that also have # in their value. I've tried
"(<a href=['\"]?([\\w_\\.#]*)['\"]?)"
and
"(<a href=['\"]?([\\w_\\.\\#]*)['\"]?)"
with no success.
What am I doing wrong?
Thank you
I don't think the problem is with # (works fine for me) but with other URL characters that are missing from the character class, such as -, /, : etc.
How about a regex like this:
<a href=("[^"]+"|'[^']+'|[^ >]+)
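As a rough illustration of how that pattern could be used for both matching and replacing, here is a minimal C# sketch. The class and method names (AnchorHrefDemo, ExtractHrefs, ReplaceHrefs) are made up for this example and are not from the question. It assumes the tag is written exactly as `<a href=...`, with a single space and href as the first attribute.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorHrefDemo
{
    // The pattern suggested above, written as a C# verbatim string
    // (a doubled "" stands for one literal double quote).
    private static readonly Regex HrefPattern =
        new Regex(@"<a href=(""[^""]+""|'[^']+'|[^ >]+)");

    // Hypothetical helper: returns every href value with surrounding quotes removed.
    public static string[] ExtractHrefs(string html)
    {
        MatchCollection matches = HrefPattern.Matches(html);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value.Trim('"', '\'');
        }
        return result;
    }

    // Hypothetical helper: passes each href value through 'rewrite'
    // and emits the result as a double-quoted attribute.
    public static string ReplaceHrefs(string html, Func<string, string> rewrite)
    {
        return HrefPattern.Replace(html, m =>
            "<a href=\"" + rewrite(m.Groups[1].Value.Trim('"', '\'')) + "\"");
    }
}
```

Calling AnchorHrefDemo.ExtractHrefs on a string containing `<a href="page.html#top">` should yield page.html#top, since # is not excluded by any of the three alternatives.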
Note: If the HTML is valid, prefer parsing it with a DOM parser over using a regex.
If you just want to replace the anchor part, use string operations. They are simpler and faster:
var parts = "http://someurl.com#hashpart".Split('#');
// yields "http://someurl.com" and "hashpart" as an array.
// you may want to check that the result has a length of two
// if it does:
var newUrl = string.Format("{0}#{1}", parts[0], "some replacement for hashpart");
If your URL contains multiple hashes, try using string.Substring to split at the first hash.
var url = "http://someurl.com#hash#hashhash";
var hashPos = url.IndexOf('#');
var urlPart = url.Substring(0, hashPos);    // "http://someurl.com"
var hashPart = url.Substring(hashPos + 1);  // "hash#hashhash"
This should work, but I wrote it without verifying it, so you may need to adjust the positions by +/- 1.
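For completeness, the same idea can be wrapped in a small helper that also copes with URLs that contain no # at all. This is only a sketch; FragmentDemo and ReplaceFragment are hypothetical names, not part of .NET.

```csharp
using System;

public static class FragmentDemo
{
    // Hypothetical helper: replaces everything after the first '#'
    // with newFragment, or appends "#newFragment" if there is no '#'.
    public static string ReplaceFragment(string url, string newFragment)
    {
        int hashPos = url.IndexOf('#');
        // IndexOf returns -1 when '#' is absent; keep the whole URL in that case.
        string basePart = hashPos < 0 ? url : url.Substring(0, hashPos);
        return basePart + "#" + newFragment;
    }
}
```

Checking for -1 before calling Substring avoids the ArgumentOutOfRangeException that Substring(0, -1) would throw.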
<a href=(('|").+?\2|[^>]+)