Remove hyperlinks from text but keep anchor text

Developer https://www.devze.com 2023-02-22 17:06 Source: Internet
I need to strip link tags from a body of text but keep the anchor text. For example:

<a href ="">AnchorText</a>

needs to become just:

AnchorText

I was considering using the following RegEx:

<(.{0}|/)(a|A).*?>

Is a RegEx the best way to go about this? If so, is the above RegEx pattern adequate? If RegEx isn't the way to go, what's a better solution? This needs to be done server side.


Your regex will do the job. You can write it a bit simpler as

</?(a|A).*?>

/? means zero or one /, which is equivalent to your (.{0}|/).
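
One caveat worth checking before relying on either pattern: because nothing anchors the tag name, they also match other tags that start with "a", such as <abbr>. The sketch below is in JavaScript for brevity (the same regex syntax should carry over to most server-side engines) and compares the simplified pattern with a tightened variant. The function names and the \b / [^>]* tightening are illustrative assumptions, not part of the answer above.

```javascript
// Simplified pattern from the answer above, applied globally.
function stripWithOriginalPattern(html) {
  return html.replace(/<\/?(a|A).*?>/g, "");
}

// Hypothetical tightened variant:
//   <\/?   optional slash, so opening and closing tags both match
//   a\b    tag name "a" followed by a word boundary, so <abbr> is left alone
//   [^>]*  any attributes, up to the closing ">"
//   g, i   every occurrence, case-insensitive (covers <A ...>)
function stripAnchorTags(html) {
  return html.replace(/<\/?a\b[^>]*>/gi, "");
}

const sample = 'See <A HREF="x.html">this page</A> on <abbr title="t">HTML</abbr>.';

console.log(stripAnchorTags('<a href ="">AnchorText</a>')); // AnchorText
console.log(stripWithOriginalPattern(sample)); // See this page on HTML.
console.log(stripAnchorTags(sample)); // See this page on <abbr title="t">HTML</abbr>.
```

Neither version copes with a ">" inside an attribute value; for input like that, an HTML parser is the safer choice.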


You could just use HtmlAgilityPack:

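// Parse the fragment with HtmlAgilityPack and read back only its text.
string sampleHtml = "<a href =\"\">AnchorText</a>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sampleHtml);
// InnerText drops every tag in the document, not only <a>.
string text = doc.DocumentNode.InnerText; // output: AnchorText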


I think a regex is the best way to accomplish this, and your pattern looks like it should work.


Use jQuery replaceWith:

$('a').replaceWith(function()
{
    return $('<span/>').text($(this).text());
});

Assuming you are doing this on the client side.


I have been trying to do the same and found the following solution:

  1. Export the text to CSV.
  2. Open the file in Excel.
  3. Run Find and Replace with <*> as the search pattern and an empty replacement; this removes the link tags and leaves the anchor text.
  4. Import the result again to overwrite existing content.
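
If the export and import round trip is inconvenient, the replace in step 3 can be approximated in a script. This is only a sketch: it assumes the <*> wildcard matches each tag separately, roughly like the regex <[^>]*>, and stripAllTags is a hypothetical name. Note that it removes every tag, not only links.

```javascript
// Rough script equivalent of replacing <*> with nothing (assumption: one match per tag).
function stripAllTags(text) {
  // "<", then any characters that are not ">", then ">"
  return text.replace(/<[^>]*>/g, "");
}

console.log(stripAllTags('<p>Read <a href="x.html">this</a> first.</p>')); // Read this first.
```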