Remove text enclosed in a div tag using C# Regex

Developer (https://www.devze.com), 2023-03-05 16:24. Source: web
I have the following string: string chart = "<div id=\"divOne\">Label.</div>;". It is generated dynamically, outside my control, and I would like to remove the text "Label." from the enclosing div element.

I tried the following, but my regex knowledge is still too limited to get it working: System.Text.RegularExpressions.Regex.Replace(chart, @"/(<div[^>]+>)[^<]+(<\/div>)/i", "");


Using LinqPad I got this snippet working. Hopefully it solves your problem correctly.

string chart = "<div id=\"divOne\">Label.</div>;";

var regex = new System.Text.RegularExpressions.Regex(@">.*<");

var result = regex.Replace(chart, "><");

result.Dump(); // prints <div id="divOne"></div>;

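Essentially, it finds all characters between the first closing angle bracket and the last opening angle bracket, and removes them. Because .* is greedy, this is only reliable when the string contains a single element.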
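To see why greediness matters here, the following is a minimal sketch and not part of the original answer. The class name, method names and the two-div input are made up for illustration. It compares the pattern above with its lazy variant >.*?<.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Greedy: ".*" runs from the first '>' to the last '<' in the input.
    public static string StripGreedy(string html) => Regex.Replace(html, ">.*<", "><");

    // Lazy: ".*?" stops at the nearest '<' after each '>'.
    public static string StripLazy(string html) => Regex.Replace(html, ">.*?<", "><");

    static void Main()
    {
        string twoDivs = "<div>a</div><div>b</div>"; // hypothetical input
        Console.WriteLine(StripGreedy(twoDivs)); // <div></div>
        Console.WriteLine(StripLazy(twoDivs));   // <div></div><div></div>
    }
}
```

The lazy form keeps both elements, but it also removes any text sitting between a closing tag and the next opening tag, so neither variant is aware of the markup's structure.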

The approach you take depends on how robust the replacement needs to be. If you're using this at a more general level where you want to target the specific node, you should use a MatchEvaluator. This example produces a similar result:

string pattern = @"<(?<element>\w*) (?<attrs>.*)>(?<contents>.*)</(?<elementClose>.*>)";

var x = System.Text.RegularExpressions
    .Regex.Replace(chart, pattern, m => m.Value.Replace(m.Groups["contents"].Value, ""));

The pattern you use in this case is customizable, but it takes advantage of named group captures. It allows you to isolate portions of the match, and refer to them by name.


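Your regex looks good to me, but drop the '/.../i' delimiters and modifier: .NET patterns are written without delimiters, and case-insensitivity is set with (?i) or RegexOptions.IgnoreCase. Also use '$1$2' as your replacement string, so that the opening and closing tags are kept: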

var re = new System.Text.RegularExpressions.Regex(@"(?i)(<div[^>]+>)[^<]+(<\/div>)");
var result = re.Replace(chart, "$1$2");
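As a complete program, the suggestion above might look like the sketch below. The class and method names are hypothetical. It uses the same pattern, with RegexOptions.IgnoreCase in place of the inline (?i).

```csharp
using System;
using System.Text.RegularExpressions;

static class DivText
{
    // Group 1 captures the opening tag and group 2 the closing tag;
    // the text between them is matched but left out of the replacement.
    private static readonly Regex DivPattern =
        new Regex(@"(<div[^>]+>)[^<]+(</div>)", RegexOptions.IgnoreCase);

    public static string Strip(string html) => DivPattern.Replace(html, "$1$2");

    static void Main()
    {
        string chart = "<div id=\"divOne\">Label.</div>;";
        Console.WriteLine(Strip(chart)); // <div id="divOne"></div>;
    }
}
```

Note that [^>]+ requires at least one character after "div", so a bare <div> with no attributes is not matched by this pattern. Changing it to <div\b[^>]*> would cover that case.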


Try this for your regex:

<div\b[^>]*>(.*?)<\/div>

The following produces the output <div></div>

System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"<div\b[^>]*>(.*?)<\/div>");
Console.WriteLine(regex.Replace("<div>Label 1.</div>","<div></div>"));
Console.ReadLine();


You just need to write a pattern that matches only the text inside the div tag, not the tags themselves, and replace it with an empty string.

Regex.Replace(chart, yourPattern, string.Empty);


I'm a little confused by your question; it sounds like you are parsing through some pre-generated HTML and want to remove all instances of the value of chart that occur within a <div> tag. If that's correct, try this:

"(<div[^>]*>[^<]*)"+chart+"([^<]*</div>)"

Concatenate the first and second groups (for example, with "$1$2" as the replacement string) and you should have your <div> back without chart.
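A minimal sketch of that idea follows, under this answer's reading of the question. The class name, method name and sample HTML are hypothetical. Regex.Escape is an addition here: without it, characters in chart such as "." would be treated as regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

static class ChartRemover
{
    // Removes the literal text `chart` wherever it appears as text inside a <div>.
    public static string RemoveFromDivs(string html, string chart)
    {
        string pattern = "(<div[^>]*>[^<]*)" + Regex.Escape(chart) + "([^<]*</div>)";
        return Regex.Replace(html, pattern, "$1$2");
    }

    static void Main()
    {
        string html = "<p>Label.</p><div id=\"divOne\">Label.</div>"; // hypothetical input
        Console.WriteLine(RemoveFromDivs(html, "Label."));
        // <p>Label.</p><div id="divOne"></div>
    }
}
```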


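Here is a better way than Regex, provided the fragment is well-formed XML (note that the trailing ";" from the original string has to be left out before parsing):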

var element = XElement.Parse("<div id=\"divOne\">Label.</div>");
element.Value = "";
var value = element.ToString();

See also: RegEx match open tags except XHTML self-contained tags
