I want to find all HTML tags in an input string and remove them or replace them with some text. Suppose I have this string:
INPUT=><img align="right" src="http://www.groupon.com/images/site_images/0623/2541/Ten-Restaurant-Group_IL-Giardino-Ristorante2.jpg" /><p>Although Italians originally invented pasta as a fastener to keep Sicily from floating away, <a href="http://www.tenrestaurantgroup.com/">Il Giardino Ristorante</a> in Newport Beach.</p>
OUTPUT=>
string strSrc="http://www.groupon.com/images/site_images/0623/2541/Ten-Restaurant-Group_IL-Giardino-Ristorante2.jpg";
<p>Although Italians originally invented pasta as a fastener to keep Sicily from floating away, http://www.tenrestaurantgroup.com in Newport Beach.</p>
From the above string:
if an <img> tag is found, I want to get the src of the tag;
if an <a> tag is found, I want to get the href from the tag;
all other tags should be left as they are.
How can I achieve this using Regex in C#.NET?
You really, really shouldn't use regex for this. In fact, parsing HTML cannot be done perfectly with regex. Have you considered using an XML parser or HTML DOM library?
You can use HtmlAgilityPack to parse the HTML (valid or not) and get what you want.
I agree with Justin: Regex really isn't the best way to do this, and the HTML Agility Pack is well worth a look if this is something you will need to do a lot.
With that said, the expression below captures each tag's attribute values in a group, so you should be able to pull them into your text while ignoring the rest of the element. In .NET, a repeated group keeps every match in its Captures collection, not just the last one:
</?([^\s>]+)(\s+[^\s=>]+="([^"]*)")*\s*/?>
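For the specific case in the question, a minimal sketch with Regex.Replace might look like the following. It is not a general HTML parser: it assumes attribute values are double-quoted and that <a> elements are not nested. The class name HtmlTagRewriter, the method Rewrite and both patterns are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTagRewriter
{
    // Hypothetical pattern: an <img ...> tag with a double-quoted src attribute.
    private static readonly Regex ImgTag = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*""(?<src>[^""]*)""[^>]*>",
        RegexOptions.IgnoreCase);

    // Hypothetical pattern: a whole <a ...>...</a> element with a double-quoted href.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*""(?<href>[^""]*)""[^>]*>.*?</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Removes <img> tags (reporting the first src found, or null if none),
    // replaces each <a> element with its href, and leaves other tags untouched.
    public static string Rewrite(string html, out string imgSrc)
    {
        string firstSrc = null;

        string withoutImages = ImgTag.Replace(html, m =>
        {
            if (firstSrc == null)
            {
                firstSrc = m.Groups["src"].Value;
            }
            return string.Empty; // drop the <img> tag itself
        });

        imgSrc = firstSrc;

        // Swap every anchor element for the URL it points to.
        return Anchor.Replace(withoutImages, m => m.Groups["href"].Value);
    }
}
```

Calling HtmlTagRewriter.Rewrite(input, out strSrc) on the string from the question should put the image URL in strSrc and return the <p> element with the link replaced by its href exactly as written in the tag (including the trailing slash). Unquoted or single-quoted attributes will not match, which is one more reason to prefer a real parser for anything beyond simple input.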
Hope this helps.