Parsing HTML Table using Regex

Developer https://www.devze.com 2023-01-10 01:21 Source: Internet
I am trying to extract the contents of a table using regex.

I have removed most of the tags from the table, but I am stuck with <br>, <a href>, <img> and <b>. How can I remove them?

For the <b> tag I tried this regex:

 \s*<b[^>]*>\s* 
(?<value>.*?)
 \s* </b>\s*

It worked for some lines, but for others the output still contains the tags, for example:

<b class="saadirheader">Email:</b>
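For reference, here is a minimal sketch of the <b> pattern above in Python. It rests on several assumptions that are not in the question: Python's `(?P<value>...)` named-group syntax stands in for `(?<value>...)`, verbose mode is used because the pattern is spread over several lines with literal spaces, a `\b` is added so that `<br>` is not taken for `<b>`, and the helper name `bold_values` is hypothetical. The question does not show the calling code, but one possible cause of the output above is printing the whole match instead of the `value` group.

```python
import re

# Verbose mode ignores the whitespace and comments inside the pattern.
B_TAG = re.compile(
    r"""
    \s*<b\b[^>]*>\s*    # opening <b ...> tag; \b keeps <br> from matching
    (?P<value>.*?)      # text inside the tag, captured as "value"
    \s*</b>\s*          # closing tag
    """,
    re.VERBOSE | re.IGNORECASE | re.DOTALL,
)

def bold_values(html):
    """Return the text inside every <b>...</b> pair (hypothetical helper)."""
    # Read the named group, not the whole match, so the tags are left out.
    return [m.group("value") for m in B_TAG.finditer(html)]

print(bold_values('<b class="saadirheader">Email:</b>'))
```

Using `m.group(0)` here instead would return the full match, tags included.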

Can anyone help me remove these tags?

<br>, <a href>, <img> and <b>

Full tags:

<img src="Newrecord_files/spacer.gif" alt="" border="0" height="1" width="5">

<a href="mailto:first.last@email.org">

Thanking you,

Naveen HS


Use the following Regex:

</?(?:br|a|img|b)\b[^>]*>

This regex matches all the tags you mentioned above. If there are other tags you forgot to mention, add a "|" followed by the tag you want inside the first set of parentheses.
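A minimal sketch of applying a tag-stripping pattern of this kind in Python, with the list of tag names kept as a parameter so more can be added. The function name `make_tag_stripper` is hypothetical, and the pattern `</?(?:...)\b[^>]*>` is an assumption chosen so that opening and closing tags are both matched and `<body>` is not mistaken for `<b>`. It assumes no attribute value contains a literal `>`.

```python
import re

def make_tag_stripper(*names):
    """Build a function that deletes the given tags but keeps their text."""
    # Join the tag names into one alternation, e.g. "br|a|img|b".
    alternation = "|".join(re.escape(name) for name in names)
    # </? matches opening and closing tags; \b ends the tag name;
    # [^>]* consumes any attributes up to the closing ">".
    pattern = re.compile(rf"</?(?:{alternation})\b[^>]*>", re.IGNORECASE)
    return lambda html: pattern.sub("", html)

strip_tags = make_tag_stripper("br", "a", "img", "b")

row = (
    '<b class="saadirheader">Email:</b>'
    '<img src="Newrecord_files/spacer.gif" alt="" border="0" height="1" width="5">'
    '<a href="mailto:first.last@email.org">first.last@email.org</a><br>'
)
print(strip_tags(row))
```

To strip another tag, pass its name as an extra argument, for example `make_tag_stripper("br", "a", "img", "b", "span")`.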
