perl regex help -- hopefully an easy question

https://www.devze.com 2023-03-15 04:21 Source: web
Ashamed as I am to admit it, I'm terrible with regex... so here I am to ask for your help :)

I have an HTML file that looks sort of like this:

<table>
  <tr>
    <td sadf="a">
      <a href="">asdf</a>
    </td>
  </tr>
</table>

What I'd like to do, with a Perl regex, is remove everything except the td element and its contents, so the output would be this:

<td sadf="a">
  <a href="">asdf</a>
</td>

Please help me out. Thanks.


An HTML parser would be much better at this task, but if you insist on using a regular expression, try this:

<td[\s\S]*?</td>

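[\s\S] matches any character, including newlines, and the lazy quantifier *? takes as few characters as possible, so the match runs from <td up to the first closing </td> tag.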
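As a rough illustration, here is a minimal sketch of how that pattern could be applied in Perl. The sample markup is copied from the question; the variable names ($html, @cells) are made up for this example, and the sketch assumes td elements are not nested.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $html = <<'HTML';
<table>
  <tr>
    <td sadf="a">
      <a href="">asdf</a>
    </td>
  </tr>
</table>
HTML

# In list context with /g, the match returns every captured group:
# each capture runs from "<td" to the first "</td>" that follows it.
my @cells = $html =~ m{(<td[\s\S]*?</td>)}g;

print "$_\n" for @cells;
```

The captured text keeps the original indentation of the inner lines, so it would need a separate clean-up step if the exact layout from the question matters.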


Try using XML::Simple. As others have pointed out, regex is a poor tool for parsing XML.

XML::Simple will turn your HTML into a hash structure, provided the HTML is well-formed XML (every tag closed, every attribute quoted), as it is in your sample. From there, you can easily locate the "td" element and copy the whole thing to another hash reference. Then you can use XML::Simple to turn it back into HTML.

XML::Simple can't guarantee that the output is textually identical to the input (formatting and element order may change), although it will be programmatically the same. However, I rarely have problems with turning HTML into a hashref and back into HTML.


A simpler way of thinking of this is that you want to grab the tag part with a regular expression (rather than remove everything except the tag part).

In this case, the regular expression is simple. For the first line, for example, it would probably look something like this: <td \w+?="\w*"> (you can match \n as well to grab a multiline block). It's hard to give an exact answer without knowing what varies in your input, but if you follow a good regex reference you should be fine.
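A minimal sketch of that idea in Perl follows. It assumes the opening tag always has exactly one attribute with a word-character value, as in the question's sample; the /s modifier is one way to let the pattern cross newlines, and the variable names are invented for this example.

```perl
use strict;
use warnings;

# Sample input copied from the question.
my $html = <<'HTML';
<table>
  <tr>
    <td sadf="a">
      <a href="">asdf</a>
    </td>
  </tr>
</table>
HTML

# The opening-tag pattern on its own: "td", one attribute name,
# and a quoted value made of word characters.
my ($open_tag) = $html =~ m{(<td \w+?="\w*">)};

# With /s, "." also matches "\n", so the lazy ".*?" can span
# several lines and stops at the first "</td>".
my ($block) = $html =~ m{(<td \w+?="\w*">.*?</td>)}s;

print "$open_tag\n";
print "$block\n";
```

A tag with no attributes, several attributes, or a value containing characters outside \w would not match this pattern, which is why it only suits a limited, specific grab.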

In addition, it is probably best to do this without regex at all (using an HTML parser instead) if it's anything more than a limited, specific grab. I'll assume you know that you want to use regex, but there are much better ways of doing this if you've got something more complicated than a very basic search pattern on your hands.
