I'm using Yahoo Pipes to analyze an RSS feed. In each article, I want to parse the HTML with a regex to check whether the value on the line after the string "Total Songs" is greater than 7. In all the articles, the code is laid out as in the example below (with lines ending at the same locations).
Here is an example of what I want to do. In the following code, the value to extract should be 10:
<table BORDER="0" WIDTH="100%"><tr><td><table border="0" width="100%" cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td ALIGN="CENTER" WIDTH="166" VALIGN="TOP"><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988"><img border="0" src="http://a2.mzstatic.com/us/r1000/091/Music/73/0e/f0/mzi.gxsvtfmh.100x100-75.jpg"/></a></td>
<td width="10"><img alt="" width="10" height="1" src="http://r.mzstatic.com/images/spacer.gif"/></td>
<td width="95%"><b><a rel="nofollow" target="_blank" href="http://itunes.apple.com/preorder/bn2-1tw/id449071164?uo=1&v0=9988">Bn2 1Tw</a></b><br>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/artist/kobana/id424122973?uo=1&v0=9988">Kobana & Yane3dots</a><br><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Expected Release Date:</b>
August 17, 2011<br>
</font><font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Total Songs:</b>
10</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Genre:</b>
<a rel="nofollow" target="_blank" href="http://itunes.apple.com/genre/music-electronic/id7?uo=1&v0=9988">Electronic</a></font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Album Price:</b>
$1.99</font><br>
<font size="3" FACE="Helvetica,Arial,Geneva,Swiss,SunSans-Regular"><b>Copyright</b>
Proton LLC</font></td>
</tr>
</table></td></tr>
</table>
With version 1 of the Yahoo Pipes engine, I used
(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))
This used to work, but back then the HTML formatting I got was a little different (the Pipes engine inserted line breaks at different places than it does now). Now that I have moved to the V2 engine (which is a necessity, since V1 is being phased out on August 1st), the regex no longer extracts anything.
I think it has to do with the line break between the </b> and the 10, but even though I tried multiple combinations, I could not find one that works.
Can anybody help me?
Thanks
Try this regex:
Total Songs:\D*((?!0*[0-7](?!\d))\d+)(?!\d)
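The number will be stored in the first capturing group, and the regex matches only when that number is 8 or greater. The \D* part skips every non-digit character after the colon, including the </b> tag and the line break, so it does not matter where the line breaks fall.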
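To check the behaviour outside of Pipes, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Pipes regex engine for these constructs, which may not hold exactly. The helper name `total_songs_over_7` and the shortened `sample` string are made up for illustration. The old pattern from the question is included for comparison: in Python, `.` does not match a line break unless the DOTALL flag is set, which would explain the failure if Pipes V2 behaves the same way.

```python
import re

# Pattern from the answer: capture the number after "Total Songs:"
# only when it is 8 or greater.
TOTAL_SONGS = re.compile(r"Total Songs:\D*((?!0*[0-7](?!\d))\d+)(?!\d)")

# Pattern from the question, kept for comparison only.
OLD_PATTERN = re.compile(r"(?<=Total.Songs\:.....)((8|9)|([1-9][0-9]+))")


def total_songs_over_7(html):
    """Return the song count as an int if it is greater than 7, else None."""
    match = TOTAL_SONGS.search(html)
    return int(match.group(1)) if match else None


# Shortened version of the relevant lines from the example article
# (hypothetical test input; the line break sits between </b> and the number).
sample = 'size="3"><b>Total Songs:</b>\n10</font><br>'

print(total_songs_over_7(sample))                             # 10
print(total_songs_over_7('<b>Total Songs:</b>\n7</font>'))    # None

# The old pattern needs five characters matched by "." after the colon;
# here the fifth one is the line break, so it fails without DOTALL.
print(OLD_PATTERN.search(sample))                             # None
print(re.search(OLD_PATTERN.pattern, sample, re.DOTALL).group(1))  # 10
```

If the Pipes regex module offers an option that lets `.` match line breaks, enabling it might also revive the old pattern, but the `\D*` version does not depend on that setting.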