
Help with a regex to parse through and grab contents of <p> tag in html

Developer https://www.devze.com 2023-02-25 02:58 Source: Internet

I have a site I am trying to grab data from, and the content is laid out like this:

 <p uri="/someRandomURL.p1" class="">TestData TestData TestData</p> 
 <p uri="/someRandomURL.p2" class="">TestData1 TestData1 TestData1</p>

I am using Java to grab the webpage's content, and am trying to parse through it like this:

        Pattern p = Pattern.compile(".*?p1' class=''>(.*?)<.*");
        Matcher m = p.matcher(data);

        // Print out regex groups to console
        System.out.println(m.group(1)) ;

But then an exception is thrown saying there is no match found...

Is my regex right? What else could possibly be going on? I am getting the html ok, but apparently there is no match for my regex...

Thanks


If the text elements contain multiple text lines, then it wouldn't find a match, because the dot (.) doesn't match \n (by default).
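To see the newline behaviour in isolation, here is a minimal sketch. The class name `DotAllDemo`, the helper `fullMatch`, and the shortened `<p>` strings are made up for illustration and are not from the question.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Hypothetical helper: true if the WHOLE input matches the regex under the given flags.
    public static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "<p>(.*?)</p>";
        String oneLine = "<p>TestData</p>";
        String twoLines = "<p>Test\nData</p>"; // same element, but the text spans two lines

        System.out.println(fullMatch(regex, 0, oneLine));               // true
        System.out.println(fullMatch(regex, 0, twoLines));              // false: '.' stops at \n
        System.out.println(fullMatch(regex, Pattern.DOTALL, twoLines)); // true
    }
}
```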

Give this a try:

 Pattern p = Pattern.compile(".*?p1' class=''>(.*?)<.*", Pattern.DOTALL);
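Two other things in the posted code may also be worth checking. First, `m.group(1)` is called without `m.find()` or `m.matches()` first, and in that case `Matcher` throws `IllegalStateException: No match found` no matter what the pattern is. Second, the sample HTML uses double quotes (`class=""`) while the pattern looks for single quotes (`class=''`). Below is a rough sketch that handles both, assuming the real page looks like the two sample lines in the question. The class name `ParagraphExtractor` and the helper `extract` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphExtractor {
    // Hypothetical helper: returns the text of the <p> whose uri ends with uriSuffix,
    // or null when nothing matches. Assumes the attribute layout shown in the question.
    public static String extract(String html, String uriSuffix) {
        Pattern p = Pattern.compile(
                Pattern.quote(uriSuffix) + "\" class=\"\">(.*?)</p>", // double quotes, as in the sample
                Pattern.DOTALL);                                      // let '.' cross line breaks
        Matcher m = p.matcher(html);
        // find() has to run (and succeed) before group() may be called
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String data =
                " <p uri=\"/someRandomURL.p1\" class=\"\">TestData TestData TestData</p>\n"
              + " <p uri=\"/someRandomURL.p2\" class=\"\">TestData1 TestData1 TestData1</p>";

        System.out.println(extract(data, "p1")); // TestData TestData TestData
        System.out.println(extract(data, "p2")); // TestData1 TestData1 TestData1
    }
}
```

Using `find()` instead of wrapping the pattern in `.*? ... .*` keeps the regex shorter, and the `null` return makes the "no match" case explicit instead of surfacing as an exception.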