Trying to get just the URLs from output in Java

开发者 (Developer), https://www.devze.com, 2023-02-11 22:40, source: web
I'm new to Java and have been looking for a solution; perhaps I'm not searching for the right terminology.

My goal: I have a Java class that uses WebDriver to go to a page, perform a search, and output the results. The output is plain text with URLs mixed in. All I care about are the URLs returned. So basically, I want to take output like:

Search result 1 http://www.somesite.com/blahblah this is a site from the search results.

All I want is the URL; I want to discard the rest of the output. I've looked into "parsing in Java" but haven't found what I'm looking for. Any pointers would be much appreciated.


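import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlMatcher {
    public static void main(String[] args) {
        // "https?://" matches both http and https; "\\S+" then takes every following non-whitespace character.
        Pattern pattern = Pattern.compile("https?://\\S+");
        Matcher matcher = pattern.matcher(
            "Search result 1 http://www.somesite.com/blahbl+ah1 this is a site from the search results.\n"
            + "Search result 1 http://www.somesite.com/blahblah2 this is a site from the search results.");
        // Each find() call resumes after the end of the previous match, so every URL is printed once.
        while (matcher.find()) {
            System.out.println(matcher.group());
        }
    }
}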
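One caveat with any pattern that stops only at whitespace: a URL at the end of a sentence, such as `https://example.org/docs.`, keeps the trailing period. A minimal sketch follows, assuming that a trailing `.`, `,`, `;`, `:`, `!`, `?` or `)` is never part of the URLs you care about (this would wrongly trim URLs that genuinely end in `)`). It collects the matches into a list instead of printing them. `UrlExtractor` and `extractUrls` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
    // "https?://" accepts both schemes; "\\S+" consumes everything up to the next whitespace.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    // Assumption: these characters at the very end of a match are sentence punctuation, not URL.
    private static final String TRAILING = ".,;:!?)";

    /** Returns every http/https URL in text, in order, with trailing punctuation removed. */
    public static List<String> extractUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL.matcher(text);
        while (matcher.find()) {
            String url = matcher.group();
            // Strip trailing punctuation characters one at a time.
            while (!url.isEmpty() && TRAILING.indexOf(url.charAt(url.length() - 1)) >= 0) {
                url = url.substring(0, url.length() - 1);
            }
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        String output = "Search result 1 http://www.somesite.com/blahblah this is a site from the search results.\n"
            + "Search result 2 see https://example.org/docs.";
        for (String url : extractUrls(output)) {
            System.out.println(url);
        }
    }
}
```

Returning a `List<String>` rather than printing keeps the extraction reusable, for example to feed the URLs back into WebDriver.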


Check out the regex package: http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html
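If you can use Java 9 or later (an assumption; the page linked above documents 1.4.2), `Matcher.results()` returns a `Stream<MatchResult>`, so the find loop can be written as a single stream pipeline. A minimal sketch; `StreamUrls` and `urls` are hypothetical names.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamUrls {
    // Scheme followed by a run of non-whitespace characters.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    /** Collects every match in text into a list (Matcher.results() requires Java 9+). */
    public static List<String> urls(String text) {
        return URL.matcher(text)
            .results()                   // Stream<MatchResult>, one element per successful find()
            .map(MatchResult::group)     // the full matched text
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        urls("Search result 1 http://www.somesite.com/blahblah1 this is a site.\n"
            + "Search result 2 http://www.somesite.com/blahblah2 another site.")
            .forEach(System.out::println);
    }
}
```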

There are other ways to parse this, of course, but a regular expression is probably the cleanest approach.
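As one example of a non-regex-matching route, the sketch below splits the output on whitespace and keeps tokens that start with `http://` or `https://` and that `java.net.URI` accepts as syntactically valid. Note that `split("\\s+")` still uses a regex for the separator, and that a URI-valid token can still carry sentence punctuation such as a trailing period. `TokenUrls` and `urlTokens` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class TokenUrls {
    /** Returns whitespace-separated tokens that look like http/https URLs. */
    public static List<String> urlTokens(String text) {
        List<String> urls = new ArrayList<>();
        // trim() avoids an empty first token when the text starts with whitespace.
        for (String token : text.trim().split("\\s+")) {
            if (!(token.startsWith("http://") || token.startsWith("https://"))) {
                continue;
            }
            try {
                new URI(token); // throws URISyntaxException for malformed tokens
                urls.add(token);
            } catch (URISyntaxException e) {
                // Not a syntactically valid URI; skip it.
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String output = "Search result 1 http://www.somesite.com/blahblah this is a site from the search results.";
        urlTokens(output).forEach(System.out::println);
    }
}
```

The URI check is what this version adds over the regex: a token such as `https://x.org/a|b` is rejected because `|` is not a legal URI character.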
