
How to extract absolute URL from relative HTML links using Jsoup?

Developer https://www.devze.com 2023-01-24 14:42 Source: Internet

I am using Jsoup to extract the URLs from a web page. The href attributes of those links are relative, like:

<a href="/text">example</a>

Here is my attempt:

Document document = Jsoup.connect(url).get();
Elements results = document.select("div.results");
Elements dls = results.select("dl");
for (Element dl : dls) {
    String url = dl.select("a").attr("href");
}

This works fine, but if I use

String url = dl.select("a").attr("abs:href");

to get the absolute URL like http://example.com/text, it is not working. How can I get the absolute URL?


You need Element#absUrl().

String url = dl.select("a").absUrl("href");

By the way, you can shorten the select:

Document document = Jsoup.connect(url).get();
Elements links = document.select("div.results dl a");
for (Element link : links) {
    String url = link.absUrl("href");
}
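
For intuition about what absUrl("href") computes: it resolves the attribute value against the document's base URI, which for a page loaded with Jsoup.connect(url).get() is normally the URL the page was fetched from. The sketch below reproduces only that resolution step with java.net.URI from the JDK. It is an illustration, not jsoup's implementation, and the class name ResolveDemo and the base URL http://example.com/results are made up for the example.

```java
import java.net.URI;

class ResolveDemo {
    // Resolve a possibly relative href against the page's base URL.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/results"; // hypothetical page URL
        // A root-relative link is resolved against the host of the base URL.
        System.out.println(resolve(base, "/text")); // http://example.com/text
        // A link that is already absolute is returned unchanged.
        System.out.println(resolve(base, "http://other.example/page"));
    }
}
```

This also shows why a base URI matters: without one there is nothing to resolve "/text" against.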


String url = dl.select("a").absUrl("href");

is not correct, because dl.select("a") does not return a single element but a collection (Elements). You need to get the element by index, e.g.:

Elements elems = dl.select("a");
Element a1 = elems.get(0); // 0 is the index of the first element; the last is elems.size() - 1

Now you can do:

a1.absUrl("href");

If you are sure only one item will result from the select above, or that the item you want will be the first, you can:

String url = dl.select("a").get(0).absUrl("href");

This is the same as:

String url = dl.select("a").first().absUrl("href");

It doesn't have to be the first element: you can replace the 0 in String url = dl.select("a").get(0).absUrl("href"); with the index of the element you want, or use a more specific selector that matches only one element.
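
One difference between get(0) and first() matters when a dl contains no link at all: get(0) throws an IndexOutOfBoundsException on an empty selection, while first() returns null. Below is a minimal sketch of a null-safe loop, assuming jsoup is on the classpath. The class name FirstLinkDemo, the sample HTML, and the base URI http://example.com/ are invented for illustration; Jsoup.parse(html, baseUri) stands in for fetching a real page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class FirstLinkDemo {
    // Collect the absolute URL of the first <a> in each <dl> under div.results.
    static List<String> firstLinks(String html, String baseUri) {
        // The base URI is what relative hrefs are resolved against.
        Document document = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        for (Element dl : document.select("div.results dl")) {
            Element a = dl.select("a").first(); // null if this <dl> has no link
            if (a != null) {
                urls.add(a.absUrl("href"));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical markup: one <dl> with a relative link, one without any link.
        String html = "<div class=\"results\">"
                + "<dl><dt><a href=\"/text\">example</a></dt></dl>"
                + "<dl><dt>no link here</dt></dl>"
                + "</div>";
        System.out.println(firstLinks(html, "http://example.com/"));
    }
}
```

The second dl is skipped instead of causing an exception, which is the behaviour you usually want when scraping result lists of uneven shape.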
