I just discovered that I have to set the baseUri on every Element I get from a select. It would be much better if the Document's baseUri were applied to each Element.
Document d = Jsoup.parse(myString);
d.setBaseUri("http://www.google.de");
If I execute
Element e = d.select(....).get(0);
the baseUri of e is empty.
Is this a bug or is it intended?
The base URI is specific to each element, because in HTML the base URI can change partway through the parse (for example, when a <base href="..."> element is encountered). Currently, setting it on the document after the parse does not propagate it down to the child nodes.
Just specify it when you parse the HTML string, e.g.:
Document doc = Jsoup.parse(myString, "http://www.google.de");
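The base URI matters mainly because it is what relative links such as href="/search" get resolved against (jsoup exposes this through Element.absUrl). The sketch below uses only java.net.URI to approximate that resolution step. It is not jsoup's implementation, and the class and method names are hypothetical. It shows that with a base URI the relative link becomes absolute, while without one there is nothing to resolve against.

```java
import java.net.URI;

public class BaseUriDemo {
    // Hypothetical helper: roughly approximates resolving an href against a
    // base URI, similar in spirit to jsoup's Element.absUrl (not its actual code).
    static String resolve(String base, String href) {
        URI ref = URI.create(href);
        if (ref.isAbsolute()) {
            return href; // already absolute, no base URI needed
        }
        if (base == null || base.isEmpty()) {
            return ""; // relative link and no base URI: cannot resolve
        }
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // Base URI supplied at parse time, as in Jsoup.parse(myString, "http://www.google.de")
        System.out.println(resolve("http://www.google.de", "/search?q=jsoup"));
        // No base URI: the relative link cannot be made absolute
        System.out.println(resolve("", "/search?q=jsoup"));
    }
}
```

Running main prints http://www.google.de/search?q=jsoup followed by an empty line.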
If you fetch the HTML from a URL and parse that (with Jsoup.connect), the base URI is automatically set.