DOM problem when trying to extract HREF

Developer https://www.devze.com 2023-02-06 02:30 Source: Internet
I used DOM in order to extract all HREFs from a given HTML source. But there's a problem: if I have a link like this one:

<LINK rel="alternate" TYPE="application/rss+xml" TITLE="ES: Glavni RSS feed" HREF="/rss.xml">

then "href" element will be presented as /rss.xml, although that "/rss.xml" is just anchor text. Clicking on that link from Chrome's page source view, real link is opened.

I would like to take that href's LINK, not anchor text. Please, how can I do it with DOM?


Get a hold of the link element and get its href property. Suppose you were using an id,

<link id="myLink" rel="alternate" href="/rss.xml" />

var link = document.getElementById("myLink");
link.href; // http://www.example.com/rss.xml
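
To make the difference between the attribute value and the resolved property concrete, here is a minimal sketch that collects both for every link-like element. It assumes a browser DOM, where the href property of a, area and link elements is the resolved absolute URL; the helper name collectHrefs is hypothetical and not part of any API.

```javascript
// Hypothetical helper: returns the raw attribute value and the resolved URL
// for each element that carries an href.
// `doc` is expected to be a browser Document, for example `document`.
function collectHrefs(doc) {
  const elements = doc.querySelectorAll("a[href], area[href], link[href]");
  return Array.from(elements, (el) => ({
    raw: el.getAttribute("href"), // the value exactly as written in the markup
    resolved: el.href,            // the URL after the browser resolved it
  }));
}

// In a browser console you could then run:
//   console.log(collectHrefs(document));
```

For the link in the question, raw would be the /rss.xml from the markup, while resolved would be the absolute URL that Chrome opens.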


"href" element will be presented as /rss.xml

Yes, that is the value of the attribute

although that "/rss.xml" is just anchor text.

No. <link> elements don't have anchor text. In the following example 'bar' is anchor text.

<a href="/rss.xml">bar</a>

Clicking on that link from Chrome's page source view, real link is opened.

Browsers know how to resolve relative URIs.

I would like to take that href's LINK, not anchor text. Please, how can I do it with DOM?

Reading the attribute through the DOM gives you its value exactly as written; it does not resolve the URI. You use DOM to get the value of the attribute and then use something else to resolve it as a relative URI.

The article Using and interpreting relative URLs explains how they work, and there are tools that can help resolve them.

You need to know the base URI that the relative URI is relative to (normally the URI of the document containing the link, but things like the base element can change that).

In Perl you might:

#!/usr/bin/perl

use strict;
use warnings;
use URI;

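# The href value as extracted from the document
my $str = '/rss.xml';
# The URI of the page the link was found on
my $base_uri = 'http://example.com/page/with/link/to/rss.xml';
# Resolve the relative reference against the base and print the absolute URI
print URI->new_abs( $str, $base_uri ), "\n";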

Which gives:

http://example.com/rss.xml
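
The same resolution can be sketched in JavaScript with the standard URL constructor, which is available in browsers and in Node.js. The function name resolveHref and the /blog/ base below are illustrative assumptions; the last two lines only imitate what a base element would do to the effective base URI.

```javascript
// Resolve a possibly relative href against a base URI.
function resolveHref(href, baseUri) {
  return new URL(href, baseUri).href;
}

const base = "http://example.com/page/with/link/to/rss.xml";

console.log(resolveHref("/rss.xml", base));
// http://example.com/rss.xml
console.log(resolveHref("feed.xml", base));
// http://example.com/page/with/link/to/feed.xml
console.log(resolveHref("http://other.example/x", base));
// http://other.example/x (already absolute, so the base is ignored)

// Assumed scenario: the page contains <base href="/blog/">.
// The base element's own href is resolved against the document URI first,
// and the result becomes the base for every other relative URI.
const effectiveBase = resolveHref("/blog/", base);
console.log(resolveHref("feed.xml", effectiveBase));
// http://example.com/blog/feed.xml
```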


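You can try using document.location.href to get the current URL and combine it with the value you are getting from your example. Note that simply appending only works for some relative paths; a root-relative value such as /rss.xml has to replace the path of the current URL rather than extend it. Done correctly, that should give you an absolute URL for the link.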
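
A minimal sketch of that idea, contrasting plain string concatenation with proper resolution. The pageUrl constant is a stand-in for document.location.href so the snippet also runs outside a browser, and both function names are hypothetical.

```javascript
// Stand-in for document.location.href (assumed value for illustration).
const pageUrl = "http://example.com/page/with/link/to/rss.xml";

// Naive approach: glue the href onto the current URL.
function appendToPage(href, page) {
  return page + href;
}

// Safer approach: let the URL constructor resolve the href against the page URL.
function resolveAgainstPage(href, page) {
  return new URL(href, page).href;
}

console.log(appendToPage("/rss.xml", pageUrl));
// http://example.com/page/with/link/to/rss.xml/rss.xml (not what the browser opens)
console.log(resolveAgainstPage("/rss.xml", pageUrl));
// http://example.com/rss.xml
```

In a browser, passing document.location.href as the second argument covers the usual case; document.baseURI would additionally take a base element into account.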
