Getting the website title from a link in a string

Developer https://www.devze.com 2023-02-21 23:47 Source: Internet

string: "Here is the badges, https://stackoverflow.com/badges bla bla bla"

If the string contains a link (see above), I want to get the website title of that link.

It should return: Badges - Stack Overflow.

How can I do that?

Thanks.


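#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->timeout(10);    # give up after 10 seconds
$ua->env_proxy;      # pick up proxy settings from the environment

my $response = $ua->get('http://search.cpan.org/');

if ($response->is_success) {
    # LWP parses the HTML head and exposes <title> as a response header
    print $response->title(), "\n";
}
else {
    die $response->status_line;
}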

See LWP::UserAgent. Cheers :-)
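
The snippet above fetches a fixed URL. To apply it to the string from the question, the link has to be pulled out of the text first. Here is a minimal sketch of that, not a definitive implementation: first_url and fetch_title are hypothetical helper names, the regex is a deliberately simple assumption (it is not a full URI grammar and will keep trailing punctuation that touches the link), and https links assume LWP::Protocol::https is installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;

# Return the first http(s) URL in $text, or undef if there is none.
# Simplifying assumption: a URL runs until whitespace, a quote or an angle bracket.
sub first_url {
    my ($text) = @_;
    return $text =~ m{(https?://[^\s<>"']+)} ? $1 : undef;
}

# Fetch $url and return the title LWP found in the HTML head,
# or undef if the request failed or the page has no title.
sub fetch_title {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->env_proxy;
    my $response = $ua->get($url);
    return $response->is_success ? scalar($response->title) : undef;
}

# Demo, run only when executed as a script (needs network access).
unless (caller) {
    my $string = 'Here is the badges, https://stackoverflow.com/badges bla bla bla';
    my $url    = first_url($string) or die "No URL found in string\n";
    my $title  = fetch_title($url);
    print defined $title ? "$title\n" : "Could not get a title for $url\n";
}
```

Keeping the extraction and the fetch in separate subs means the regex can later be swapped for a proper URI finder without touching the HTTP part.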


I use URI::Find::Simple's list_uris method and URI::Title for this.
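
Roughly, that combination could look like the sketch below. It assumes both CPAN modules are installed and export list_uris and title on request, as their synopses show. titles_in_text is a hypothetical helper name, and the undef handling is an assumption about what title returns when it cannot determine a title.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use URI::Find::Simple qw(list_uris);   # CPAN: finds URIs in free text
use URI::Title qw(title);              # CPAN: fetches a URL and returns its title

# Return one title per URI found in $text, in order of appearance.
# An entry is undef where no title could be determined.
sub titles_in_text {
    my ($text) = @_;
    return map { scalar title($_) } list_uris($text);
}

# Demo, run only when executed as a script (needs network access).
unless (caller) {
    my $string = 'Here is the badges, https://stackoverflow.com/badges bla bla bla';
    for my $title (titles_in_text($string)) {
        print defined $title ? "$title\n" : "(no title found)\n";
    }
}
```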


Depending on how the link is given and how you define the title, you need one approach or another.

In the exact scenario that you have presented, get the URL with URI::Find, HTML::LinkExtractor, etc., and then my $title = URI->new($link)->path() gives you the path part of the link (/badges here), which you can treat as the title, alongside the link itself.

But if the website title is the linked text, as in <a href="https://stackoverflow.com/badges"> badged</a>, then the question "How can I extract URL and link text from HTML in Perl?" will give you the answer.

If the title is encoded in the link itself and the link is also its own link text, how do you define the title?

  1. Do you want the last bit of the URI before any query? What happens with the queries set as URL paths?
  2. Do you want the part between the host and the query?
  3. Do you want to parse the link source and retrieve the title tag if any?
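
The three readings above can be sketched as follows, as an illustration rather than a complete solution. It assumes the URI module from CPAN. last_segment, path_only and title_from_html are hypothetical helper names, the ?tab=general query is made up for the demo, and the regex in option 3 is a rough shortcut where a real HTML parser would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use URI;   # CPAN module

# Option 1: the last non-empty path segment, ignoring any query string.
sub last_segment {
    my ($link) = @_;
    my @segments = grep { length } URI->new($link)->path_segments;
    return @segments ? $segments[-1] : undef;
}

# Option 2: everything between the host and the query.
sub path_only {
    my ($link) = @_;
    return URI->new($link)->path;
}

# Option 3: the contents of the <title> element in already-fetched HTML.
sub title_from_html {
    my ($html) = @_;
    return unless defined $html && $html =~ m{<title[^>]*>(.*?)</title>}is;
    my $title = $1;
    $title =~ s/\s+/ /g;     # collapse runs of whitespace
    $title =~ s/^ | $//g;    # trim the ends
    return $title;
}

unless (caller) {
    my $link = 'https://stackoverflow.com/badges?tab=general';   # hypothetical query
    print last_segment($link), "\n";   # badges
    print path_only($link), "\n";      # /badges
    print title_from_html('<html><head><title>Badges - Stack Overflow</title></head></html>'), "\n";
}
```

Only option 3 yields "Badges - Stack Overflow" as asked in the question, and it is also the only one that requires fetching the page.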

As always, going from a trivial first implementation to covering all the corner cases is a daunting task ;-)
