I'm using pQuery to get all TD cells from a table and check whether each one contains a valid URL. pQuery works fine and gives me the content of every TD cell, but my Regexp::Common check, which I took from Stack Overflow, doesn't work.
Here's my code:
use Regexp::Common qw/URI/;
use pQuery;
pQuery($url)
    ->find("table")
    ->find("tr")
    ->find("td")
    ->each(sub {
        my $domain = pQuery($_)->text;
        if ($domain =~ /$RE{URI}{HTTP}/) {
            print "OK\n";
        }
    });
The variable $domain contains the text of a TD cell, and some of the cells hold domains. They all look like "hello-world.com" or "www.test.net". The text "OK" never gets printed. What's wrong here? Is it because the domains are in the format above, with no "http://" and sometimes no "www"? I want a simple check of whether the text is a valid URL.
Your content is not an HTTP URI, so $RE{URI}{HTTP} will not match. It looks like you're trying to match domain names; for that, load the net patterns with use Regexp::Common qw/net/; and match against $RE{net}{domain}{-nospace}.
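A minimal sketch of that suggestion, assuming the cell text has already been extracted into a plain string. The helper name looks_like_domain and the sample strings are made up for illustration; the anchors and the whitespace trimming are additions, not part of the answer above. Note that, as far as I recall, the pattern follows the RFC grammar and so would also accept a single label with no dot in it.

```perl
use strict;
use warnings;
use Regexp::Common qw/net/;

# Hypothetical helper: true if the whole string looks like a domain name.
sub looks_like_domain {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # cell text may carry stray whitespace
    # Anchor the pattern so the entire string has to be a domain,
    # not just some substring of it.
    return $text =~ /^$RE{net}{domain}{-nospace}$/ ? 1 : 0;
}

for my $cell ('hello-world.com', ' www.test.net ', 'not a domain!') {
    printf "%-20s %s\n", "'$cell'", looks_like_domain($cell) ? 'OK' : 'no match';
}
```

Inside the each callback from the question, the same helper could be called on pQuery($_)->text in place of the $RE{URI}{HTTP} test.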
It looks like the regex requires the http:// on the front. You can fudge it in your test:
if( "http://$domain" =~ /$RE{URI}{HTTP}/) {
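One caveat with this trick, sketched below under the assumption that you want the whole cell to be the URL: without anchors the regex only has to match somewhere in the string, so text that merely begins with something host-like may also pass. The helper name matches_as_http_uri and the sample inputs are hypothetical.

```perl
use strict;
use warnings;
use Regexp::Common qw/URI/;

# Hypothetical helper: prepend a scheme, then require a full-string match.
sub matches_as_http_uri {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # drop surrounding whitespace from the cell
    return "http://$text" =~ /^$RE{URI}{HTTP}$/ ? 1 : 0;
}

for my $cell ('hello-world.com', 'www.test.net', 'hello world com') {
    printf "%-20s %s\n", "'$cell'", matches_as_http_uri($cell) ? 'OK' : 'no match';
}
```

Anchoring with ^ and $ is a design choice here: it trades leniency for fewer false positives on free-form cell text.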
"hello-world.com" and the rest don't start with "http://", which is what the regex expects. Check the documentation for Regexp::Common (and its related modules), or have a look at the regex itself:
print $RE{URI}{HTTP};
Paul