Get Regexp::Common URI working

Developer https://www.devze.com 2023-04-03 09:14 Source: Internet
I'm using pQuery to get all TD cells from a table and check whether each one contains a valid URL. pQuery is working fine and gives me the content of all TD cells, but my Regexp::Common check, which I got from Stack Overflow, doesn't work.

Here's my code:

use Regexp::Common qw/URI/;
use pQuery;

pQuery( $url)
->find( "table")
->find( "tr")
->find( "td")
->each( sub {
  my $domain = pQuery( $_)->text;
  if( $domain =~ /$RE{URI}{HTTP}/) {
    print "OK\n";
  }
});

The variable $domain contains the content of a TD cell, and some of the cells have domains in them. They all look like "hello-world.com" or "www.test.net". The text "OK" never gets printed. What's wrong here? Is it because the domains are in the format above, with no "http://" and sometimes no "www"? I want a simple check of whether the text is a valid URL.


Your content is not an HTTP URI, so $RE{URI}{HTTP} will not match. It looks like you're trying to match domain names; for that, load the net patterns with use Regexp::Common qw/net/; and match against $RE{net}{domain}{-nospace}.
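
A minimal sketch of that domain check, assuming each cell should consist of nothing but a domain name (hence the anchors) and that the cell text may carry stray whitespace, which is trimmed first. The helper name is_domain and the sample strings are made up for illustration; they are not part of the original code.

```perl
use strict;
use warnings;
use Regexp::Common qw/net/;

# Hypothetical helper: true if the whole string is a domain name.
sub is_domain {
    my ($text) = @_;
    return 0 unless defined $text;

    # Assumption: cell text may have leading/trailing whitespace.
    $text =~ s/^\s+|\s+$//g;

    # Fetch the pattern outside the regex, then anchor it so the
    # entire string must match, not just a substring.
    my $domain_re = $RE{net}{domain}{-nospace};
    return $text =~ /^$domain_re\z/ ? 1 : 0;
}

# Illustrative inputs only.
for my $cell ('hello-world.com', ' www.test.net ', 'not a domain') {
    printf "%-20s %s\n", "'$cell'", is_domain($cell) ? 'OK' : 'no match';
}
```

Inside the ->each callback, the same helper could be called as is_domain( pQuery( $_)->text ).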


Looks like the RE requires the http:// on the front. You can fudge it in your test:

  if( "http://$domain" =~ /$RE{URI}{HTTP}/) {
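
A slightly fuller sketch of the same workaround, under the assumption that a cell should count only if the whole text forms a URL. The pattern on its own is unanchored, so a cell that merely begins with something host-like would also pass; the anchors below rule that out. The helper name looks_like_http_url is hypothetical.

```perl
use strict;
use warnings;
use Regexp::Common qw/URI/;

# Hypothetical helper: prefix "http://" when it is missing, then
# require the whole string to be an HTTP URI.
sub looks_like_http_url {
    my ($text) = @_;
    return 0 unless defined $text && length $text;

    my $candidate = $text =~ m{^http://} ? $text : "http://$text";

    # Anchor the pattern so trailing junk makes the match fail.
    my $http_re = $RE{URI}{HTTP};
    return $candidate =~ /^$http_re\z/ ? 1 : 0;
}

# Illustrative inputs only.
for my $cell ('hello-world.com', 'http://www.test.net', 'not a domain') {
    printf "%-22s %s\n", "'$cell'", looks_like_http_url($cell) ? 'OK' : 'no match';
}
```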


"hello-world.com" and the rest don't start with "http://", which is what the regex expects.
Check the documentation for Regexp::Common (and its related modules), or have a look at the regex itself:

print $RE{URI}{HTTP};

Paul
