Matching Rapidshare links with regex

开发者 (Developer) https://www.devze.com 2022-12-15 08:25 Source: web

I want to match a sequence of Rapidshare links on a webpage. The links look like:

http://rapidshare.com/files/326251387/file_name.rar

I wrote this code:

if(preg_match_all('/http:\/\/\rapidshare\.com\/files\/.*?\/.*?/', $links[1], $links))
{
    echo 'Found links.';
} else {
    die('Cannot find links :(');
}

And it returns "Cannot find links :(" every time. Note that I want the entire match returned, so that every Rapidshare link found on the page comes back in an array.

$links[1] does contain a valid string, too.

Any help will be appreciated, cheers.


It looks like you have a stray backslash before rapidshare:

if(preg_match_all('/http:\/\/\rapidshare\.com\/files\/.*?\/.*?/', $links[1], $links))

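It should be:

if(preg_match_all('/http:\/\/rapidshare\.com\/files\/.*?\/[^\s"\']+/', $links[1], $links))

(In a regex, \r matches a carriage return character, so the original pattern was looking for a carriage return followed by "apidshare". The ending is also changed from .*? to [^\s"\']+ so that the file name is part of the match; the single quote is escaped because the pattern sits inside a single-quoted PHP string.)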
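
To see the effect of the stray backslash in isolation, here is a minimal sketch. The variable names and the second test string are made up for illustration; only the two patterns come from the question.

```php
<?php
// Sample URL from the question.
$url = 'http://rapidshare.com/files/326251387/file_name.rar';

// Original pattern: PCRE reads "\r" as a carriage return, followed by "apidshare".
$broken = '/http:\/\/\rapidshare\.com\/files\/.*?\/.*?/';
// Same pattern with the stray backslash removed.
$fixed  = '/http:\/\/rapidshare\.com\/files\/.*?\/.*?/';

// The broken pattern does not match a normal URL...
$brokenOnUrl = preg_match($broken, $url);
// ...but it does match a string that really contains a carriage return there.
$brokenOnCr  = preg_match($broken, "http://\rapidshare.com/files/1/x");
// The fixed pattern matches, but the trailing lazy ".*?" matches nothing,
// so the file name is missing from the result.
$fixedOnUrl  = preg_match($fixed, $url, $m);

var_dump($brokenOnUrl, $brokenOnCr, $fixedOnUrl, $m[0]);
```

The last value dumped is the URL cut off after the file ID, which is why the ending of the pattern needs changing as well as the backslash.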


There are also a lot of HTTPS links to rapidshare.com; you can find them by searching Google for "https://rapidshare.com/files/".

I recommend changing your regex to start with https?: so that it matches both schemes.


To avoid the madness of escaping every slash in the URL, I would use another delimiter for the regex, such as #. That also makes it easier to see that you have one \ too many before rapidshare.


Then you could have something that looks like this:
(Based on yours; I only changed the end a bit, because it wasn't returning the file's name. You might want to adapt it further to exclude characters other than just whitespace, such as ".)

$str = 'blah http://rapidshare.com/files/326251387/file_name.rar blah';
if(preg_match_all('#http://rapidshare\.com/files/(.*?)/([^\s]+)#', $str, $m)) {
    var_dump($m);
}


Which, here, will get you :

array
  0 => 
    array
      0 => string 'http://rapidshare.com/files/326251387/file_name.rar' (length=51)
  1 => 
    array
      0 => string '326251387' (length=9)
  2 => 
    array
      0 => string 'file_name.rar' (length=13)
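
Putting the suggestions in this thread together (a # delimiter, an optional "s" in the scheme, and a file-name class that also stops at quotes), one possible version is sketched below. The page text is made up for illustration, and matching the file ID with \d+ and excluding < and > are assumptions based on the example URL, not something the thread confirms.

```php
<?php
// Hypothetical page text, made up for illustration.
$page = 'see http://rapidshare.com/files/326251387/file_name.rar and '
      . '<a href="https://rapidshare.com/files/123/other.zip">mirror</a>';

// https?    : the "s" is optional, so both schemes match
// (\d+)     : file ID (assumed to be numeric, as in the example URL)
// ([^...]+) : file name, stopping at whitespace, quotes or angle brackets
$pattern = '#https?://rapidshare\.com/files/(\d+)/([^\s"\'<>]+)#';

preg_match_all($pattern, $page, $m);

print_r($m[0]); // full URLs
print_r($m[1]); // file IDs
print_r($m[2]); // file names
```

With this input, $m[0] holds both complete URLs, and the second one ends at other.zip rather than running on into the closing quote of the href attribute.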