How to know if two URLs lead to the same page

Developer https://www.devze.com 2022-12-19 06:06 Source: Internet
I want to write a small script to know if two URLs lead to the same page. For example, http://google.com and http://google.com/# lead to the same page. Also, sometimes http://URL1.com and http://URL2.com lead to the same page although they are not the same URL.

Is there an easy way to do that?

If you need more information, please tell me and I will edit the post.

NOTE: this is NOT a homework question, so please be as helpful as you can.

Thank you all!


This is a really dirty way, but I suppose that's what you want:

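$page1 = file_get_contents('http://URL1.com');
$page2 = file_get_contents('http://URL2.com');
// file_get_contents() returns false on failure, so rule that out first
if($page1 !== false && $page1 === $page2) {
    // Leading to the same page!
}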

Note that it will NOT work if the page changes even slightly between the two requests, e.g. a displayed time (one request made at 13:45:59 and the other at 13:46:00), cookies, or anything else dynamic.
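
One way to soften that limitation is to compare the two bodies approximately instead of byte for byte. The following is only a rough sketch of that idea, not part of the answer above: the helper name pagesLookAlike and the 95% threshold are assumptions made for illustration. It relies on PHP's built-in similar_text(), which can be slow on large pages.

```php
<?php
// Hypothetical helper: treat two page bodies as "the same" when they are
// at least $threshold percent similar. The 95.0 default is an arbitrary
// assumption; tune it for the pages you compare.
function pagesLookAlike(string $body1, string $body2, float $threshold = 95.0): bool
{
    // Collapse runs of whitespace so formatting differences are ignored.
    $text1 = trim((string) preg_replace('/\s+/', ' ', $body1));
    $text2 = trim((string) preg_replace('/\s+/', ' ', $body2));

    // Identical text needs no further work.
    if ($text1 === $text2) {
        return true;
    }

    // similar_text() writes the similarity percentage into $percent.
    similar_text($text1, $text2, $percent);

    return $percent >= $threshold;
}
```

The bodies would come from file_get_contents() as above, after checking that neither call returned false. A threshold that is too low will report unrelated pages built from the same template as identical, so treat the result as a hint rather than proof.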


So...

This can be very tricky, as there is no 'real' way to detect it. You could check for a Location header to see if there is a redirect, but that is not foolproof, as some sites do an internal redirect that never shows up in the headers. (Meaning, stackoverflow.com looks the same as stackoverflow2.com.)
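
For the redirect case, a minimal sketch might let cURL follow the Location headers and then compare the final URLs. The function names resolveFinalUrl and sameAfterRedirects, the redirect limit, and the 10-second timeout are assumptions for illustration. As noted, this cannot catch internal redirects.

```php
<?php
// Hypothetical helper: follow HTTP redirects and return the URL the
// request finally ended up at, or null if the request failed.
function resolveFinalUrl(string $url, int $maxRedirects = 10): ?string
{
    $c = curl_init($url);
    curl_setopt($c, CURLOPT_NOBODY, true);             // headers only, no body
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);     // do not print anything
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);     // follow Location headers
    curl_setopt($c, CURLOPT_MAXREDIRS, $maxRedirects); // stop on redirect loops
    curl_setopt($c, CURLOPT_TIMEOUT, 10);              // assumed timeout in seconds

    if (curl_exec($c) === false) {
        return null;
    }

    return curl_getinfo($c, CURLINFO_EFFECTIVE_URL);
}

// Hypothetical helper: true when both URLs redirect to the same final URL.
function sameAfterRedirects(string $url1, string $url2): bool
{
    $final1 = resolveFinalUrl($url1);
    $final2 = resolveFinalUrl($url2);

    return $final1 !== null && $final1 === $final2;
}
```

Calling sameAfterRedirects('http://URL1.com', 'http://URL2.com') returns true only when both requests succeed and end at exactly the same URL. Some servers answer bodyless requests differently from normal GET requests, so this is a first check and not a guarantee.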

The only real way I can think of is to check the contents of the page, for example:

// Fetch the first page
$c = curl_init();
curl_setopt( $c, CURLOPT_RETURNTRANSFER, true ); // return the body instead of printing it
curl_setopt( $c, CURLOPT_URL, 'http://URL1.com' );
$content1 = curl_exec( $c );
curl_close($c);

// Fetch the second page
$c = curl_init();
curl_setopt( $c, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $c, CURLOPT_URL, 'http://URL2.com' );
$content2 = curl_exec( $c );
curl_close($c);

// curl_exec() returns false on failure, so rule that out before comparing
if($content1 !== false && $content1 === $content2)
{
    // same content
}

If you wanted to, you could shorten that to only compare Content-Length or some other header, but headers alone can't tell you for sure that the pages are the same.


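You can use parse_url() to handle the trivial cases, such as URLs that differ only in a trailing # or in letter case of the host. For detecting redirects you'll have to use one of PHP's HTTP facilities (cURL or get_headers(), for example) to fetch the headers and look for the Location header.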
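
As a rough illustration of the parse_url() approach, the sketch below reduces each URL to a canonical string and compares the results. The function name normalizeUrl and the particular normalization rules (lowercase scheme and host, drop default ports, treat an empty path as "/", ignore the fragment) are assumptions chosen for this example, not a complete normalizer.

```php
<?php
// Hypothetical helper: reduce an absolute URL to a canonical string so
// that trivially different spellings compare equal. Returns null for
// malformed or relative URLs.
function normalizeUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }

    // Scheme and host are case-insensitive.
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $host   = strtolower($parts['host']);

    // Drop the port when it is the default one for the scheme.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = $parts['port'] ?? null;
    if ($port !== null && ($defaultPorts[$scheme] ?? null) === $port) {
        $port = null;
    }

    // An empty path and "/" refer to the same resource.
    $path  = ($parts['path'] ?? '') === '' ? '/' : $parts['path'];
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    // The fragment (#...) is never sent to the server, so it is ignored.
    return $scheme . '://' . $host . ($port !== null ? ':' . $port : '') . $path . $query;
}

// The example from the question: prints bool(true)
var_dump(normalizeUrl('http://google.com') === normalizeUrl('http://google.com/#'));
```

Two URLs with the same normalized form almost certainly lead to the same page, but different normalized forms prove nothing, which is where the redirect and content checks come in.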
