How to download a website in PHP - I have a small problem. (or major?)

Developer https://www.devze.com 2023-03-31 18:57 Source: Internet
I am learning PHP, and I'm trying to make an application that has a relationship with an external website.

I need to download it.

So I got this code:

$str = file_get_contents($url);

This should return the HTML contents of a website.

It works fine for most websites, but for one in particular, http://www.fxp.co.il, it returns garbage.

What is the problem? What can I do to fix it?

Thank you!

Well, you should actually inspect the response headers, as they tell you about the encoding of the data returned by file_get_contents().

For example, if it's gzip encoded, you need to uncompress it.

Normally you won't notice this, because file_get_contents() sends its request in a way that tells the server the client does not support compression.

However, some servers just do not care and send you compressed responses anyway:

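<?php

$url = 'http://www.fxp.co.il/';

// Fetch the raw body; PHP fills $http_response_header with the response headers.
$buffer = file_get_contents($url);

// Print the URL and the headers to see which Content-Encoding the server used.
echo $url, '<hr>', '<pre>', implode("\n", $http_response_header), '</pre>';

// This server sends gzip regardless of the request, so decompress the body.
$bare = gzdecode($buffer);

// Show the first 256 bytes of the decoded HTML, escaped for display.
echo '<hr>', htmlspecialchars(substr($bare, 0, 256));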

Output:

http://www.fxp.co.il/
------------------------------------------------------------
HTTP/1.1 200 OK
Server: nginx/0.7.67
Date: Mon, 29 Aug 2011 19:19:55 GMT
Content-Type: text/html; charset=UTF-8
Connection: close
Set-Cookie: bb_lastvisit=1314607056; expires=Tue, 28-Aug-2012 19:12:44 GMT; path=/
Set-Cookie: bb_lastactivity=0; expires=Tue, 28-Aug-2012 19:12:44 GMT; path=/
X-Accel-Expires: 600
Cache-control: must-revalidate, post-check=0, pre-check=0
Pragma: cache
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 14170
Expires: Tue, 24 Jan 1984 08:00:00 GMT
X-Header: Boost Citrus 1.9
Cache-Control: must-revalidate, post-check=0, pre-check=0
------------------------------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" dir="rtl" lang="he"> <head> <meta http-equiv="Content-Type" content="text/html; charset
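
The script above always calls gzdecode(), which only suits servers that really do send gzip. A more general approach might check the headers first and decompress only when Content-Encoding says so. The following is a minimal sketch of that idea, not part of the original answer: decode_http_body() is a hypothetical helper name, it assumes the zlib extension is loaded, and it handles only gzip (not deflate or br). The demonstration compresses a string locally so it runs without a network request.

```php
<?php
// Hypothetical helper (an illustration, not a PHP built-in): returns the body
// unchanged unless one of the response header lines declares gzip encoding.
function decode_http_body(string $body, array $headers): string
{
    foreach ($headers as $line) {
        // Header names are case-insensitive, hence the "i" flag.
        if (preg_match('/^Content-Encoding:\s*(?:x-)?gzip\s*$/i', $line)) {
            // gzdecode() returns false on corrupt input; "@" hides its warning
            // and we fall back to the raw body in that case.
            $decoded = @gzdecode($body);
            return $decoded === false ? $body : $decoded;
        }
    }
    return $body;
}

// Offline demonstration: compress a string locally instead of fetching a page.
$html    = '<html><body>hello</body></html>';
$headers = ['HTTP/1.1 200 OK', 'Content-Encoding: gzip'];

$result = decode_http_body(gzencode($html), $headers);
var_dump($result === $html); // bool(true)
```

With a real request, you would first run $buffer = file_get_contents($url); and then pass $buffer together with $http_response_header to the helper.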

Take care!
