
Is it safe to use this function to read other websites?

I would like to grab other websites' information, so I have a function like this:

$doc = new DOMDocument();
@$doc->loadHTMLFile($aUrl);  // fetches and parses the remote page in one step
$xpath = new DOMXPath($doc);

It fetches the URL supplied by the client, but I am worried that some sites may crash my program: for example, they may time out, send no response, keep redirecting, or return such a big page that my program runs out of memory. How can I avoid this?


I would use cURL to fetch the contents of the website, since that allows for far more configuration, and you can set a couple of the options to address your concerns. This should do what you need:

<?php

$aUrl = "http://www.example.com";
$ch = curl_init($aUrl);

curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
curl_setopt($ch, CURLOPT_MAXREDIRS, 2);         // follow at most two redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // honour Location: headers
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // return the body as a string

$html = curl_exec($ch);
curl_close($ch);

$maxlength = 51200; // 50 kilobytes
if ($html && strlen($html) < $maxlength) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Do your processing here.
} else {
    die('There was an error fetching the HTML.');
}

?>

That takes care of timeouts and limits redirection to two. It also does not attempt to process pages larger than 50 kilobytes (you may want to adjust that based on how large you expect the page to be).
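Note that the strlen() check above only runs once the whole response is already in memory. If you also want to abort an oversized download while it is still in transit, here is a rough sketch of one way to do it (not part of the original code; it assumes PHP 5.3+ for the closure) using CURLOPT_WRITEFUNCTION, where returning anything other than the number of bytes received makes cURL abort the transfer:

<?php

$aUrl = "http://www.example.com";
$maxlength = 51200; // 50 kilobytes
$html = '';

$ch = curl_init($aUrl);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) use (&$html, $maxlength) {
    $html .= $data;
    if (strlen($html) > $maxlength) {
        return -1; // not the byte count we received, so cURL aborts
    }
    return strlen($data); // keep going
});

$ok = curl_exec($ch); // FALSE if the callback aborted the transfer
curl_close($ch);

?>

This way the script never buffers much more than $maxlength bytes, no matter how large the page is.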


Simply put, you cannot, at least not easily.

You can use a "co-process": your script runs a separate PHP process using exec() and then checks whether that process terminated successfully. This way your script can react even when the "co-process" was killed because of the memory limit or something like that.
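For example, here is a minimal sketch of that idea; "fetch.php" is just a placeholder name for whatever script does the risky fetching and parsing:

<?php

$aUrl = "http://www.example.com";
$output = array();
$exitCode = 1;

// Run the fetch in a separate PHP process, so a fatal error, a timeout
// or a memory-limit kill does not take this script down with it.
exec('php fetch.php ' . escapeshellarg($aUrl), $output, $exitCode);

if ($exitCode === 0) {
    echo implode("\n", $output); // the co-process finished cleanly
} else {
    echo 'Fetch failed with exit code ' . $exitCode; // react to the crash
}

?>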

You should also make sure that the fopen() wrappers are disabled, or at least that file:// is not allowed. Otherwise someone could pass file:///usr/local/secrets.xml as $aUrl.
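One simple safeguard is to whitelist the URL scheme before fetching anything; a sketch (isFetchableUrl() is just an illustrative helper, not a built-in):

<?php

// Reject anything that is not plain http:// or https://, so a value
// like file:///usr/local/secrets.xml is never opened.
function isFetchableUrl($aUrl) {
    $scheme = parse_url($aUrl, PHP_URL_SCHEME);
    return in_array(strtolower((string) $scheme), array('http', 'https'), true);
}

var_dump(isFetchableUrl("http://www.example.com"));        // bool(true)
var_dump(isFetchableUrl("file:///usr/local/secrets.xml")); // bool(false)

?>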

Probably the best way is to fetch the remote document using cURL and store it locally: cURL supports a maximum number of redirects, a connect timeout, and a receive timeout. Then process the file from your temporary storage in a "co-process", using the functions above.
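A rough sketch of that fetch-and-store step (the option values are only examples, tune them to your needs):

<?php

$aUrl = "http://www.example.com";

// Download straight into a temporary file instead of into memory.
$tmpPath = tempnam(sys_get_temp_dir(), 'fetch_');
$fp = fopen($tmpPath, 'w');

$ch = curl_init($aUrl);
curl_setopt($ch, CURLOPT_FILE, $fp);            // write the body to the file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 2);         // maximum number of redirects
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // connect timeout, in seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // total receive timeout, in seconds

$ok = curl_exec($ch);
curl_close($ch);
fclose($fp);

if ($ok) {
    echo "Stored at $tmpPath"; // hand this path to the "co-process" for parsing
}

?>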
