I am trying to grab the source code of pages through proxies. It works as long as I only loop through the URLs and grab each page's source. However, once I also loop through the proxies, it slows down and times out. I do not get an error message; the script just keeps running. Is this a problem with the proxies or with the code? I am new to PHP, so any help is much appreciated.
You can see the problem at pelican-cement.com/bbb.html. This project is an attempt to scrape data from certain pages, and we are about halfway through. Here is the code:
<html>
<body>
<?
$urls=explode("\n", $_POST['url']);
$proxies=explode("\n", $_POST['proxy']);
for ( $counter = 0; $counter <= 6; $counter++) {
for ( $count = 0; $count <= 6; $counter++) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$urls[$counter]);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);
curl_setopt($ch, CURLOPT_PROXY,$proxies[$count]);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
curl_setopt ($ch, CURLOPT_HEADER, 1);
curl_exec ($ch);
$curl_scraped_page = curl_exec($ch);
$FileName = time();
$FileHandle = fopen($FileName, 'w') or die("can't open file");
fwrite($FileHandle, $curl_scraped_page);
$hostname="***";
$username="****";
$password="****";
$dbname="****";
$usertable="****";
$con=mysql_connect($hostname,$username, $password) or die ("<html><script language='JavaScript'>alert('Unable to connect to database! Please try again later.'),history.go(-1)</script></html>");
mysql_select_db($dbname ,$con);
$sql="INSERT INTO **** (time, ad1)
VALUES
('$FileName','$domains')";
if (!mysql_query($sql,$con))
{
die('Error: ' . mysql_error());
}
echo "1 record added";
mysql_close($con);
fclose($FileHandle);
curl_close($ch);
echo $FileName;
echo "<br/>";
sleep(1);
}
}
?>
</body>
</html>
If you're running this through a browser, you're most likely hitting PHP's max_execution_time limit: http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
You could run the script in CLI mode instead, where that limit defaults to 0 (no limit), to avoid hitting the timeout.
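Separately from the timeout, the inner loop in the question increments `$counter` instead of `$count`, so `$count <= 6` stays true and the loop never finishes on its own; `curl_exec` is also called twice per iteration, and no cURL timeouts are set, so a dead proxy can stall a request for a long time. Below is a rough sketch of how the loops might be restructured. The helper names `parse_lines` and `fetch_via_proxy`, the timeout values, and the `uniqid`-based file name are assumptions made for illustration, not part of the original script, and the database insert is left out for brevity.

```php
<?php
// Sketch only: parse_lines() and fetch_via_proxy() are hypothetical helpers,
// and the timeout values are arbitrary assumptions.

/** Split a textarea value into trimmed, non-empty lines. */
function parse_lines(string $raw): array
{
    $lines = preg_split('/\r\n|\r|\n/', $raw);
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

/** Fetch one URL through one proxy; returns headers + body, or null on failure. */
function fetch_via_proxy(string $url, string $proxy, int $timeout = 15): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_CONNECTTIMEOUT => 5,        // give up quickly on dead proxies
        CURLOPT_TIMEOUT        => $timeout, // cap the whole transfer
    ]);

    $page = curl_exec($ch); // run the request once, not twice
    if ($page === false) {
        error_log("cURL failed for $url via $proxy: " . curl_error($ch));
        return null;
    }
    return $page;
}

$urls    = parse_lines($_POST['url'] ?? '');
$proxies = parse_lines($_POST['proxy'] ?? '');

// Lift the execution time limit for this request (may be disabled on some hosts).
set_time_limit(0);

// foreach visits every URL/proxy pair exactly once, so there is no counter to mix up.
foreach ($urls as $url) {
    foreach ($proxies as $proxy) {
        $page = fetch_via_proxy($url, $proxy);
        if ($page === null) {
            continue; // skip failed proxies instead of waiting on them
        }
        $fileName = uniqid('page_', true) . '.html';
        file_put_contents($fileName, $page);
        echo htmlspecialchars($fileName), "<br/>\n";
    }
}
```

Using `foreach` rather than hard-coded bounds of 6 also avoids reading past the end of either list when fewer than seven URLs or proxies are submitted. If the database insert is added back, note that the `mysql_*` functions were removed in PHP 7, so `mysqli` or PDO with prepared statements would be needed on current PHP versions.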