How to use nested loops to grab HTML via proxies using cURL?

Developer https://www.devze.com 2023-02-17 07:08 Source: Internet
I am trying to grab the source code of pages through proxies. It works as long as I only loop through the URLs and grab their source. However, once I also loop through the proxies, it slows down and times out. I do not get an error message; the script just keeps running. Is this a problem with the proxies or with the code? I am new to PHP, so any help is much appreciated.

You can see the problem at pelican-cement.com/bbb.html. This project is an attempt to scrape data from certain pages, and we are about halfway through. Here is the code:

  <html>
<body>

<?
$urls=explode("\n", $_POST['url']);
$proxies=explode("\n", $_POST['proxy']);

for ( $counter = 0; $counter <= 6; $counter++) {
for ( $count = 0; $count <= 6; $counter++) {

 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL,$urls[$counter]);
 curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);
 curl_setopt($ch, CURLOPT_PROXY,$proxies[$count]);
 curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
 curl_setopt ($ch, CURLOPT_HEADER, 1); 
curl_exec ($ch); 
$curl_scraped_page = curl_exec($ch); 

$FileName = time();
$FileHandle = fopen($FileName, 'w') or die("can't open file");
fwrite($FileHandle, $curl_scraped_page);

$hostname="***";
$username="****";
$password="****";
$dbname="****";
$usertable="****";

$con=mysql_connect($hostname,$username, $password) or die ("<html><script language='JavaScript'>alert('Unable to connect to database! Please try again later.'),history.go(-1)</script></html>");
mysql_select_db($dbname ,$con);

$sql="INSERT INTO **** (time, ad1)
VALUES
('$FileName','$domains')";


if (!mysql_query($sql,$con))
  {
  die('Error: ' . mysql_error());
  }
echo "1 record added";

mysql_close($con);

fclose($FileHandle);

curl_close($ch);

echo $FileName; 

echo "<br/>";

sleep(1);

}
}

?>

</body>
</html>


If you're running this through a browser, you're probably hitting PHP's execution time limit (max_execution_time): http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
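
As a rough illustration of that limit, the sketch below reads the current `max_execution_time` and asks PHP to lift it for the running script with `set_time_limit(0)`. It only touches PHP's own limit; whether the web server or a proxy in front of it enforces a separate timeout is something to check for your setup.

```php
<?php
// Read the current limit in seconds. 0 means "no limit"; the CLI default is 0.
$limit = (int) ini_get('max_execution_time');
echo "max_execution_time: {$limit}\n";

// Ask PHP to remove the limit for this script run.
// set_time_limit() returns true on success and false if the limit could not be changed.
$lifted = set_time_limit(0);
echo $lifted ? "limit lifted\n" : "limit could not be changed\n";
```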

You could run it in CLI mode, where max_execution_time defaults to 0 (no limit), to avoid hitting the timeout.
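
Separately from the timeout, note that the inner loop in the question increments `$counter` rather than `$count`, so `$count` never changes and the loop cannot terminate on its own. `curl_exec` is also called twice per iteration, so each page is requested twice. A minimal sketch of the nested loop with those two points addressed might look like the following. The helper names `parse_lines`, `fetch_via_proxy`, and `scrape_all`, the timeout values, and the `urls.txt` / `proxies.txt` file names are assumptions for illustration, not part of the original script, and the file and database writes are left out.

```php
<?php
// Hypothetical helper: split pasted text into trimmed, non-empty lines.
// Also handles "\r\n" endings, which explode("\n", ...) would leave as a trailing "\r".
function parse_lines(string $raw): array
{
    $lines = preg_split('/\r\n|\r|\n/', $raw);
    $lines = array_map('trim', $lines);
    return array_values(array_filter($lines, function (string $line): bool {
        return $line !== '';
    }));
}

// Hypothetical helper: fetch one URL through one proxy.
// Returns the response (headers + body) or null if the request failed or timed out.
function fetch_via_proxy(string $url, string $proxy, int $timeout = 15): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_PROXY          => $proxy,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => true,
        CURLOPT_CONNECTTIMEOUT => 5,        // assumed value: give up on dead proxies quickly
        CURLOPT_TIMEOUT        => $timeout, // assumed value: cap the whole transfer
    ]);
    $response = curl_exec($ch); // execute the request exactly once
    return $response === false ? null : $response;
}

// Visit every URL through every proxy. foreach removes the hand-written
// counters, so the inner loop always advances and always ends.
// $fetch is any callable taking ($url, $proxy), which keeps the loop testable.
function scrape_all(array $urls, array $proxies, callable $fetch): array
{
    $results = [];
    foreach ($urls as $url) {
        foreach ($proxies as $proxy) {
            $results[] = [
                'url'   => $url,
                'proxy' => $proxy,
                'page'  => $fetch($url, $proxy),
            ];
        }
    }
    return $results;
}

// Assumed CLI usage: php scrape.php urls.txt proxies.txt
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    $urls    = parse_lines((string) file_get_contents($argv[1]));
    $proxies = parse_lines((string) file_get_contents($argv[2]));
    foreach (scrape_all($urls, $proxies, 'fetch_via_proxy') as $row) {
        $status = $row['page'] === null ? 'failed' : strlen($row['page']) . ' bytes';
        printf("%s via %s: %s\n", $row['url'], $row['proxy'], $status);
    }
}
```

With the explicit connect and transfer timeouts, a slow or dead proxy costs a bounded amount of time per request instead of stalling the whole run. The total time still grows with the number of URLs multiplied by the number of proxies.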
