
cURL - store everything from a webpage

Developer https://www.devze.com 2023-02-09 11:29 Source: the web

I'm saving cookies in a text file using this code:

$cookie_file_path = "".dirname(__FILE__)."/cookie.txt"; // Please set your Cookie File path
$fp = fopen($cookie_file_path,'wb'); 
fclose($fp);
$ch = curl_init();
// other curl functions here // 
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
$loginpage_html = curl_exec ($ch);
curl_close ($ch);

It saves the cookies as cookie.txt in the same folder as the script and reuses them on later connections.

I'd like to save the images (plus CSS, scripts, and everything else) to the same folder. Any advice?


I suggest using PHP's DOM extension: http://php.net/manual/en/book.dom.php. It's quite similar to the DOM in JavaScript. You just loop through the usual tags such as <img>, <script>, and <link>, read their src attributes (href for <link>) to get the links to the referenced resources, and retrieve those with the same cURL setup or with file_get_contents.

Check out the DOM manual; its user comments are very useful.
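
As a rough sketch of the loop described above, something like the following could collect the resource URLs with DOMDocument. The function name extract_resource_urls and the choice of tags (img, script, and stylesheet link) are my own assumptions for illustration, not part of the DOM extension.

```php
<?php
// Hypothetical helper (the name is an assumption): collect the URLs of
// images, scripts, and stylesheets referenced by an HTML document.
function extract_resource_urls(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that holds the resource URL.
    $targets = ['img' => 'src', 'script' => 'src', 'link' => 'href'];
    $urls = [];
    foreach ($targets as $tag => $attribute) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // Keep only stylesheet <link> tags, not icons or feeds.
            if ($tag === 'link' && strtolower($node->getAttribute('rel')) !== 'stylesheet') {
                continue;
            }
            // Inline <script> tags have no src and are skipped here.
            $url = trim($node->getAttribute($attribute));
            if ($url !== '') {
                $urls[] = $url;
            }
        }
    }
    // Drop duplicates and reindex from zero.
    return array_values(array_unique($urls));
}
```

With the question's code, this would be called as $links = extract_resource_urls($loginpage_html), assuming CURLOPT_RETURNTRANSFER is set so that curl_exec returns the HTML as a string.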


Try wget with the recursive switch (-r). Its -p (--page-requisites) option also downloads the images, CSS, and scripts a page needs.


First, I see you create the file with fopen and fclose; you can just use the touch function for that.
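
One practical difference worth noting: fopen with mode 'wb' empties an existing file, so cookies saved by an earlier run are wiped, while touch leaves the contents alone. A minimal sketch, where ensure_cookie_file is a made-up helper name:

```php
<?php
// Hypothetical helper (the name is an assumption): make sure the cookie
// file exists without erasing cookies saved by a previous run.
function ensure_cookie_file(string $path): void
{
    // touch() creates the file when it is missing and otherwise only
    // updates its timestamps; fopen($path, 'wb') would truncate it.
    if (!touch($path)) {
        throw new RuntimeException("Cannot create cookie file: $path");
    }
}
```

It would be called once, before curl_init, as ensure_cookie_file($cookie_file_path).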

cURL only fetches the contents of the page you request. What you can do is parse the returned HTML for links and then use cURL in a loop to fetch each one.
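
Links pulled out of a page are often relative (for example logo.png or /css/site.css), and cURL needs absolute URLs. Here is a simplified sketch of that step. The helper name to_absolute_url is an assumption, and it deliberately does not collapse "./" or "../" segments.

```php
<?php
// Hypothetical helper (the name is an assumption): turn a link found in a
// page into an absolute URL. Simplified: handles absolute,
// protocol-relative, root-relative, and path-relative links only.
function to_absolute_url(string $link, string $pageUrl): string
{
    // Already absolute: the link carries its own scheme (http:, https:, ...).
    if (is_string(parse_url($link, PHP_URL_SCHEME))) {
        return $link;
    }

    $base = parse_url($pageUrl);
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    // Protocol-relative link such as //cdn.example.com/app.js
    if (substr($link, 0, 2) === '//') {
        return $base['scheme'] . ':' . $link;
    }
    // Root-relative link such as /css/site.css
    if (substr($link, 0, 1) === '/') {
        return $origin . $link;
    }
    // Path-relative link: resolve against the directory of the page.
    $path = $base['path'] ?? '/';
    $directory = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $directory . $link;
}
```

Each link found in the page would go through this helper before being handed to cURL.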

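There is a curl_setopt option, CURLOPT_FILE, that sets where the output goes; it takes a file handle opened for writing. For example:

<?php

// $links holds absolute URLs; $cookie_file_path is the cookie file from the question
foreach ($links as $link) {
    $file = dirname(__FILE__) . "/" . basename($link);
    // CURLOPT_FILE needs an open, writable handle rather than a path
    $fp = fopen($file, 'wb');
    // get page
    $ch = curl_init($link);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
    // CURLOPT_RETURNTRANSFER is dropped: the body goes to the file, so there is nothing to return
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}

?>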

I didn't test that code, but it's a starting point for what you want. Just use a regex or some DOM functions to get the links.
