I am trying to write a script that searches a website containing download links for specific MIME types (.mp3, .avi, .mpeg), crawls that site, and then downloads any matching files stored there. The pseudocode for this script:
Input URL
function search ()
{
Search URL for matching MIME types and download to dir on my pc
}
Okay, that was really bad, but I am still learning. Would a Perl script be best for this?
Take a look at the wget command. Here is an example command which will search a site recursively for all mp3, avi and mpeg files and save them into the current directory:
wget -r -H -nd -N -np -A.mp3,.avi,.mpeg http://www.someurl.com
This is what the options mean:
-r turn on recursive retrieving
-H enable spanning across hosts when retrieving recursively
-nd save all files in a single directory instead of recreating the site's directory hierarchy
-N turn on timestamping, so files are only re-downloaded when the remote copy is newer
-np never ascend to the parent directory when retrieving recursively
-A comma-separated list of file name suffixes to accept
You can also add other options for recursion depth, timeouts, etc. See man wget for more information.
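For instance, a variant of the command above that also limits the recursion depth and adds a network timeout might look like this (the depth of 3 and the 10-second timeout are just illustrative values):

wget -r -l 3 -T 10 -H -nd -N -np -A.mp3,.avi,.mpeg http://www.someurl.com

Here -l sets the maximum recursion depth and -T sets the network timeout in seconds.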
Yes, it absolutely would. Have a look at the WWW::Mechanize module.
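A minimal sketch of that approach, assuming the download links all sit on a single page (so no recursive crawling yet) and using a placeholder URL, could look something like this:

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use File::Basename;

my $start_url = 'http://www.someurl.com/';   # placeholder URL, replace with the real site

my $mech = WWW::Mechanize->new();
$mech->get($start_url);

# Find every link on the page whose URL ends in .mp3, .avi or .mpeg
my @links = $mech->find_all_links( url_regex => qr/\.(?:mp3|avi|mpeg)$/i );

for my $link (@links) {
    my $file_url = $link->url_abs;                 # absolute URI of the file
    my $filename = basename( $file_url->path );    # save under the file's own name
    print "Downloading $file_url -> $filename\n";
    $mech->get( $file_url, ':content_file' => $filename );   # write response body to disk
}

WWW::Mechanize can also follow ordinary HTML links, so you could extend this into a real crawler by collecting the page links and repeating the search on each of them.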