
Can I parse the directory listing of an external webpage?

Developer https://www.devze.com 2023-03-21 18:09 Source: the web

Is it possible to parse the directory listing of an external webpage, given that the page is accessible and shows a list of files when I access it? I only want to know whether it is possible to parse the files dynamically in PHP, and how. Thank you.

Sorry for not being clear. I mean a directory listing such as http://www.ibiblio.org/pub/ (Index of /..), and the ability to read its content as an array or something else that is easy to manipulate in my script.


You can use preg_match_all or DOMDocument.

For your case:

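// Download the listing page as a string.
$contents = file_get_contents("http://www.ibiblio.org/pub/");
// Capture the value of every href attribute; $hrefs[1] holds the captured URLs.
preg_match_all("|href=[\"'](.*?)[\"']|", $contents, $hrefs);
var_dump($hrefs);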



If you're getting back a directory listing that is full of links in a proper XHTML document, you can use DOMDocument and code such as the following to get a list of files:

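$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // the property name is case-sensitive
// load() parses the file as XML, so the listing must be well-formed.
$doc->load('directorylisting.html');

$files = $doc->getElementsByTagName('a');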

$files is now a DOMNodeList of DOMElement objects. You can iterate through it and read each element's href attribute to get the path of each file in the listing; note that these paths may be relative to the listing URL.
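
The iteration described above might look like the following minimal sketch. The inline XHTML string and the function name listingHrefs are hypothetical stand-ins for a real listing, and skipping the "../" parent link is an assumption about what the script wants to ignore.

```php
<?php
// Hypothetical sample standing in for a well-formed XHTML directory listing.
$xhtml = <<<XML
<html><body>
<h1>Index of /pub</h1>
<a href="../">Parent Directory</a>
<a href="docs/">docs/</a>
<a href="README.txt">README.txt</a>
</body></html>
XML;

/**
 * Return the href of every <a> element, skipping the parent-directory link.
 */
function listingHrefs(string $xhtml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($xhtml); // strict XML parsing, like load() but from a string

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '' && $href !== '../') {
            $hrefs[] = $href;
        }
    }
    return $hrefs;
}

print_r(listingHrefs($xhtml));
```

The result is a plain PHP array of strings, which is easy to filter or sort further in the script.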

Note that this approach requires a properly formed directory listing returned from the server. You cannot, for example, do a request on stackoverflow.com and get a directory listing of the files.
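
If the server returns ordinary HTML that is not well-formed XML, one possible alternative to load() is DOMDocument::loadHTML, which uses a more forgiving HTML parser. This is a sketch under that assumption, not part of the original answer; the sample markup and the name lenientHrefs are hypothetical.

```php
<?php
// Hypothetical sample: unclosed <li> and <br> tags, which strict XML parsing rejects.
$html = '<html><body><ul><li><a href="a.txt">a.txt</a><br>'
      . '<li><a href="b/">b/</a></ul></body></html>';

/**
 * Collect every href from HTML that may not be well-formed.
 */
function lenientHrefs(string $html): array
{
    // Collect parser warnings internally instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }
    return $hrefs;
}

print_r(lenientHrefs($html));
```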

If this doesn't work (perhaps because the HTML is malformed), you could use regular expressions (e.g. preg_match_all) to find the <a> tags, like so:

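// Capture the href and the link text of each anchor. Digits are allowed in paths,
// and (.*?) is non-greedy so two links on one line are not merged into one match.
preg_match_all('@<a href="([a-zA-Z0-9._/ -]*)">(.*?)</a>@', file_get_contents('http://www.ibiblio.org/pub/'), $files);
var_dump($files);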

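$files would still hold the matched elements, just as a set of arrays: $files[0] contains the full matches, $files[1] the href values, and $files[2] the link text.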
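
If one array entry per link is easier to work with, the PREG_SET_ORDER flag of preg_match_all groups the captures that way. The sketch below runs on a hypothetical inline sample rather than the live page, and it swaps in a more permissive pattern ([^"]*) for the href, which is an assumption rather than the pattern used above.

```php
<?php
// Hypothetical sample of the markup a directory listing might contain.
$html = '<a href="docs/">docs/</a> <a href="README.txt">README.txt</a>';

// PREG_SET_ORDER yields one entry per link: [full match, href, link text].
preg_match_all('@<a href="([^"]*)">(.*?)</a>@', $html, $matches, PREG_SET_ORDER);

$files = [];
foreach ($matches as $match) {
    $files[] = ['href' => $match[1], 'label' => $match[2]];
}

print_r($files);
```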


UPDATE: I tested the preg_match_all method with your URL (http://www.ibiblio.org/pub/) and it works fine.


Yes, it is very possible. I'm not quite clear on what you mean by a directory listing, but you should research website crawlers. That is essentially what you're asking about, just written in PHP.


PHP's file_get_contents will do the trick for you.

(Assuming your HTTP request for this page returns the listing of files, as you have mentioned.)
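
Since file_get_contents returns false when the request fails, and reading URLs with it requires the allow_url_fopen setting to be enabled, a small wrapper can make failures explicit. This is only a sketch: the name fetchListing and the 10-second timeout are illustrative assumptions.

```php
<?php
/**
 * Fetch a URL (or local path) and return its body, or null on failure.
 * The 10-second timeout is an arbitrary choice for this sketch.
 */
function fetchListing(string $url): ?string
{
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    // @ suppresses the PHP warning; failure is reported through the return value.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}
```

Calling fetchListing('http://www.ibiblio.org/pub/') would then give either the page source as a string, ready for preg_match_all or DOMDocument, or null if the page could not be retrieved.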
