How is it possible to generate a list of all the pages of a given website programmatically using PHP?
What I'm basically trying to achieve is to generate something like a sitemap, as a nested unordered list with links for all the pages contained in a website.
If all pages are linked to one another, then you can use a crawler or spider to do this.
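A crawler of this kind can be sketched roughly as follows. This is only a minimal illustration, not a complete spider: the function names `extractLinks` and `crawl`, the `$fetch` callable, and the `$maxPages` limit are all made up for this example, and relative links are resolved naively (no `../` handling, no query-only links).

```php
<?php
// Minimal same-host crawler sketch. All names here are hypothetical.

// Return the absolute same-host URLs linked from one page.
function extractLinks(string $html, string $pageUrl): array
{
    if ($html === '') {
        return [];
    }
    // Real-world HTML is often malformed; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $base = parse_url($pageUrl);
    $root = $base['scheme'] . '://' . $base['host']
          . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1); // directory of the page

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Drop the fragment so page.html#top and page.html count as one page.
        $href = trim(explode('#', $a->getAttribute('href'), 2)[0]);
        if ($href === '' || strpos($href, '//') === 0) {
            continue; // empty or protocol-relative: skipped in this sketch
        }
        if ($href[0] === '/') {
            $links[] = $root . $href;            // root-relative link
        } elseif (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            // Has a scheme: keep it only if it stays on the same host.
            if ($href === $root || strpos($href, $root . '/') === 0) {
                $links[] = $href;
            }
        } else {
            $links[] = $root . $dir . $href;     // naive relative resolution
        }
    }
    return array_values(array_unique($links));
}

// Breadth-first crawl. $fetch takes a URL and returns HTML or false.
function crawl(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $fetched = 0;
    while ($queue && $fetched < $maxPages) {
        $url = array_shift($queue);
        $html = $fetch($url);
        $fetched++;
        if ($html === false || $html === '') {
            continue;
        }
        foreach (extractLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return array_keys($seen);
}
```

With `allow_url_fopen` enabled, something like `crawl('http://example.com/', fn($u) => @file_get_contents($u))` would return the list of discovered URLs; passing the fetcher in as a callable also makes it easy to swap in cURL or a stub.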
If some pages are not linked from anywhere, you will need to come up with another method. You can try this:
- Add an "image bug" (also known as a web beacon or web bug) to each page you want tracked, i.e. a tiny image whose source points at the logging script
OR
alternatively add a JavaScript function to each page that makes a call to /scripts/logger.php. You can use any of the JavaScript libraries that make this super simple, like jQuery, MooTools, or YUI.
- Create the logger.php script and have it save the request's originating URL somewhere like a file or a database.
Pros:
- Fairly simple
Cons:
- Requires edits to each page
- Pages that aren't visited don't get logged
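As a rough sketch of what logger.php could look like: the version below assumes the "originating URL" is taken from the `Referer` header (which browsers may omit or trim), and it stores the URLs in a flat file. The log file name and the `logPageUrl` helper are hypothetical choices for illustration; a database would work just as well.

```php
<?php
// logger.php (sketch): records the page that triggered the request.
// LOG_FILE and logPageUrl() are hypothetical names for this example.
const LOG_FILE = __DIR__ . '/pages.log';

// Append $url to $logFile unless it is invalid or already recorded.
// Returns true only when a new line was written.
function logPageUrl(string $url, string $logFile): bool
{
    // Ignore the fragment so the same page is not logged twice.
    $url = trim(explode('#', $url, 2)[0]);
    if ($url === '' || !filter_var($url, FILTER_VALIDATE_URL)) {
        return false;
    }
    $known = is_file($logFile)
        ? file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    if (in_array($url, $known, true)) {
        return false;
    }
    return file_put_contents($logFile, $url . "\n", FILE_APPEND | LOCK_EX) !== false;
}

// Only act on real web requests, not when the file is run from the CLI.
if (PHP_SAPI !== 'cli') {
    // Assumption: the Referer header identifies the page holding the bug.
    if (!empty($_SERVER['HTTP_REFERER'])) {
        logPageUrl($_SERVER['HTTP_REFERER'], LOG_FILE);
    }
    // Nothing to display, so answer with "204 No Content".
    http_response_code(204);
}
```

If the JavaScript variant is used instead, the page could pass its own address explicitly (for example as a query parameter) rather than relying on the `Referer` header.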
Some other techniques that don't really fit your need to do it programmatically but may be worth considering include:
- Create a spider or crawler
- Use a ripper such as cURL or Teleport Plus
- Use Google Analytics (similar to the image bug technique)
- Use a log analyzer like Webstats or a freeware UNIX webstats analyzer
You can easily list the files with the glob function. But if the pages use includes/requires and other techniques to combine multiple files into "one page", you'll need to import the Google "site:mysite.com" search results instead. Or just create a table with the URL of every page :P
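For the glob approach, a minimal sketch might look like this. It assumes every page is a separate `*.php` file under one document root; the `listPages` function and its parameters are invented for the example, and `glob` itself does not recurse, so the function walks subdirectories on its own.

```php
<?php
// List page files under $root as root-relative paths (e.g. /blog/post.php).
// listPages() and its parameters are hypothetical names for this sketch.
function listPages(string $root, string $pattern = '*.php', string $sub = ''): array
{
    $dir = rtrim($root, '/') . $sub;
    $pages = [];

    // Files in this directory that match the pattern.
    foreach (glob($dir . '/' . $pattern) ?: [] as $file) {
        if (is_file($file)) {
            $pages[] = $sub . '/' . basename($file);
        }
    }

    // Descend into each subdirectory.
    foreach (glob($dir . '/*', GLOB_ONLYDIR) ?: [] as $child) {
        $pages = array_merge(
            $pages,
            listPages($root, $pattern, $sub . '/' . basename($child))
        );
    }
    return $pages;
}
```

Calling `listPages($_SERVER['DOCUMENT_ROOT'])` would then give one path per file, which can be prefixed with the site's host name to form links.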
Maybe this can help: http://www.xml-sitemaps.com/ (SiteMap Generator)