I have a website that I now support, and I need to list all of its live pages/URLs. Is there a crawler I can point at my homepage and have it list all the pages/URLs that it finds?
Then I can delete any that don't make their way into this listing, as they will be orphan pages/URLs that were never cleaned up.
I am using DNN and want to kill unneeded pages.
Since you're using a database-driven CMS, you should be able to do this either via the DNN admin interface or by looking directly in the database. That is far more reliable than a crawler, which only finds pages that are linked from somewhere.
Back in the old days I used wget for this exact purpose, using its recursive retrieval functionality. It might not be the most efficient way, but it was definitely effective. YMMV, of course, since some sites will return a lot more content than others.
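If you'd rather not depend on wget, the same recursive idea can be sketched in a few lines of Python using only the standard library. This is a rough illustration, not a polished tool: the names `LinkParser`, `fetch`, and `crawl` and the `max_pages` limit are made up for this example, it only follows plain `<a href>` links on the same host, and it will miss anything reachable only through JavaScript or forms.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Return the page body as text, or '' if it is not HTML or fails to load."""
    try:
        with urlopen(url, timeout=10) as resp:
            if "html" not in resp.headers.get("Content-Type", ""):
                return ""
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return ""


def crawl(start_url, fetch=fetch, max_pages=1000):
    """Breadth-first crawl of same-host links; returns the sorted URLs found."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    # Stop once the queue is empty or max_pages pages have been fetched.
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/")` (with your own homepage in place of that placeholder) returns the list of URLs it could reach. Anything DNN knows about that is missing from the list is a candidate orphan, though it's worth checking each one by hand before deleting it.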