web-crawler
Problem with AJAX and PHP!
I have a PHP page that will collect mp3 links from downloads.nl. The result is converted to XML and renders fine. [Details]
2023-03-15 03:23 Category: Q&A
Nutch solrindex command not indexing all URLs in Solr
I have a Nutch index crawled from a specific domain and I am using the solrindex command to push the crawled data to my Solr index. The problem is that it seems that only some of the crawled URLs are... [Details]
2023-03-14 18:57 Category: Q&A
Invalid URLs throw an exception - Python
import httplib
import urlparse

def getUrl(url):
    try:
        parts = urlparse.urlsplit(url)
        server = parts[1]
        path = parts[2]
... [Details]
2023-03-14 16:12 Category: Q&A
Crawler data from another website
I am using the simplecrawler gem to get data from another website. It's easy and simple :). But this website requires a login to see everything. Please suggest a solution. Thanks. Mechani... [Details]
2023-03-14 11:03 Category: Q&A
Prevent crawler from following POST form action
I have a simple form on my site: <form method="POST" action="Home/Import"> ... </form> I get tons of error reports because of crawlers sending HEAD requests to Home/Impo... [Details]
2023-03-14 07:52 Category: Q&A
How to get all used CSS attributes of an HTML node from a given URL in Java
Given a URL, I need all used CSS attributes for an HTML node, including those derived from CSS files depending on the node attributes. [Details]
2023-03-14 06:37 Category: Q&A
Is there any ready-to-use crawler or tool to extract links from a website?
There is one blog on a website and there are many PDF links on it, but I don't want to go through all the pages. [Details]
2023-03-14 05:56 Category: Q&A
What's the average number of bots or spiders to visit a webpage per day?
I'm trying to get a rough estimate of how many of my page views are from bots. What's typical for the number of page views that bots and search spiders account for for an average SEO'd... [Details]
2023-03-13 12:03 Category: Q&A
How do search engine bots work? [closed]
Closed. This question needs to be more focused. It is not currently accepting answers. Want to improve this question? Update the question so it focuses on one problem only by editing this po... [Details]
2023-03-13 06:26 Category: Q&A
How to get all the URLs of a website using a crawling process with ASP.NET?
How do I get all the URLs of a website? Suppose I want to crawl some data that is spread across different pages of a website. How do I get the list of all the URLs so I can reach all those similar pages? [Details]
2023-03-12 19:10 Category: Q&A