I am trying to scrape some data off the National Vulnerability Database (http://web.nvd.nist.gov). What I want to do is enter a search term, which brings up the first 20 results, and scrape that data. Then I want to click "next 20" until I have traversed all the results.
I am able to successfully submit search terms, but clicking "next 20" is not working at all.
Tools I am using: Python + Mechanize.
Here is my code:
import mechanize

# Browser
b = mechanize.Browser()
# The URL to this service
URL = 'http://web.nvd.nist.gov/view/vuln/search'
Search = ['Linux', 'Mac OS X', 'Windows']

def searchDB():
    SearchCounter = 0
    for i in Search:
        # Load the page
        read = b.open(URL)
        # Select the form
        b.select_form(nr=0)
        # Fill out the search form
        b['vulnSearchForm:text'] = Search[int(SearchCounter)]
        b.submit('vulnSearchForm:j_id120')
        result = b.response().read()
        file = open(Search[SearchCounter] + ".txt", "w")
        file.write(result)
        # Here is where the problem is. vulnResultsForm:j_id116 is the value of the "next 20" button.
        b.select_form(nr=0)
        b.form.click('vulnResultsForm:j_id116')
        result = b.response().read()

if __name__ == '__main__':
    searchDB()
From the docstring of b.form.click:
Return request that would result from clicking on a control.
The request object is a urllib2.Request instance, which you can pass to urllib2.urlopen (or ClientCookie.urlopen).
So click only builds the request; you still have to open it yourself:
request = b.form.click('vulnResultsForm:j_id116')
b.open(request)
result = b.response().read()
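To walk through every page rather than just the second one, the same two steps can be repeated in a loop. The following is only a sketch under a few assumptions: fetch_all_pages and max_pages are hypothetical names, the results form is assumed to be form number 0 on every page, and the last page is assumed to lack the "next 20" control so that click fails with a ValueError (as far as I know mechanize's ControlNotFoundError derives from ValueError, but check that against your version).

```python
def fetch_all_pages(browser, next_control, max_pages=50):
    """Collect the current page, then keep following the 'next' button.

    browser      -- a mechanize.Browser (or anything with the same methods)
                    that has already submitted the search
    next_control -- name of the "next 20" submit control
    max_pages    -- hypothetical safety cap so a bad page cannot loop forever
    """
    pages = [browser.response().read()]
    while len(pages) < max_pages:
        browser.select_form(nr=0)  # assumption: results form is form 0
        try:
            # click() only builds the request; it does not send it
            request = browser.form.click(next_control)
        except ValueError:
            # assumption: no such control means we reached the last page
            break
        browser.open(request)  # actually fetch the next page
        pages.append(browser.response().read())
    return pages
```

In the question's code this would be called right after b.submit(...), for example pages = fetch_all_pages(b, 'vulnResultsForm:j_id116').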
I haven't used Mechanize outside of zope.testbrowser, which is based on Mechanize, so there may be differences, but here goes:
You are clicking on the form. Try getting the button and clicking on that instead. Something like this, I think:
form.find_control("j_id120").click()
Also:
b['vulnSearchForm:text'] = Search[int(SearchCounter)]
can be replaced with
b['vulnSearchForm:text'] = i
as i will already contain the value. Python is not JavaScript: loop variables are the items themselves, not numeric indices (unless you want them to be).
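A small illustration of the point, not taken from the original code: iterating over the list yields the search terms directly, and enumerate supplies a counter only when one is actually wanted. The helper names output_names and numbered_names are made up for this example.

```python
search_terms = ['Linux', 'Mac OS X', 'Windows']

def output_names(terms):
    # The loop variable is the term itself, so no index lookup is needed.
    return [term + ".txt" for term in terms]

def numbered_names(terms):
    # enumerate() pairs each term with its position when a number is wanted.
    return ["%d-%s.txt" % (index, term) for index, term in enumerate(terms)]
```

With this, the separate SearchCounter variable in the question becomes unnecessary.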