I have a website where I have to click a submit button on a form. This gives me a link. I know the link is made up with the paramter that is passed through a hidden value. I was wondering if I could make a python script or something els开发者_JS百科e that would go to the website and click some buttons returning the link that the submit button generates, if so how could I pass the extra parameter that influences the creation of the link?
thanks in advance.
If it is python you're looking for then give the Mechanize library a shot. If you are just extracting small but unique elements of the HTML document then you may as well use regex's with python. To work with the HTML document more pragmatically then BeautifulSoup may be more beneficial, which you can combine with mechanize/python.
It's more straightforward than it may initially seem.
Take a look at this documentation of the urlib2
package. Below is the code you would use, but the documentation explains (very well) what is happening.
Excerpt:
import urllib
import urllib2
url = 'http://www.someserver.com/cgi-bin/register.cgi'
values = {'name' : 'Michael Foord',
'location' : 'Northampton',
'language' : 'Python' }
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
You would need to use an HTML parser like BeautifulSoup to obtain the parameter name and value that is being posted when you click the button.
Edit:
Yes, you could use mechanize for this as well. You would do something like this (untested):
from mechanize import Browser
br = Browser()
br.open("http://www.example.com/") # this would be your website
br.select_form(name="order") # change this to the name of your form
response = br.submit() # submits the form, just like if you clicked the submit button
print response.geturl() # prints the URL you are looking for
You'll need to make this specific to your website/form, but something along these lines should do the trick.
Check out the examples/documentation for the ClientForm
object if you find you need more control.
Once you have downloaded your html data with mechanize as other users said, then you could use Beautifulsoup like this:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html_data)
hidden_tag = soup.find('input',name='hiddenId',type='hidden')
hidden_value = hiddenId['value']
then you could forge a POST with urllib2 like this:
import urllib
import urllib2
url = 'http://yoursite.com'
values = {'yourhiddenname' : hidden_value}
request = urllib2.Request(url, urllib.urlencode(values))
response = urllib2.urlopen(request)
result = response.read()
精彩评论