I'm sorry to have to ask something like this, but Python's mechanize documentation seems to be really lacking and I can't figure this out. They only give one example that I can find for following a link:
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
But I don't want to use a regex; I just want to follow a link based on its URL. How would I do this? Also, what is the "nr" argument that is sometimes used when following links?
Thanks for any info
br.follow_link takes either a Link object or a keyword arg (such as nr=0).
br.links() lists all the links.
br.links(url_regex='...') lists all the links whose URLs match the regex.
br.links(text_regex='...') lists all the links whose link text matches the regex.
br.follow_link(nr=num) follows the num-th link on the page, with counting starting at 0. It returns a response object (the same kind that br.open(...) returns).
br.find_link(url='...') returns the Link object whose url exactly equals the given url.
br.find_link, br.links, br.follow_link, and br.click_link all accept the same keywords. Run help(br.find_link) to see documentation on those keywords.
Edit: If you have a target url that you wish to follow, you could do something like this:
import mechanize
br = mechanize.Browser()
response = br.open("http://www.example.com/")
target_url = 'http://www.rfc-editor.org/rfc/rfc2606.txt'
for link in br.links():
    print(link)
    # Link(base_url='http://www.example.com/', url='http://www.rfc-editor.org/rfc/rfc2606.txt', text='RFC 2606', tag='a', attrs=[('href', 'http://www.rfc-editor.org/rfc/rfc2606.txt')])
    print(link.url)
    # http://www.rfc-editor.org/rfc/rfc2606.txt
    if link.url == target_url:
        print('match found')
        # match found
        break
br.follow_link(link)   # link still holds the last value it had in the loop
print(br.geturl())
# http://www.rfc-editor.org/rfc/rfc2606.txt
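One caveat with the loop above: if no link matches, link still refers to the last link on the page, so br.follow_link(link) would follow the wrong one. A minimal sketch of a guard against that, assuming a hypothetical helper named follow_exact_url (it is not part of mechanize) that relies only on br.links() and br.follow_link(link=...):

```python
def follow_exact_url(browser, target_url):
    """Follow the first link whose url equals target_url exactly.

    Hypothetical helper, not part of mechanize. `browser` is expected to
    behave like mechanize.Browser: links() yields objects with a .url
    attribute, and follow_link(link=...) returns a response.
    Returns the response, or None when no link matches.
    """
    for link in browser.links():
        if link.url == target_url:
            # Follow only on an exact match, never the "last link seen".
            return browser.follow_link(link=link)
    return None


# Usage with a real browser (needs the mechanize package and network access):
#   import mechanize
#   br = mechanize.Browser()
#   br.open("http://www.example.com/")
#   response = follow_exact_url(br, "http://www.rfc-editor.org/rfc/rfc2606.txt")
#   if response is not None:
#       print(br.geturl())
```

Returning None on a miss is just one choice for the sketch; raising an exception would work equally well.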
I found this way to do it, for reference for anyone who doesn't want to use regex:
r = br.open("http://www.somewebsite.com")
br.find_link(url='http://www.somewebsite.com/link1.html')
req = br.click_link(url='http://www.somewebsite.com/link1.html')
br.open(req)
print(br.response().read())
Or it also works with the link's text:
r = br.open("http://www.somewebsite.com")
br.find_link(text='Click this link')
req = br.click_link(text='Click this link')
br.open(req)
print(br.response().read())
From looking at the code, I suspect you want
response1 = br.follow_link(link=LinkObjectToFollow)
nr is the same as documented under the find_link call.
EDIT: In my first cursory glance, I didn't realize "link" wasn't a simple link.
nr selects which link to follow when more than one link matches the text or URL you gave. The default is 0, so by default you follow the first matching link.
For example, given this source:
<a href="link.html">Click this link</a>
<a href="link2.html">Click this link</a>
Both links have the text "Click this link", but we want to follow link2.html specifically:
br.click_link(text='Click this link', nr=1)
This returns a request for link2.html; pass it to br.open() (or call br.follow_link with the same arguments) to get the link2.html response.
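To make the counting concrete, here is a small stand-alone sketch of the selection rule. It does not use mechanize at all: it uses the standard library's html.parser, and the names LinkCollector and pick_link are made up for this illustration. It only mimics how nr indexes into the links whose text matches, not how mechanize is implemented:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def pick_link(html, text, nr=0):
    """Return the href of the nr-th link (0-based) whose text equals `text`."""
    collector = LinkCollector()
    collector.feed(html)
    matches = [href for href, link_text in collector.links if link_text == text]
    return matches[nr]


SOURCE = """
<a href="link.html">Click this link</a>
<a href="link2.html">Click this link</a>
"""

print(pick_link(SOURCE, "Click this link"))        # link.html
print(pick_link(SOURCE, "Click this link", nr=1))  # link2.html
```

In this sketch an out-of-range nr raises IndexError; in mechanize itself a lookup that finds nothing raises mechanize.LinkNotFoundError.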