from HTMLParser import HTMLParser
from urllib import urlopen

class Spider(HTMLParser):
    def __init__(self, url):
        HTMLParser.__init__(self)
        req = urlopen(url)
        self.feed(req.read())

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and attrs:
            print "Found link => %s" % attrs[0][1]

Spider('http://stackoverflow.com/questions/tagged/python')
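For comparison, here is a minimal Python 3 sketch of the same link-printing idea. It assumes Python 3's `html.parser` and `urllib.request` modules, and the names `LinkCollector`, `collect_links` and `crawl` are hypothetical. One difference from the code above: `attrs[0][1]` is the value of whatever attribute comes first (for example `class`), so this version looks up `href` by name. Parsing is also kept separate from fetching, which makes it easy to decide later where the output goes.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; look up href by name
        # instead of assuming it is the first attribute.
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:  # skips <a> tags with no href or an empty one
                self.links.append(href)


def collect_links(html_text):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def crawl(url):
    """Fetch url and print every link found (requires network access)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        page = resp.read().decode(charset, errors='replace')
    for link in collect_links(page):
        print("Found link => %s" % link)
```

Calling `crawl('http://stackoverflow.com/questions/tagged/python')` prints the same kind of "Found link" lines, while `collect_links` can be used on its own when the links should be written somewhere other than standard output.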
python spider.py > output.html
Put this at the top of your script:
import sys
sys.stdout = open('output.html', 'w')
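Reassigning `sys.stdout` stays in effect for the rest of the program. If only part of the output should go to the file, a Python 3 sketch using `contextlib.redirect_stdout` (available since Python 3.4) limits the redirection to one call and restores the original stream afterwards. The helper name `run_with_stdout_to` is hypothetical.

```python
import contextlib


def run_with_stdout_to(path, func, *args, **kwargs):
    """Call func with sys.stdout temporarily redirected to the file at path."""
    # The context managers exit in reverse order: stdout is restored first,
    # then the file is closed, which also flushes it to disk.
    with open(path, 'w') as out, contextlib.redirect_stdout(out):
        return func(*args, **kwargs)
```

For example, `run_with_stdout_to('output.html', some_printing_function)` sends every `print` made inside `some_printing_function` (a placeholder name) to `output.html`, while prints made before and after still go to the console.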
This will redirect everything your script writes to standard output (which includes print statements) to the file 'output.html'.
I haven't messed with Spider at all, but is it printing HTML, or are you just printing the "Found link..." lines? If you are just printing those, you can do something like outfl = open('output.txt', 'w')
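And then, instead of print, call outfl.write("Found link => %s\n" % attrs[0][1]). Unlike print, write() does not add a newline, so include the "\n" yourself, and call outfl.close() when you're done so everything is flushed to disk.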
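A small Python 3 sketch of the same approach, assuming the links have already been collected into a list; the function name `write_links` is hypothetical. A `with` block closes the file automatically, even if an exception occurs, so there is no separate `close()` call to forget.

```python
def write_links(links, path='output.txt'):
    """Write one "Found link => ..." line per link to path."""
    # Mode 'w' creates the file or truncates an existing one.
    with open(path, 'w') as outfl:
        for link in links:
            # write() adds no newline, unlike print, so append one explicitly.
            outfl.write("Found link => %s\n" % link)
```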
You can always write out <html><head></head><body> before, and </body></html> after, if you need it in HTML format. Also, use outfl = open('output.html', 'w') instead of .txt for the filename.
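Wrapping the lines in a minimal HTML skeleton might look like the Python 3 sketch below. It makes each link clickable and uses `html.escape` so characters such as `&` or `"` in a URL don't break the markup. The function name `write_links_html` and the `<p>`/`<a>` layout are illustrative assumptions, not something the question requires.

```python
import html


def write_links_html(links, path='output.html'):
    """Write the links as a minimal HTML page, one clickable link per line."""
    with open(path, 'w', encoding='utf-8') as outfl:
        outfl.write('<html><head></head><body>\n')
        for link in links:
            # Escape &, <, > and quotes so the URL is safe in both the
            # href attribute and the visible text.
            safe = html.escape(link, quote=True)
            outfl.write('<p>Found link =&gt; <a href="%s">%s</a></p>\n' % (safe, safe))
        outfl.write('</body></html>\n')
```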
Did I totally miss the question here? If you want better answers, you ought to describe the question a little better.