Hello, I am trying to write a Python function that saves a list of URLs to a .txt file.
Example: visit http://forum.domain.com/ and save every URL containing viewtopic.php?t=
to a .txt file, for instance:
http://forum.domain.com/viewtopic.php?t=1333
http://forum.domain.com/viewtopic.php?t=2333
I use this code, but it does not save anything. I am very new to Python; can someone help me create this?
web_obj = opener.open('http://forum.domain.com/')
data = web_obj.read()
fl_url_list = open('urllist.txt', 'r')
url_arr = fl_url_list.readlines()
fl_url_list.close()
This is far from trivial and can have quite a few corner cases (I assume what you're fetching is an HTML web page).
To give you a few pointers, you need to:
- download the web page: you're already doing this (the result is in data)
- extract the URLs: this is the hard part. Most probably you'll want to use an HTML parser, extract the <a> tags, fetch the href attribute of each one, and put those into a list. Then filter that list to keep only the URLs formatted the way you want (say, containing viewtopic). Let's say you end up with them in urlList
- open a file for writing text (thus wt, not r)
- write the content: f.write('\n'.join(urlList))
- close the file
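As a rough illustration of the steps above, here is a minimal Python 3 sketch that uses only the standard library. The names LinkCollector, extract_topic_urls and save_urls are made up for this example. It takes the page text as a string instead of downloading it, so it does not depend on how your opener was built, and it assumes that a plain substring test for viewtopic.php?t= is a good enough filter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_topic_urls(html, base_url, marker="viewtopic.php?t="):
    """Return absolute URLs containing `marker`, in page order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.links:
        full = urljoin(base_url, href)  # turn relative links into absolute ones
        if marker in full and full not in urls:
            urls.append(full)
    return urls


def save_urls(urls, path="urllist.txt"):
    """Write one URL per line; "w" opens the file for writing text."""
    with open(path, "w") as f:  # the with block closes the file for us
        f.write("\n".join(urls))
```

With the data you already downloaded, this could be called as save_urls(extract_topic_urls(data.decode("utf-8", "replace"), "http://forum.domain.com/")). The decode step assumes read() returned bytes and that the page is UTF-8, which may not hold for your forum.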
I advise you to try following these steps and to ask specific questions when you get stuck on a particular issue.