I have written a Python program to find the carrier of a cell phone given the number. It downloads the source of http://www.whitepages.com/carrier_lookup?carrier=other&number_0=1112223333&response=1 (where 1112223333 is the phone number to lookup) and saves this as carrier.html. In the source, the carrier is in the line after the [div class="carrier_result"] tag. (switch in < and > for [ and ], as stackoverflow thought I was trying to format using the html and would not display it.)
My program currently searches the file and finds the line containing the div tag, but now I need a way to store th开发者_JS百科e next line after that as a string. My current code is: http://pastebin.com/MSDN0vbC
What you really want to be doing is parsing the HTML properly. Use the BeautifulSoup library - it's wonderful at doing so.
Sample code:
import urllib2, BeautifulSoup
opener = urllib2.build_opener()
opener.addheaders[0] = ('User-agent', 'Mozilla/5.1')
response = opener.open('http://www.whitepages.com/carrier_lookup?carrier=other&number_0=1112223333&response=1').read()
bs = BeautifulSoup.BeautifulSoup(response)
print bs.findAll('div', attrs={'class': 'carrier_result'})[0].next.strip()
You should be using a HTML parser such as BeautifulSoup
or lxml
instead.
to get the next line, you can use
htmlsource = open('carrier.html', 'r')
for line in htmlsource:
if '<div class="carrier_result">' in line:
nextline = htmlsource.next()
print nextline
A "better" way is to split on </div>
, then get the things you want, as sometimes the stuff you want can occur all in one line. So using next() if give wrong result.eg
data=open("carrier.html").read().split("</div>")
for item in data:
if '<div class="carrier_result">' in item:
print item.split('<div class="carrier_result">')[-1].strip()
by the way, if its possible, try to use Python's own web module, like urllib
, urllib2
instead of calling external wget
.
精彩评论