Is there any way I could get all the <option> values in the following HTML form <select> into a Python list, like so: ['a', 'b', 'c', 'd']?
<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>
Many thanks in advance.
import re
text = '''<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>'''
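# Capture each value and require the option text to repeat it (named backreference).
pattern = re.compile(r'<option value="(?P<val>.*?)">(?P=val)</option>')
# With a single capturing group, findall returns a list of that group's matches.
handy_list = pattern.findall(text)
print(handy_list)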
will output
['a', 'b', 'c', 'd']
Disclaimer: Parsing HTML with regular expressions does not work in the general case; this pattern only matches options written exactly like the ones above.
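A quick way to see the limitation is to run the same pattern against a few hand-written variants of one option. The variant strings below are illustrative, not from the question: each is valid HTML for an option with value "a", but the pattern matches only the first.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r'<option value="(?P<val>.*?)">(?P=val)</option>')

# Hypothetical inputs: all are valid HTML options whose value is "a".
variants = [
    '<option value="a">a</option>',           # written exactly like the question
    "<option value='a'>a</option>",           # single-quoted attribute
    '<option selected value="a">a</option>',  # another attribute comes first
    '<option value="a">Apple</option>',       # text differs from the value
    '<OPTION VALUE="a">a</OPTION>',           # upper-case tag and attribute names
]

results = [pattern.findall(html) for html in variants]
for html, found in zip(variants, results):
    print(repr(html), '->', found)
```

Only the first variant yields ['a']; the other four yield an empty list.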
If you need to parse other HTML as well, you might want to look at BeautifulSoup:
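from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4
text = '''<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>'''
# Name the parser explicitly; otherwise bs4 guesses one and warns.
soup = BeautifulSoup(text, 'html.parser')
# .string is the option's text; use option['value'] to read the attribute instead.
print([option.string for option in soup.find_all('option')])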
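If installing a third-party package is not possible, the standard library's html.parser module can do a similar job. This is a different technique from the BeautifulSoup answer, shown here as a minimal sketch: OptionCollector is a hypothetical helper name, and it reads each option's value attribute rather than its text.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Hypothetical helper: collects the value attribute of every <option> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of
        # (name, value) pairs with surrounding quotes already removed.
        if tag == 'option':
            # An option without a value attribute yields None here (in HTML its
            # value would default to its text; this sketch does not handle that).
            self.values.append(dict(attrs).get('value'))


text = '''<select name="sel">
<option value="a">a</option>
<option value="b">b</option>
<option value="c">c</option>
<option value="d">d</option>
</select>'''

parser = OptionCollector()
parser.feed(text)
parser.close()
print(parser.values)  # ['a', 'b', 'c', 'd']
```

Because the parser normalizes case and quoting, a variant such as <OPTION VALUE='a'>a</OPTION> also comes through as 'a'.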