I'm trying to parse a page using my Python script, but the <nobr> tag together with '&' is giving me trouble. Here is the actual HTML:
<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> <NOBR>Simulation for 1st & 2nd path</NOBR></A>
The handle_data function of my parser (which uses sgmllib) is not handling the data properly. Here is the handle_data code:
def handle_data(self, data):
    self.datainfo.append(data)
I expect the datainfo array to have only one element, namely "Simulation for 1st & 2nd path".
However, when I print the datainfo array, it actually contains 7 elements:
datainfo -> ['', '', 'Simulation for 1st', '&', '2nd path', '', '']
What's happening?
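You need to encode the ampersand as &amp; for the markup to be valid HTML. A bare '&' is treated as the possible start of an entity reference, which is why the text gets split around it.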
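If it helps, here is a minimal sketch of both ideas: escaping the ampersand, and not relying on handle_data being called exactly once per text run. It assumes Python 3, where sgmllib is no longer available, so it uses html.parser.HTMLParser as a stand-in rather than the original parser. TextCollector and link_text are hypothetical names made up for this example.

```python
import html
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: stores every text chunk the parser reports."""

    def __init__(self):
        super().__init__()
        self.datainfo = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so keep each piece as it comes.
        self.datainfo.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace into single spaces.
        return " ".join("".join(self.datainfo).split())


def link_text(markup):
    """Hypothetical helper: return the combined text content of a snippet."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.text()


label = "Simulation for 1st & 2nd path"

# html.escape turns the bare "&" into "&amp;" (it also escapes <, > and quotes).
escaped = html.escape(label)

markup = (
    '<A HREF="http://enpass.in/algo/c12.html" CLASS="style"> '
    "<NOBR>%s</NOBR></A>" % escaped
)
result = link_text(markup)
print(escaped)
print(result)
```

Joining the chunks at the end is the part that carries over to any parser: if you do not control the page and cannot fix the ampersand at the source, you can still rebuild the full string from whatever pieces handle_data receives.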