How to regex in python?

I am trying to parse the keywords from Google Suggest. This is the URL:

http://google.com/complete/search?output=toolbar&q=test

I've done it with PHP using:

'|<CompleteSuggestion><suggestion data="(.*?)"/><num_queries int="(.*?)"/></CompleteSuggestion>|is'

But that won't work with Python's re.match(pattern, string). I tried a few variations, but some raise an error and some return None.

How can I parse that info? I don't want to use minidom because I think a regex will be less code.

You could use etree:

>>> from xml.etree.ElementTree import XMLParser
>>> x = XMLParser()
>>> x.feed('<toplevel><CompleteSuggestion><suggestion data=...')
>>> tree = x.close()
>>> [(e.find('suggestion').get('data'), int(e.find('num_queries').get('int')))
     for e in tree.findall('CompleteSuggestion')]
[('test internet speed', 31800000), ('test', 686000000), ...]

It is more code than a regex, but it also does more. Specifically, it will fetch the entire list of matches in one go, and unescape any weird stuff like double-quotes in the data attribute. It also won't get confused if additional elements start appearing in the XML.
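For anyone who wants to run this outside the interactive prompt, here is a minimal self-contained sketch of the same idea. The SAMPLE string is made up for illustration (its shape is assumed from the pattern in the question, not copied from a real response), and parse_suggestions is just a name I picked. The second entry contains an escaped double-quote to show the unescaping mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the response described in the question.
SAMPLE = (
    '<toplevel>'
    '<CompleteSuggestion><suggestion data="test internet speed"/>'
    '<num_queries int="31800000"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="test &quot;quoted&quot;"/>'
    '<num_queries int="42"/></CompleteSuggestion>'
    '</toplevel>'
)

def parse_suggestions(xml_text):
    """Return a list of (suggestion, query_count) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (e.find('suggestion').get('data'), int(e.find('num_queries').get('int')))
        for e in root.findall('CompleteSuggestion')
    ]

print(parse_suggestions(SAMPLE))
# [('test internet speed', 31800000), ('test "quoted"', 42)]
```

ET.fromstring does the feed/close steps in one call, so it is probably the shorter option when the whole response is already in a string.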

See also: RegEx match open tags except XHTML self-contained tags

This is an XML document. Please reconsider using an XML parser. It will be more robust and will probably take you less time in the end, even if it is more code.
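If you still want to see where the regex attempt goes wrong, here is a rough sketch. It assumes the response looks like the hypothetical SAMPLE below (a single made-up entry, not real data). One likely reason for the None is that re.match only tries to match at the very start of the string, and the string does not start with the CompleteSuggestion tag. PHP's '|' delimiters and trailing 'is' flags are also not part of Python's pattern syntax; the flags are passed separately as re.I | re.S.

```python
import re

# Hypothetical one-entry sample; the real response may differ.
SAMPLE = (
    '<toplevel><CompleteSuggestion><suggestion data="test &amp; tag"/>'
    '<num_queries int="42"/></CompleteSuggestion></toplevel>'
)

# The PHP pattern without its '|' delimiters; the 'is' flags become re.I | re.S.
PATTERN = re.compile(
    r'<CompleteSuggestion><suggestion data="(.*?)"/>'
    r'<num_queries int="(.*?)"/></CompleteSuggestion>',
    re.I | re.S,
)

# re.match anchors at position 0, where the string starts with '<toplevel>'.
match_result = PATTERN.match(SAMPLE)
print(match_result)  # None

# re.findall scans the whole string and returns every pair of groups.
found = PATTERN.findall(SAMPLE)
print(found)  # [('test &amp; tag', '42')]
```

Note that the regex result still contains the raw `&amp;` and the count is a string, so you would have to unescape and convert it yourself. That is part of why the parser ends up being less work.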