开发者

Python metaprogramming for XML parsing

开发者 https://www.devze.com 2023-01-14 15:57 出处:网络
I\'m trying to create a simple XML parser where each different XML schema has it\'s own parser class but I can\'t figure out what the best way is. What I in effect would like to do is something like t

I'm trying to create a simple XML parser where each different XML schema has it's own parser class but I can't figure out what the best way is. What I in effect would like to do is something like this:

in = sys.stdin
xmldoc = minidom.parse(in).documentElement

xmlParser = xmldoc.nodeName
parser = xmlParser()
out = parser.parse(xmldoc)

I'm not also quite sure if I get the document root name correctly, but that's the idea: create an object of a class with similar name to the document root and use the parse() function in that class to parse and handle th开发者_如何学JAVAe input.

What would be the simplest way to achieve this? I've been reading about introspection and templates but haven't been able to figure this out yet. I've done a similar thing with Java in the past and AFAIK, Ruby also makes this simple. What's the pythonian way?


As pointed out by Mark in his comment, to get a reference to a class that you know the name of at runtime, you use getattr.

doc = minidom.parse(sys.stdin)
# is equivalent to
doc = getattr(minidom, "parse")(sys.stdin)

Below is a corrected version of your pseudo-code.

from xml.dom import minidom
import sys
import myParsers # a module containing your parsers

xmldoc = minidom.parse(sys.stdin).documentElement

myParserName = xmldoc.nodeName
myParserClass = getattr(myParsers, myParserName)
# create an instance of myParserClass by calling it with the documentElement
parser = myParserClass(xmldoc)
# do whatever you want with the instance of your parser class
output = parser.generateOutput()

getattr will return an AttributeError if the attribute doesn't exist, so you can wrap the call in a try...except or pass a third argument to getattr, wich will be returned if the attribute isn't found.


I think most python programmers would just use lxml to parse their xml. If you still want to wrap that in classes you could, but as delnan said in his comment, it's a bit unclear what you really mean.

from lxml import etree

tree = etree.parse('my_doc.xml')
for element in tree.getroot():
    ...

A couple of side notes, if other programmers are going to be reading your code, you should try to at least roughly follow PEP 8. More importantly though, you really shouldn't assign to builtins like "in."

0

精彩评论

暂无评论...
验证码 换一张
取 消