Bit of a problem here: I have a web address that I send a GET request to, for example (don't worry about the IP, it's just made up for a device), and it returns an XML file.
In Data.XML we have some nodes and sub-nodes (like a tree, I suppose). For example, the entire data is encapsulated like this (indentation means it's a subnode of the above, etc.):
Basically, I want to use regular expressions to check the subnodes (for example, I'd want to make sure Serial Number is a 6-digit number and not something else).
The subnodes will always be called the same thing (like Serial Number, Device Data, Data, etc.).
What is a good extension/language that would be easiest to use for this? I know basic Python and bash, and I know C/C++ very well, but this seems like more of a scripting task to me.
Any ideas?
Edit: I forgot to add that I may have more or fewer XML tags (some devices have more settings and such), so I'd be picking out specific ones in the script, not looking at every single tag, since some devices may have more or fewer than others.
Please see this response: Regular Expressions to parse template tags in XML
A demonstration for your project...
from xml.etree import ElementTree
import re

def proper_SN(elem):
    # Pass only if the element's text is exactly six digits
    if re.match(r'\d{6}$', elem.text.strip()):
        return True
    return False

tree = ElementTree.parse('data.xml')
# iter() finds every SerialNumber element, however deeply nested
rows = tree.iter('SerialNumber')
for row in rows:
    print("SerialNumber: %s Passed = %s" % (row.text, proper_SN(row)))
Running this...
[mpenning@hotcoffee tmp]$ python
SerialNumber: 154236 Passed = True
[mpenning@hotcoffee tmp]$
I'm not sure how the XML might change... assuming you change the DeviceData
Using a simplified script...
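from xml.etree import ElementTree
import re

def proper_SN(elem):
    # Pass only if the element's text is exactly six digits
    if re.match(r'\d{6}$', elem.text.strip()):
        return True
    return False

tree = ElementTree.parse('data.xml')
# Look up only the specific elements we care about, by path
serial_elem = tree.find('DeviceData/Info/SerialNumber')
serial = serial_elem.text
engine = tree.find('DeviceData/Info/EngineVersion').text
media = tree.find('DeviceData/Info/MediaType').text
# Call the check on the element; a bare "if proper_SN:" is always true
if proper_SN(serial_elem):
    serstr = "good"
else:
    serstr = "bad"
print("Found a %s serial number (%s), with engine %s and media %s" % (serstr, serial, engine, media))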
I get
[mpenning@hotcoffee tmp]$ python
Found a good serial number (154236), with engine and media 100BaseT
[mpenning@hotcoffee tmp]$
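Since the question mentions that some devices have more or fewer tags than others, here is a rough sketch of one way to handle that: keep a table of the paths you care about with a pattern for each, and treat a missing tag as "skipped" rather than as an error. The rule table, the MediaType pattern, the `check_rules` name and the sample document (including its `Device` root) are all made up for illustration; only the `DeviceData/Info/...` paths come from the scripts above. It uses `re.fullmatch`, so it assumes Python 3.4 or later.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rules: element path -> pattern the whole text must match.
RULES = {
    'DeviceData/Info/SerialNumber': r'\d{6}',
    'DeviceData/Info/MediaType': r'\d+Base\w+',
}

def check_rules(root, rules):
    """Return {path: True/False/None}; None means the tag is absent."""
    results = {}
    for path, pattern in rules.items():
        elem = root.find(path)
        if elem is None or elem.text is None:
            results[path] = None  # tag missing on this device: skip it
        else:
            text = elem.text.strip()
            results[path] = re.fullmatch(pattern, text) is not None
    return results

# Made-up sample document; the real layout comes from the device.
SAMPLE = """<Device>
  <DeviceData>
    <Info>
      <SerialNumber> 154236 </SerialNumber>
    </Info>
  </DeviceData>
</Device>"""

for path, ok in check_rules(ET.fromstring(SAMPLE), RULES).items():
    print(path, ok)
```

Adding a check for another setting is then just one more entry in the table, and devices that lack that setting simply report None for it.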
Use an XML parsing module, like lxml or ElementTree (in the Python stdlib), instead of regex to parse the document. Then you can use a regex to verify the serial number. Here's some code to do this using ElementTree:
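import re
import xml.etree.ElementTree

# The XML document from the question goes between the triple quotes
tree = xml.etree.ElementTree.XML(r'''...''')
serial = tree.find('DeviceData/Info/SerialNumber')
print(serial.text)
# Anchor with $ so that seven or more digits do not pass
if re.match(r'\d{6}$', serial.text.strip()):
    print('OK')
else:
    print('ERROR')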
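Since the XML in the question comes from a GET request rather than a file, the fetch and the check can be kept in one small script. This is only a sketch, assuming Python 3: `fetch_xml` and `serial_is_valid` are names I made up, the address in the comment is a placeholder, and the default path is the one used in the code above.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

def fetch_xml(url, timeout=10):
    """GET the URL and return the parsed root element."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return ET.fromstring(resp.read())

def serial_is_valid(root, path='DeviceData/Info/SerialNumber'):
    """True only if the element exists and its text is exactly six digits."""
    text = root.findtext(path)  # None when the element is missing
    if text is None:
        return False
    return re.fullmatch(r'\d{6}', text.strip()) is not None

# Example call; the address is a placeholder, not a real device:
# root = fetch_xml('http://192.0.2.1/Data.XML')
# print(serial_is_valid(root))
```

Keeping the fetch separate from the check means you can test the validation against a saved copy of the XML without touching the device.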
You could also do this with XSLT 2.0 if you prefer a more declarative way of writing your rules (versus the procedural approach with Python and lxml).
Something like:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
  <xsl:output method="text" />
  <xsl:template match="SerialNumber[matches( normalize-space(.), '^\d{6}$')]" >
    <xsl:value-of select="." /> Passes.
  </xsl:template>
  <xsl:template match="SerialNumber[not( matches( normalize-space(.), '^\d{6}$'))]" >
    <xsl:value-of select="." /> Fails.
  </xsl:template>
  <xsl:template match="text()">
    <!-- override default template, output nothing -->
  </xsl:template>
</xsl:stylesheet>
will output:
154236 Passes.
X154236 Fails.
If you have a lot of rules to check, maybe you should look at XML schema languages like Relax NG or Schematron. Schemas are a way of writing the grammar for an XML document that is more expressive than DTDs. You write the declarative rules in the schema language, and the processor takes care of validating the XML against the schema (the reference Schematron implementation, for example, generates the XSLT code that does the validation).