开发者

Parsing a datafile in python (2.5.2)

开发者 https://www.devze.com 2022-12-14 11:53 出处:网络
I have a message definition file that looks like this struct1 { field=\"name\" type=\"string\" ignore=\"false\";

I have a message definition file that looks like this

struct1 
{
  field="name" type="string" ignore="false"; 
  field="id" type="int" enums=" 0="val1" 1="val2" ";
}

struct2
{
  field = "object" type="struct1";
  ...
}

How can I parse this into a dictionary with keys 'struct1, struct2' and values should be a list of dictionaries, each corresponding to the respective line number so that i can do the following

dict['struct1'][0]开发者_Python百科['type'] // Would return 'string'
dict['struct1'][1]['type'] // Would return 'int'
dict['struct1'][1]['enums']['0'] // Would return 'val1'
dict['struct2'][0]['type'] // Would return 'struct1'

and so on..

Also, I can change the format of the definition file and if any of you have suggestions on modifying the definition file format to make it easier to parse, please let me know.

Thanks


Might I recommend YAML? IMHO the syntax is more readable for data entry, and then you don't have to write and maintain a parser. Eschew XML -- it is good for marking up text, but not good for data entry since the text isn't human readable with all the duplicate tags everywhere.


Use YAML instead. There is PyYAML library for python. It is heavily used by Google AppEngine.

This is just a friendly suggestion :-)

Example ( Mapping Scalars to Sequences ):

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

There is also JSON of course which has ample support on Python (but tends to hurt my fingers a bit more ;-)


Use can use json as file format, it supports (in python lingo) dictionaries and lists. Since json support is native only for python 2.6 and higher, you'll need this library: http://pypi.python.org/pypi/simplejson/2.0.9

{ "struct1" 
    [
        {"field" : "name", "type" : "string", "ignore" : false },
        {"field" : "id", "type" : "int", "0" : "val1", "1" : "val2" }
        {"field" : "id", "type" : "int", "enums" : { "0": "val1", "1": "val2"}}
    ]
  "struct2"
    [ ... ]
}

python part (sketched, not tested):

>>> import simplejson as json
>>> d = json.loads(yourjsonstring)
>>> d['struct1'][0]['field']
name
>>> d['struct1'][2]['enums']['0']
val1
...


I would simply use Python for the message definition file format.

Let your message definition file be a plain Python file:

# file messages.py
messages = dict(
    struct1=[
        dict(field="name", type="string", ignore=False),
        dict(field="id", type="int", enums={0: "val1", 1: "val2"}),
        ],
    struct2=[
        dict(field="object", type="struct1"),
        ]
    )

Your program can then import and use that data structure directly:

# in your program
from messages import messages
print messages['struct1'][0]["type"]
print messages['struct1'][1]['type']
print messages['struct1'][1]['enums'][0]
print messages['struct2'][0]['type']

Using this approach, you let Python do the parsing for you.

And you also gain a lot of possibilities. For instance, imagine you (for some strange reason) have a message structure with 1000 fields named "field_N". Using a conventional file format you would have to add 1000 lines of field definitions (unless you build some looping into your config file parser - you are then on your way to creating a programming language anyway). Using Python for this purpose, you could do something like:

messages = dict(
    ...
    strange_msg=[dict(field="field_%d" % i) for i in range(1000)],
    ...
    )

BTW, on Python 2.6, using named tuples instead of dict is an option. Or use on of the numerous "Bunch" classes available (see the Python cookbook for a namedtuple for 2.5).

EDIT:

Below is code that reads message definition files as specified on the command line. It uses execfile instead of import.

# file mainprogram.py

def read_messages_from_file(filename):
    module_dict = {}
    execfile(filename, module_dict)
    return module_dict['messages']

if __name__ == "__main__":
    from pprint import pprint
    import sys

    for arg in sys.argv[1:]:
        messages = read_messages_from_file(arg)
        pprint(messages)

Executing:

$ python mainprogram.py messages1 messages2 messages3

will read and print the messages defined in each file.


Pyparsing is a nice easy to use library. That what I would use.

http://pyparsing.wikispaces.com/


Since you are at liberty to change the file format, you could change it to any of several formats that have Python libraries to read and write. For example, JSON, YAML, XML, or even the built-in ConfigParser.

[struct1]
field: name
type: string
ignore: false
# etc.
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号