How to match a text format to a string without regex in python?

Developer (https://www.devze.com), 2023-02-25 02:56, source: Internet
I am reading a file with lines of the form exemplified by

[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34

I saw Matlab code to read this file given by

[I,L,Ls,R,Rs,p,e,n] = textread(f1,'[ %u ] L= %u%s R= %u%s p= %n e=%u n=%u')

I want to read this file in Python. The only thing I know of is regex, and reading even a part of this line leads to something like

re.compile(r'\s*\[\s*(?P<id>\d+)\s*\]\s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)\s*R\s*=\s*(?P<Rint>\d+)\s*')

which is ugly! Is there an easier way to do this in Python?


You can make the regexp more readable by building it with escape/replace...

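import re

number = "([-+0-9.DdEe ]+)"      # a numeric field, possibly padded with spaces
unit = r"\(([^)]+)\)"            # a parenthesized tag such as (D)
t = "[X] L=XU R=XU p=X e=X n=X"  # X marks a number, U marks a unit
m = re.compile(re.escape(t).replace("X", number).replace("U", unit))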
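
As a usage sketch, the compiled pattern can be matched against the sample line from the question and its groups converted. The names `pattern`, `converters`, and `fields` are made up for this illustration. Because the number class also accepts spaces, the captured strings are stripped before conversion:

```python
import re

number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
t = "[X] L=XU R=XU p=X e=X n=X"
pattern = re.compile(re.escape(t).replace("X", number).replace("U", unit))

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
m = pattern.match(line)
if m is None:
    raise ValueError("line does not match the template: %r" % line)

# Groups come back as strings padded with spaces; strip before converting.
# One converter per group, in template order (an assumption of this sketch).
converters = (int, int, str, int, str, float, int, int)
fields = [conv(g.strip()) for conv, g in zip(converters, m.groups())]
print(fields)
```

The template stays readable, but the literal parts of `t` (including its single spaces) must appear in the input exactly as written; only the number fields absorb extra padding.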


This looks more or less pythonic to me:

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"

parts = (None, int, None,
         None, int, str,
         None, int, str,
         None, float,
         None, int,
         None, int)

[I,L,Ls,R,Rs,p,e,n] = [f(x) for f, x in zip(parts, line.split()) if f is not None]

print([I, L, Ls, R, Rs, p, e, n])
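
The same idea can be wrapped in a small function that also checks the token count and strips the parentheses from the type fields. This is only a sketch: `parse_line`, `strip_parens`, and `PARTS` are hypothetical names, and it assumes every field is separated by whitespace exactly as in the sample line.

```python
def strip_parens(token):
    """Turn '(D)' into 'D'."""
    return token.strip("()")

# One converter per whitespace-separated token; None means "skip this token".
PARTS = (None, int, None,            # [ 0 ]
         None, int, strip_parens,    # L= 9 (D)
         None, int, strip_parens,    # R= 14 (D)
         None, float,                # p= 0.0347222
         None, int,                  # e= 10
         None, int)                  # n= 34

def parse_line(line):
    tokens = line.split()
    if len(tokens) != len(PARTS):
        raise ValueError("expected %d tokens, got %d" % (len(PARTS), len(tokens)))
    return [f(x) for f, x in zip(PARTS, tokens) if f is not None]

print(parse_line("[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"))
```

The length check matters because `zip` silently truncates: a line written as `L=9(D)` would split into fewer tokens and otherwise yield misaligned values instead of an error.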


Pyparsing is an alternative to unreadable and fragile regular expressions. The parser example below handles your stated format, plus any amount of extra whitespace, and accepts the assignment expressions that follow the leading L= field in arbitrary order. Just as you have used named groups in your regex, pyparsing supports results names, so that you can access the parsed data using dict or attribute syntax (data['Lint'] or data.Lint).

from pyparsing import Suppress, Word, nums, oneOf, Regex, ZeroOrMore, Optional

# define basic punctuation
EQ,LPAR,RPAR,LBRACK,RBRACK = map(Suppress,"=()[]")

# numeric values
integer = Word(nums).setParseAction(lambda t : int(t[0]))
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t : float(t[0]))

# id and assignment fields
idRef = LBRACK + integer("id") + RBRACK
typesep = LPAR + oneOf("D G") + RPAR
lExpr = 'L' + EQ + integer("Lint")
rExpr = 'R' + EQ + integer("Rint")
pExpr = 'p' + EQ + real("pFloat")
eExpr = 'e' + EQ + integer("Eint")
nExpr = 'n' + EQ + integer("Nint")

# accept assignments in any order, with or without leading (D) or (G)
assignment = lExpr | rExpr | pExpr | eExpr | nExpr
line = idRef + lExpr + ZeroOrMore(Optional(typesep) + assignment)


# test the parser
text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
data = line.parseString(text)
print(data.dump())


# prints
# [0, 'L', 9, 'D', 'R', 14, 'D', 'p', 0.034722200000000002, 'e', 10, 'n', 34]
# - Eint: 10
# - Lint: 9
# - Nint: 34
# - Rint: 14
# - id: 0
# - pFloat: 0.0347222

Also, the parse actions do the string->int or string->float conversion at parse time, so that afterward the values are already in a usable form. (The thinking in pyparsing is that, while parsing these expressions, you know that a word composed of numeric digits - or Word(nums) - will safely convert to an int, so why not do the conversion right then, instead of just getting back matching strings and having to re-process the sequence of strings, trying to detect which ones are integers, floats, etc.?)


Python does not have a scanf equivalent, as stated in the documentation for the re module:

Python does not currently have an equivalent to scanf(). Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.

However, you could probably build your own scanf like module using the mappings on that page.
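
A minimal sketch of that idea, assuming only three format tokens (`%d`, `%f`, `%s`) are needed. The `scanf` function and its `TOKENS` table are made up for this illustration and are not part of the standard library; the regex fragments are written in the spirit of the mappings on that page, and whitespace in the format is treated as optional whitespace in the input.

```python
import re

# Format token -> (regex fragment, converter). Hypothetical table for this sketch.
TOKENS = {
    "%d": (r"([-+]?\d+)", int),
    "%f": (r"([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)", float),
    "%s": (r"(\S+)", str),
}

def scanf(fmt, text):
    """Match text against a scanf-like format; return converted values or None."""
    regex_parts, converters = [], []
    # Split the format into tokens, whitespace runs, and literal text.
    for piece in re.split(r"(%[dfs]|\s+)", fmt):
        if piece in TOKENS:
            fragment, conv = TOKENS[piece]
            regex_parts.append(fragment)
            converters.append(conv)
        elif piece.isspace():
            regex_parts.append(r"\s*")
        else:
            regex_parts.append(re.escape(piece))
    m = re.match("".join(regex_parts), text)
    if m is None:
        return None
    return [conv(g) for conv, g in zip(converters, m.groups())]

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
print(scanf("[ %d ] L= %d %s R= %d %s p= %f e= %d n= %d", line))
```

Here `%s` captures the type fields with their parentheses, as `(D)`, much like the `%s` in the Matlab format string; stripping them is left to the caller.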
