Python: Checking Header Format_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2022-12-14 04:22 出处：网络

I\'m new to python and need help with a problem. Basically I need to open a file and read it which I can do no problem. The problem arises at line 0, where I need to check the header format.

I'm new to python and need help with a problem. Basically I need to open a file and read it which I can do no problem. The problem arises at line 0, where I need to check the header format.

The header needs to be in the format: p wncf nvar nclauses hard where 'nvar' 'nclauses' and 'hard' are all positive integers.

For example:

p wncf 1563 817439 186191

would be a valid header line.

Here is coding i have already thanks to a question people开发者_Python百科 answered earlier:

import re 
filename = raw_input('Please enter the name of the WNCF file: ') 
f = open(filename, 'r') 

for line in f: 
    p = re.compile('p wncf \d+ \d+ \d+$') 
    if p.match(line[0]) == None: 
        print "incorrect format"

I still get an incorrect format even when the file is of a correct format. Also, would it be possible to assign the integers to an object?

Thanks in advance.

Alright, a few things.

You only need to compile your regular expression once. In the example you gave above, you're recompiling it for every line in the file.
line[0] is just the first character in each line. Replace line[0] with line and your code should work.

To assign the integers to an object, you have to surround the groups you want in parentheses. In your case, let

p = re.compile(r"p wncf (\d+) (\d+) (\d+)")

And instead of p.match(line), which returns a match object or None, you could use findall. Check out the following as a replacement for what you have.

p = re.compile(r"p wncf (\d+) (\d+) (\d+)") 
for line in f: 
    matches = p.findall(line)
    if len(matches) != 0:
        print matches[0][0], matches[0][1], matches[0][2]
    else:
        print "No matches."

Edit: If your header values can contain negative numbers as well, you should replace r"p wncf (\d+) (\d+) (\d+)" with r"p wncf (-?\d+) (-?\d+) (-?\d+)".

something like that (lines is a list of all the lines in order):

import re
if re.match(r'p wncf \d+ \d+ \d+', lines[0]) == None:
    print "Bad format"

You might want to use p.match(line) instead. You're passing the first character of the line to the regex, not the whole line.

p, wncf, nvar, nclauses, hard = line.split()
nvar = int(nvar)
nclauses = int(nclauses)
hard = int(hard)

you don't need a regex to do this. here's one way you can check your header.

fh=open("file")
header=fh.readline().rstrip()
if not header.startswith("p wncf") :
    print "error"
header=header.split()
if len(header) != 5:
    print "error"
if False in map(str.isdigit, header[2:]):
    print "Error"
fh.close()

Using regular expressions would be about the easiest way to check this header:-

import re
p = re.compile('p wncf \d+ \d+ \d+$')
if p.match(lineToBeChecked) == None:
  print "Header does not have correct format"

Note the use of the trailing $ in the regex to anchor the regex to the end of the line and so protect against additional information being included on the header line (which I've assumed would make it invalid).

If arbitrary numbers of spaces are allowed between parameters the regular expression could be changed to this:-

p = re.compile('p[ ]+wncf[ ]+\d+[ ]+\d+[ ]+\d+$')

Python: Checking Header Format

精彩评论

关注公众号

热门标签

图文推荐

Python: Checking Header Format

更多 问答 相关资讯：

精彩评论

关注公众号

热门标签

图文推荐

更多问答相关资讯：