I'm new to python and need help with a problem. Basically I need to open a file and read it which I can do no problem. The problem arises at line 0, where I need to check the header format.
The header needs to be in the format: p wncf nvar nclauses hard
where 'nvar' 'nclauses' and 'hard' are all positive integers.
For example:
p wncf 1563 817439 186191
would be a valid header line.
Here is coding i have already thanks to a question people开发者_Python百科 answered earlier:
import re
filename = raw_input('Please enter the name of the WNCF file: ')
f = open(filename, 'r')
for line in f:
p = re.compile('p wncf \d+ \d+ \d+$')
if p.match(line[0]) == None:
print "incorrect format"
I still get an incorrect format even when the file is of a correct format. Also, would it be possible to assign the integers to an object?
Thanks in advance.
Alright, a few things.
You only need to compile your regular expression once. In the example you gave above, you're recompiling it for every line in the file.
line[0]
is just the first character in each line. Replaceline[0]
withline
and your code should work.
To assign the integers to an object, you have to surround the groups you want in parentheses. In your case, let
p = re.compile(r"p wncf (\d+) (\d+) (\d+)")
And instead of p.match(line)
, which returns a match object or None
, you could use findall
. Check out the following as a replacement for what you have.
p = re.compile(r"p wncf (\d+) (\d+) (\d+)")
for line in f:
matches = p.findall(line)
if len(matches) != 0:
print matches[0][0], matches[0][1], matches[0][2]
else:
print "No matches."
Edit: If your header values can contain negative numbers as well, you should replace r"p wncf (\d+) (\d+) (\d+)"
with r"p wncf (-?\d+) (-?\d+) (-?\d+)"
.
something like that (lines is a list of all the lines in order):
import re
if re.match(r'p wncf \d+ \d+ \d+', lines[0]) == None:
print "Bad format"
You might want to use p.match(line)
instead. You're passing the first character of the line to the regex, not the whole line.
p, wncf, nvar, nclauses, hard = line.split()
nvar = int(nvar)
nclauses = int(nclauses)
hard = int(hard)
you don't need a regex to do this. here's one way you can check your header.
fh=open("file")
header=fh.readline().rstrip()
if not header.startswith("p wncf") :
print "error"
header=header.split()
if len(header) != 5:
print "error"
if False in map(str.isdigit, header[2:]):
print "Error"
fh.close()
Using regular expressions would be about the easiest way to check this header:-
import re
p = re.compile('p wncf \d+ \d+ \d+$')
if p.match(lineToBeChecked) == None:
print "Header does not have correct format"
Note the use of the trailing $ in the regex to anchor the regex to the end of the line and so protect against additional information being included on the header line (which I've assumed would make it invalid).
If arbitrary numbers of spaces are allowed between parameters the regular expression could be changed to this:-
p = re.compile('p[ ]+wncf[ ]+\d+[ ]+\d+[ ]+\d+$')
精彩评论