I'm trying out regex (import re) to extract the info I want from a log file.
UPDATE: I added the C:\WINDOWS\security folder permissions, which broke all of the sample code.
Say the format of the log is:
C:\:
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
\Everyone Allowed: Read & Execute
(No auditing)

C:\WINDOWS\system32:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Modify
BUILTIN\Power Users Allowed: Special Permissions:
Delete
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)

C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Read & Execute
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)

C:\WINDOWS\security:
BUILTIN\Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
And it repeats for a few other directories. How can I split them into paragraphs and then check for lines containing Special Permissions:?
Like this:
- Separate the whole string1 into a few parts, such as C:\ and C:\WINDOWS\system32.
- Look in each part for lines that contain 'Special Permissions:'
- Display the whole line, e.g.:
C:\:
BUILTIN\Users Allowed: Special Permissions: \n\
Create Folders\n\
BUILTIN\Users Allowed: Special Permissions: \n\
Create Files\n\
- Repeat for next 'paragraph'
I was thinking of:
1. Search the whole text file for r"(\w+:\\)(\w+\\?)*:"
- return me the path
2. String function or regex to get the rest of the output
3. Remove all the other lines besides the ones with Special Permissions
4. Display, and repeat step 1
But I think it is not efficient.
Can anyone guide me on this? Thanks.
Example output:
C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files

C:\WINDOWS\system32:
BUILTIN\Power Users Allowed: Special Permissions:
Delete

C:\WINDOWS\security:
BUILTIN\Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
C:\WINDOWS\system32\config doesn't show up because none of its lines contain Special Permissions.
The template I am using:
import re

text = ""

def main():
    f = open('DirectoryPermissions.xls', 'r')
    global text
    for line in f:
        text = text + line
    f.close()
    print text

def regex():
    global text
    <insert code here>

if __name__ == '__main__':
    main()
    regex()
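# I would replace this with reading lines from a file,
# rather than splitting a big string containing the file.

section = None
inspecialperm = False
with open("testdata.txt") as w:
    for line in w:
        # an unindented line ends any running list of special permissions
        if not line.startswith(" "):
            inspecialperm = False

        if section is None:
            # first line of a paragraph is the path
            section = line

        elif not line.strip():
            # blank line: the paragraph is over
            # (lines read from a file keep their '\n', so len(line) is never 0)
            section = None

        elif 'Special Permissions' in line:
            if section:
                # print the path once, before its first match
                print section,
                section = ""
            inspecialperm = True
            print line,

        elif inspecialperm:
            print line,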
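For anyone who wants to try the line-by-line idea without a file on disk, here is a minimal sketch of the same state machine as a generator. It is written for Python 3 (the code in this thread uses Python 2 print statements), and it rests on assumptions of mine: paragraphs are separated by a blank line, permission names contain no colon, and the function name special_permission_lines and the sample string are hypothetical.

```python
def special_permission_lines(lines):
    """Yield each path header followed by its 'Special Permissions' entries."""
    section = None           # header of the current paragraph
    header_pending = False   # True until the header has been yielded
    in_special = False       # True while reading permission names
    for raw in lines:
        line = raw.strip()
        if not line:
            # blank line: the paragraph is over
            section, in_special = None, False
        elif section is None:
            # first line of a paragraph is the path
            section, header_pending, in_special = line, True, False
        elif 'Special Permissions:' in line:
            if header_pending:
                yield section
                header_pending = False
            in_special = True
            yield line
        elif in_special and ':' not in line and not line.startswith('('):
            # a permission name belonging to the entry above
            yield line
        else:
            in_special = False


sample = r"""C:\:
BUILTIN\Administrators Allowed: Full Control
BUILTIN\Users Allowed: Special Permissions:
    Create Folders
\Everyone Allowed: Read & Execute
(No auditing)

C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
(No auditing)
"""

result = list(special_permission_lines(sample.splitlines()))
for line in result:
    print(line)
```

Because it only needs an iterable of lines, the same function should also accept an open file object directly.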
You don't need the re module at all if you parse strings by "split & strip", which is more efficient:
for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    paragraph = paragraph.replace(': \n', ': ') # hack to have permissions in same line
    for line in paragraph.split('\n'):
        if 'Special Permissions: ' in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Path "%s" has special permission "%s"' % (path, permission)
Replace the print statement with whatever fits your needs.
EDIT: As pointed out in the comment, the previous solution doesn't work with the new input lines in the edited question, but here's how to fix it (still more efficiently than using regular expressions):
for paragraph in string1.split('\n\n'):
    path = paragraph.split('\n', 1)[0].strip().rstrip(':')
    owner = None
    for line in paragraph.split('\n'):
        if owner is not None and ':' not in line:
            permission = line.rsplit(':', 1)[-1].strip()
            print 'Owner "%s" has special permission "%s" on path "%s"' % (owner, permission, path)
        else:
            owner = line.split(' Allowed:', 1)[0].strip() if line.endswith('Special Permissions: ') else None
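If you would rather collect the results than print them, the same split & strip approach can fill a dictionary. This is only a sketch under my own assumptions: Python 3, paragraphs separated by a blank line, and trailing lines such as "(No auditing)" skipped because they start with a parenthesis. The name parse_special_permissions and the sample log are made up for illustration.

```python
def parse_special_permissions(text):
    """Map each path to a list of (owner, permission) pairs."""
    result = {}
    for paragraph in text.strip().split('\n\n'):
        lines = [ln.strip() for ln in paragraph.split('\n')]
        path = lines[0].rstrip(':')
        owner = None
        for line in lines[1:]:
            if line.endswith('Special Permissions:'):
                # remember who the following permission names belong to
                owner = line.split(' Allowed:', 1)[0]
            elif owner is not None and ':' not in line and not line.startswith('('):
                result.setdefault(path, []).append((owner, line))
            else:
                owner = None
    return result


log = r"""C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
(No auditing)

C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
(No auditing)

C:\WINDOWS\security:
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
BUILTIN\Administrators Allowed: Full Control
"""

parsed = parse_special_permissions(log)
for path, pairs in parsed.items():
    for owner, permission in pairs:
        print('Owner "%s" has special permission "%s" on path "%s"'
              % (owner, permission, path))
```

Paths without any special permissions simply never get a key, so C:\WINDOWS\system32\config is absent from the result.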
Similar to milkypostman's solution, but in the format you are trying to have that output in:
lines=string1.splitlines()
seperator = None
for index, line in enumerate(lines):
    if line == "":
        seperator = line
    elif "Special Permissions" in line:
        if seperator != None:
            print seperator
        print line.lstrip()
        offset=0
        while True:
            #if the line's last 2 characters are ": "
            if lines[index+offset][-2:]==": ":
                print lines[index+offset+1].lstrip()
                offset+=1
            else:
                break
Here is a solution using the re module and the findall method.
data = '''\
C:\:
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
\Everyone Allowed: Read & Execute
(No auditing)

C:\WINDOWS\system32:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Modify
BUILTIN\Power Users Allowed: Special Permissions:
Delete
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)

C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
BUILTIN\Power Users Allowed: Read & Execute
BUILTIN\Administrators Allowed: Full Control
NT AUTHORITY\SYSTEM Allowed: Full Control
(No auditing)
'''
if __name__ == '__main__':
    import re

    # A regular expression to match a section "C:...."
    cre_par = re.compile(r'''
        ^C:.*?
        ^\s*$''', re.DOTALL | re.MULTILINE | re.VERBOSE)

    # A regular expression to match a "Special Permissions" line, and the
    # following line.
    cre_permissions = re.compile(r'''(^.*Special\ Permissions:\s*\n.*)\n''',
                                 re.MULTILINE | re.VERBOSE)

    # Create list of strings to output.
    out = []
    for t in cre_par.findall(data):
        # NOTE: findall(data) scans the whole log for every section;
        # findall(t) would limit the search to the current section.
        out += [t[:t.find('\n')]] + cre_permissions.findall(data) + ['']

    # Join output list of strings together using end-of-line character
    print '\n'.join(out)
Here is the generated output:
C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
BUILTIN\Power Users Allowed: Special Permissions:
Delete
C:\WINDOWS\system32:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
BUILTIN\Power Users Allowed: Special Permissions:
Delete
C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
BUILTIN\Power Users Allowed: Special Permissions:
Delete
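One way to keep each section's matches separate is to run the permissions pattern on that section's text only. The following is a rough sketch rather than a drop-in fix. It assumes Python 3, blank lines between sections, and permission names made up of letters and spaces only; the names ENTRY and report and the shortened sample data are mine.

```python
import re

# An entry line ending in "Special Permissions:", followed by any number
# of lines that contain only letters and spaces (the permission names).
ENTRY = re.compile(
    r'^[ \t]*(.*Special Permissions:)[ \t]*\n'
    r'((?:[ \t]*[A-Za-z ]+(?:\n|\Z))*)',
    re.MULTILINE)


def report(text):
    """Return the sections that have special permissions, as one string."""
    out = []
    for section in re.split(r'\n[ \t]*\n', text.strip()):
        header, _, body = section.partition('\n')
        matches = ENTRY.findall(body)   # search this section only
        if not matches:
            continue                    # nothing to report for this path
        out.append(header.strip())
        for entry, names in matches:
            out.append(entry)
            out.extend(name.strip() for name in names.splitlines())
        out.append('')
    return '\n'.join(out)


data = r"""C:\:
BUILTIN\Administrators Allowed: Full Control
BUILTIN\Users Allowed: Special Permissions:
Create Folders
(No auditing)

C:\WINDOWS\system32:
BUILTIN\Power Users Allowed: Special Permissions:
Delete
BUILTIN\Administrators Allowed: Full Control
(No auditing)

C:\WINDOWS\system32\config:
BUILTIN\Users Allowed: Read & Execute
(No auditing)
"""

print(report(data))
```

Unlike the pattern above, ENTRY captures every permission name that follows an entry, not just the first one, which matters for sections like C:\WINDOWS\security.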
Thanks to milkypostman, scoffey, and the rest, I came up with this solution:
def regex():
    global text
    for paragraph in text.split('\n\n'):
        lines = paragraph.split('\n', 1)
        #personal modifier to choose certain output only
        if lines[0].startswith('C:\\:') or lines[0].startswith('C:\\WINDOWS\\system32:') or lines[0].startswith('C:\\WINDOWS\\security:'):
            print lines[0]
            iterables = re.finditer(r".*Special Permissions: \n(\s+[a-zA-Z ]+\n)*", lines[1])
            for items in iterables:
                #cosmetic fix
                parsedText = re.sub(r"\n$", "", items.group(0))
                parsedText = re.sub(r"^\s+", "", parsedText)
                parsedText = re.sub(r"\n\s+", "\n", parsedText)
                print parsedText
            print
I will still go through all of the posted code (especially scoffey's, as I never knew pure string manipulation was that powerful). Thanks for the insight!
Of course, this is not the most efficient solution, but it works for my case. If you have any suggestions, feel free to post them.
Output:
C:\Python27>openfile.py
C:\:
BUILTIN\Users Allowed: Special Permissions:
Create Folders
BUILTIN\Users Allowed: Special Permissions:
Create Files
C:\WINDOWS\security:
BUILTIN\Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
BUILTIN\Power Users Allowed: Special Permissions:
Traverse Folder
Read Attributes
Read Permissions
C:\WINDOWS\system32:
BUILTIN\Power Users Allowed: Special Permissions:
Delete