Regular expressions in a Python find-and-replace script? Update

Developer https://www.devze.com 2023-01-03 04:25 Source: Internet
I'm new to Python scripting, so please forgive me in advance if the answer to this question seems inherently obvious.

I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following:

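import sys

infile = sys.argv[1]   # path of the file to process
charenc = sys.argv[2]  # character encoding of the input file
outFile = infile + '.output'

# (search, replacement) pairs, applied in order
findreplace = [
    ('term1', 'term2'),
]

inF = open(infile, 'rb')
s = inF.read().decode(charenc)  # decode the raw bytes into text
inF.close()

outtext = s  # stays defined even if findreplace is empty
for couple in findreplace:
    outtext = s.replace(couple[0], couple[1])
    s = outtext

outF = open(outFile, 'wb')
outF.write(outtext.encode('utf-8'))
outF.close()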

How would I go about having the script do a find and replace for regular expressions?

Specifically, I want it to find some information (metadata) specified at the top of a text file. E.g.:

Title: This is the title
Author: This is the author
Date: This is the date

and convert it into LaTeX format. E.g.:

\title{This is the title}
\author{This is the author}
\date{This is the date}

Maybe I'm tackling this the wrong way. If there's a better way than regular expressions please let me know!

Thanks!

Update: Thanks for posting example code in your answers! The regex substitution works if I use it in place of the findreplace loop, but I can't get both to run together. The problem now is integrating it into the code I already have. How would I make the script perform multiple actions on 'outtext' in the snippet below?

for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext
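
One possible way to run both kinds of replacement, sketched under a few assumptions: keep the literal pairs in findreplace, add a second list of regex rules, and apply both loops to the same variable so that each step sees the result of the previous one. The names regexreplace and convert, and the sample text, are hypothetical and not part of the original script.

```python
import re

# Literal (search, replacement) pairs, as in the original script.
findreplace = [
    ('term1', 'term2'),
]

# Hypothetical second list: (compiled pattern, replacement) pairs.
# The replacement may be a template string or a function.
regexreplace = [
    (re.compile(r'^(Title|Author|Date):\s*(.+)$', re.M),
     lambda m: '\\%s{%s}' % (m.group(1).lower(), m.group(2))),
]

def convert(s):
    """Apply every literal replacement, then every regex replacement."""
    for old, new in findreplace:
        s = s.replace(old, new)
    for pattern, repl in regexreplace:
        s = pattern.sub(repl, s)
    return s

sample = "Title: About term1\nAuthor: Someone\n\nBody mentions term1."
outtext = convert(sample)
print(outtext)
```

In the script from the question, a call such as outtext = convert(s) could stand in for the for loop, with the file reading and writing left unchanged.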


>>> import re
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> p = re.compile(r'^(\w+):\s*(.+)$', re.M)
>>> print(p.sub(r'\\\1{\2}', s))
\Title{This is the title}
\Author{This is the author}
\Date{This is the date}

To lowercase the command names, pass a function as the replacement argument:

def repl_cb(m):
    # m.group(1) is the key, m.group(2) is the value
    return "\\%s{%s}" % (m.group(1).lower(), m.group(2))

p = re.compile(r'^(\w+):\s*(.+)$', re.M)
print(p.sub(repl_cb, s))

\title{This is the title}
\author{This is the author}
\date{This is the date}


See re.sub()


The regular expression you want would probably be along the lines of this one:

^([^:]+): (.*)

and the replacement expression would be

\\\1{\2}
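
As a minimal sketch of how that pattern and replacement could be applied: the sample text is taken from the question, and the variable names are illustrative. The re.MULTILINE flag is needed so that ^ matches at the start of every line rather than only at the start of the string.

```python
import re

text = """Title: This is the title
Author: This is the author
Date: This is the date"""

# With re.MULTILINE, ^ anchors at the start of each line.
pattern = re.compile(r'^([^:]+): (.*)', re.MULTILINE)

# In the template, \\ is a literal backslash; \1 and \2 are the groups.
result = pattern.sub(r'\\\1{\2}', text)
print(result)
```

The keys keep their original capitalisation here (\Title rather than \title). Also note that [^:]+ can match across newlines, so on a full document it may be safer to write [^:\n]+ or to apply the pattern only to the header lines.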


>>> import re
>>> m = 'title', 'author', 'date'
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> for i in m:
...     s = re.compile(i+': (.*)', re.I).sub(r'\\' + i + r'{\1}', s)
...
>>> print(s)
\title{This is the title}
\author{This is the author}
\date{This is the date}
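
The loop above compiles one pattern per key and does not anchor it, so a phrase such as "date: " in the middle of a body line would be rewritten as well. A possible variant, offered as a sketch rather than a drop-in fix, builds a single anchored pattern from the same keys. The names keys, header_re and to_latex are illustrative.

```python
import re

keys = ('title', 'author', 'date')

# One alternation covering all keys, anchored to line starts.
header_re = re.compile(
    r'^(%s): (.*)$' % '|'.join(re.escape(k) for k in keys),
    re.IGNORECASE | re.MULTILINE,
)

def to_latex(text):
    # Lowercase the matched key so that "Title" becomes \title.
    return header_re.sub(
        lambda m: '\\%s{%s}' % (m.group(1).lower(), m.group(2)), text)

s = "Title: My title\nAuthor: Me\nSee the date: tomorrow"
converted = to_latex(s)
print(converted)
```

Only lines that begin with one of the keys are converted; the third line of the sample is left untouched because "date: " does not start the line.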