I need to combine a folder full of PDFs into one file. However, they must be combined in a certain order. A sample of the file names is:
WR_Mapbook__1.pdf
WR_Mapbook__1a.pdf
WR_Mapbook__2.pdf
WR_Mapbook__2a.pdf
WR_Mapbook__3.pdf
WR_Mapbook__3a.pdf
etc...
The order Windows Explorer shows them in is the order I need them added to a single file. However, my script adds all the "a" files first, and then the files without an "a". Why does it do that? How can I sort the list so that the files are added in the order I want?
See the code below. Thanks!
from pyPdf import PdfFileWriter, PdfFileReader
import glob
outputLoc = "K:\\test\\pdf_output\\"
output = PdfFileWriter()
pdfList = glob.glob(r"K:\test\lidar_MB_ALL\*.pdf")
pdfList.sort
print pdfList
for pdf in pdfList:
    print pdf
    input1 = PdfFileReader(file(pdf, "rb"))
    output.addPage(input1.getPage(0))
    # finally, write "output" to document-output.pdf
    outputStream = file(outputLoc + "WR_Imagery_LiDar_Mapbook.pdf", "wb")
    output.write(outputStream)
    print ("adding " + pdf)
    outputStream.close()
What you need is "natural order" string comparison, which compares runs of digits by their numeric value instead of character by character. Hopefully someone has already implemented this and shared it.
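To see why ordinary string sorting is not enough once the map numbers reach two digits, here is a small sketch comparing plain sorted() output on a sample list. The two-digit names (WR_Mapbook__10.pdf, WR_Mapbook__10a.pdf) are hypothetical additions to the question's sample.

```python
# Hypothetical sample: the question's names plus two-digit entries
names = [
    "WR_Mapbook__10.pdf",
    "WR_Mapbook__2a.pdf",
    "WR_Mapbook__1.pdf",
    "WR_Mapbook__10a.pdf",
    "WR_Mapbook__2.pdf",
    "WR_Mapbook__1a.pdf",
]

# Plain str ordering compares character by character, so "10" sorts
# before "1a" and "2" because '0' < 'a' and '1' < '2'.
plain = sorted(names)
for name in plain:
    print(name)
```

Plain sorting keeps 1 before 1a (because '.' sorts before 'a'), but it places 10 and 10a between 1 and 1a, which is not the Explorer order.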
EDIT: Here's a brute force example of doing this in Python.
import re
digits = re.compile(r'(\d+)')
def tokenize(filename):
    return tuple(int(token) if match else token
                 for token, match in
                 ((fragment, digits.search(fragment))
                  for fragment in digits.split(filename)))
# Now you can sort your PDF file names like so:
pdfList.sort(key=tokenize)
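A quick way to check the tokenize key is to run it on a sample list. This sketch repeats the function so it runs on its own (in Python 3 syntax), and the two-digit names are hypothetical. Because re.split with a capturing group always alternates text and digit fragments, tuples from different names hold str and int at the same positions, so Python 3 never has to compare an int with a str.

```python
import re

digits = re.compile(r'(\d+)')

def tokenize(filename):
    # Split into alternating text / digit runs; digit runs become ints,
    # e.g. "WR_Mapbook__10a.pdf" -> ("WR_Mapbook__", 10, "a.pdf")
    return tuple(int(token) if match else token
                 for token, match in
                 ((fragment, digits.search(fragment))
                  for fragment in digits.split(filename)))

# Hypothetical sample: the question's names plus two-digit entries
names = [
    "WR_Mapbook__10.pdf",
    "WR_Mapbook__2a.pdf",
    "WR_Mapbook__1.pdf",
    "WR_Mapbook__10a.pdf",
    "WR_Mapbook__2.pdf",
    "WR_Mapbook__1a.pdf",
]
names.sort(key=tokenize)
for name in names:
    print(name)
```

The numbers now compare as integers (2 < 10), and within the same number ".pdf" still sorts before "a.pdf", giving 1, 1a, 2, 2a, 10, 10a.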
Try putting () after pdfList.sort, as in:
pdfList.sort()
The way you've written it, the sort method is only referenced, never called, so the list isn't actually sorted. I put your file names into a list, and sort() returned them in the order you show.
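A minimal sketch of the difference, using three of the question's file names: the bare attribute lookup leaves the list untouched, while the call sorts it in place and itself returns None (so pdfList = pdfList.sort() would be a second mistake).

```python
names = ["WR_Mapbook__2.pdf", "WR_Mapbook__1a.pdf", "WR_Mapbook__1.pdf"]

# Without parentheses this just evaluates the bound method; nothing runs
names.sort
unchanged = list(names)

# With parentheses the list is sorted in place; the call returns None
result = names.sort()

print(unchanged)
print(names)
print(result)
```

If you want a new sorted list instead of sorting in place, sorted(names) returns one and leaves the original alone.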
Replace pdfList.sort
with
pdfList = sorted(pdfList, key=lambda x: x[:-4])
or
pdfList = sorted(pdfList, key=lambda x: x.rsplit('.', 1)[0])
to ignore the file extension while sorting.
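The same idea can be written with os.path.splitext, a standard-library alternative to x[:-4] or rsplit (swapped in here, not part of the answer above). This sketch also shows a limitation worth knowing: dropping the extension still sorts the rest of the name as plain text, so two-digit names (hypothetical here) land between 1 and 1a.

```python
import os

# Hypothetical sample: the question's names plus two-digit entries
names = [
    "WR_Mapbook__10.pdf",
    "WR_Mapbook__2a.pdf",
    "WR_Mapbook__1.pdf",
    "WR_Mapbook__10a.pdf",
    "WR_Mapbook__2.pdf",
    "WR_Mapbook__1a.pdf",
]

# Compare only the part of the name before the extension
by_stem = sorted(names, key=lambda name: os.path.splitext(name)[0])
for name in by_stem:
    print(name)
```

For single-digit names this gives 1, 1a, 2, 2a as wanted; once numbers reach 10, a numeric-aware key such as the tokenize function is needed.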