
unable to convert pdf to text using python script

开发者 https://www.devze.com 2023-01-20 04:46 Source: web

I want to convert all the .pdf files in a specific directory to .txt format using the pdftotext command, but I want to do it from a Python script. My script contains:

import glob 
import os

fullPath = os.path.abspath("/home/eth1/Downloads")

for fileName in glob.glob(os.path.join(fullPath,'*.pdf')):
   fullFileName = os.path.join(fullPath, fileName)
   os.popen('pdftotext fullFileName')

but I am getting the following error:

Error: Couldn't open file 'fullFileName': No such file or directory.


You are passing the literal string fullFileName to os.popen, not the value of the variable. You should do something like this instead (assuming that fullFileName does not have to be escaped):

os.popen('pdftotext %s' % fullFileName)
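If the path might contain spaces or shell metacharacters and you still need a single command string, one option is to quote it with shlex.quote from the standard library. This is only a sketch; the helper name shell_command is made up for illustration:

```python
import shlex


def shell_command(pdf_path):
    # Hypothetical helper: builds a command string that is safe to hand to a shell.
    # shlex.quote wraps the path in single quotes when it contains unsafe characters.
    return "pdftotext %s" % shlex.quote(pdf_path)


print(shell_command("/home/eth1/Downloads/my report.pdf"))
# prints: pdftotext '/home/eth1/Downloads/my report.pdf'
```

A path without special characters is left unquoted, so the resulting command is the same as in the one-liner above.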

Also note that os.popen is considered deprecated; it is better to use the subprocess module instead:

import subprocess
retcode = subprocess.call(["/usr/bin/pdftotext", fullFileName])

It is also much safer: the arguments are passed as a list without going through a shell, so spaces and special characters in fullFileName are handled properly.
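Putting this together with the loop from the question, the whole script could look roughly like the sketch below. The function names and the run parameter are my own additions, not part of the original script, and it assumes pdftotext is on the PATH. Note that glob.glob already returns paths that include the directory, so the extra os.path.join inside the loop is not needed.

```python
import glob
import os
import subprocess


def pdftotext_command(pdf_path):
    # Argument list for one conversion; no shell is involved.
    # Given only the input file, pdftotext writes <name>.txt next to the PDF.
    return ["pdftotext", pdf_path]


def convert_all(directory, run=subprocess.call):
    # Runs pdftotext on every .pdf in `directory` and returns the paths that failed.
    # `run` is a hypothetical hook so the loop can be exercised without pdftotext.
    failed = []
    for pdf_path in sorted(glob.glob(os.path.join(directory, "*.pdf"))):
        # subprocess.call returns the exit code; non-zero means pdftotext reported an error.
        if run(pdftotext_command(pdf_path)) != 0:
            failed.append(pdf_path)
    return failed
```

Calling convert_all("/home/eth1/Downloads") should then convert each PDF in that directory and return a list of any files pdftotext could not process.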


Change the last line to

os.popen('pdftotext {0}'.format(fullFileName))

This way the value of fullFileName is passed to the command instead of the literal variable name.
