I would like to run a script on a folder full of Word documents that reads through the documents and pulls out images and their captions (text right below the images). From the research I've done, I think pywin32 might be a viable solution. I know how to use pywin32 to find strings and pull them out, but I need help with the images part. How can I read through a docx file and have an event occur when an image is found? Thank you for any help! I am using Python 2.7.
A .docx file is a ZIP archive, so you can unzip it to extract the images, which are stored under word/media/ inside the archive.
You may find some inspiration in this post: How can I search a word in a Word 2007 .docx file?
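As a rough illustration of the unzip approach, here is a minimal sketch using only the standard library. The function name extract_images and the output folder are made up for this example, and it assumes the images sit under word/media/ as they do in ordinary .docx files. It only copies the image files out; it does not try to match them to captions.

```python
import os
import zipfile


def extract_images(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx archive into out_dir.

    Hypothetical helper for illustration; returns the list of saved paths.
    """
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Embedded pictures are stored under word/media/ in the archive.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = os.path.join(out_dir, os.path.basename(name))
                with open(target, "wb") as handle:
                    handle.write(archive.read(name))
                saved.append(target)
    return saved
```

To process a whole folder, you could loop over os.listdir(folder), call extract_images for each name ending in .docx, and give each document its own output directory so that files named image1.png from different documents do not overwrite each other.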
You can use the Python module docx2txt to extract text as well as images from docx files.
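    import docx  # provided by the python-docx package

    document = docx.Document(filepath)  # filepath: path to a .docx file
    # Each inline shape is a picture placed in the text flow; sizes are in EMU
    for image in document.inline_shapes:
        print(image.width, image.height)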
Try this; it should print the width and height of every inline image in the document.