
How to check if a file contains plain text?

Developer https://www.devze.com 2022-12-23 03:33 Source: Internet
I have a folder full of files and I want to search for a string inside them. The issue is that some of the files may be zip, exe, ogg, etc. Can I somehow check what kind of file each one is, so that I only open and search through txt, PHP, etc. files? I can't rely on the file extension.


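Use Python's mimetypes library. Note that guess_type() looks only at the file name, not at the contents, so it helps only when the extension can be trusted:

import mimetypes
if mimetypes.guess_type('full path to document here')[0] == 'text/plain':
    # file is plaintext (judged by its name)
    print('plain text')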
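
A quick way to see what this approach does and does not cover: the sketch below wraps guess_type() in a helper (guess_by_name is a name made up for this example) and calls it on two hypothetical file names. Neither file has to exist, because the guess is made from the name alone.

```python
import mimetypes


def guess_by_name(path):
    """Return the MIME type that mimetypes infers from the file name alone."""
    return mimetypes.guess_type(path)[0]


# Hypothetical names; guess_type() never opens the file.
print(guess_by_name('renamed_archive.txt'))      # judged only by ".txt"
print(guess_by_name('notes_without_extension'))  # no extension to go on
```

A zip archive renamed to .txt would still be reported as text/plain, and a text file with no extension gives None, which is why the content-based answers below fit the question better.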


You can use python-magic, a Python interface to libmagic, to identify file formats from their contents.

>>> import magic
>>> f = magic.Magic(mime=True)
>>> f.from_file('testdata/test.txt')
'text/plain'

For more examples, see the repo.


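Try something like this:

def is_binary_file(filepathname):
    # Bytes that commonly appear in text: a few control characters
    # (BEL, BS, TAB, LF, FF, CR, ESC), printable ASCII, and all bytes >= 0x80.
    textchars = bytearray([7, 8, 9, 10, 12, 13, 27]) + bytearray(range(0x20, 0x7f)) + bytearray(range(0x80, 0x100))

    # Only the first 1024 bytes are inspected.
    with open(filepathname, 'rb') as f:
        chunk = f.read(1024)

    # Deleting every text byte leaves only the non-text bytes, if any.
    return bool(chunk.translate(None, textchars))

Use the function like this:

is_binary_file('<your file path name>')

This returns True if the file looks binary and False if it looks like text. It is a heuristic based on the first 1024 bytes, so an empty file counts as text. It should be easy to adapt to your needs, e.g. by writing an is_text_file function that returns the opposite; I leave that up to you.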
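
To tie this back to the original goal of searching a folder, here is a minimal sketch that applies the same byte heuristic before opening each file as text. The names looks_like_text and search_folder, the sample_size parameter, and the choice to decode as UTF-8 with replacement characters are all assumptions made for this example, not part of the answer above.

```python
import os

# Same byte set as the heuristic above: common control characters,
# printable ASCII, and all bytes >= 0x80.
TEXT_CHARS = (bytes([7, 8, 9, 10, 12, 13, 27])
              + bytes(range(0x20, 0x7f))
              + bytes(range(0x80, 0x100)))


def looks_like_text(path, sample_size=1024):
    """Return True if the first sample_size bytes contain only text bytes."""
    with open(path, 'rb') as f:
        chunk = f.read(sample_size)
    return not chunk.translate(None, TEXT_CHARS)


def search_folder(folder, needle):
    """Yield (path, line_number) for each text-file line containing needle."""
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                if not looks_like_text(path):
                    continue  # skip zip, exe, ogg, and similar files
                # errors='replace' keeps unexpected encodings from raising.
                with open(path, encoding='utf-8', errors='replace') as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            yield path, lineno
            except OSError:
                # Unreadable file (permissions, broken symlink): skip it.
                continue
```

Calling list(search_folder('<your folder>', 'some string')) collects the matches. Because search_folder is a generator, large folders can also be processed one hit at a time.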


If you're on Linux, you can parse the output of the file command-line tool.
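
One way this could look from Python, as a rough sketch: run file with its --brief and --mime-type options through subprocess and check whether the reported type starts with text/. The function names are invented for this example, and the helper returns None when the file tool is not installed, so the result should be treated as a hint rather than a guarantee.

```python
import shutil
import subprocess


def mime_type_from_file_command(path):
    """Return the MIME type reported by `file`, or None if it is unavailable."""
    if shutil.which('file') is None:
        return None  # the tool is not installed on this system
    result = subprocess.run(
        # "--" stops option parsing, so names starting with "-" are safe.
        ['file', '--brief', '--mime-type', '--', path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def is_text_by_file_command(path):
    """Return True if `file` reports a text/* MIME type for path."""
    mime = mime_type_from_file_command(path)
    return mime is not None and mime.startswith('text/')
```

Checking the text/ prefix rather than comparing against text/plain lets source files such as PHP scripts through as well, since file usually reports them under a more specific text/ subtype.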
