How can I identify the exact type of a file? To make the question clearer, here are some more details:
For example, if I have a file named "example.exe", I can easily recognize that it's a Windows executable (from the .exe extension). But if I remove the extension, I can no longer identify the file type just by looking at it.
Then how can I identify the file type?
(Please suggest answers using C/C++, Java, Python, or PHP (for web uploads).)
Thanks
There's no such thing as "exact file type". Binary data is binary data.
If you're running on a POSIX-like system, you can use the file
command to guess the file type. Most implementations can also print a MIME type instead of a description, for example with file --mime-type.
If your server is running Apache, then you can use mod_mime_magic to make a guess.
If you're using PHP, you can install the fileinfo extension.
You need to know the specification of each file type you want to handle.
With this specification you can create a method to check if a given file is of a specific type.
Example:
isExe(File)
isJpg(File)
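A minimal Python sketch of such checks, assuming that comparing the first few bytes is enough for your purposes (a real validator would parse more of each format). The names is_exe, is_jpg, and read_prefix are hypothetical helpers that mirror the isExe/isJpg idea above:

```python
def read_prefix(path, count):
    """Return the first `count` bytes of the file (fewer if the file is shorter)."""
    with open(path, "rb") as handle:
        return handle.read(count)


def is_exe(path):
    # DOS and PE executables start with the two ASCII bytes "MZ".
    return read_prefix(path, 2) == b"MZ"


def is_jpg(path):
    # JPEG data starts with the SOI marker FF D8, followed by another FF.
    return read_prefix(path, 3) == b"\xff\xd8\xff"
```

A match only shows that the file starts like that format. It does not prove the rest of the file is well formed.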
If you want to find a file's extension, you can use this trivial code:
$ext = pathinfo($filename, PATHINFO_EXTENSION);
In the case of Python: The Python magic library provides the functionality you need.
You can install the library with
pip install python-magic
and use it as follows:
>>> import magic
>>> magic.from_file('sampleone.jpg')
'JPEG image data, JFIF standard 1.01'
>>> magic.from_file('sampletwo.png')
'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'
We cannot recognize the type of a file from its extension alone. Anyone can rename a file from .txt to .exe, which doesn't mean the file is a valid executable.
Let's assume we are on the Windows platform:
Portable Executable (PE) is the native Win32 file format. Every Windows executable uses the PE format except VxDs and 16-bit DLLs: 32-bit DLLs, EXEs, COM components, OCX controls, CPL files, .NET executables, and NT kernel-mode drivers are all PE files. The PE format has a predefined structure consisting of headers, section headers, and section data, which hold addresses, sizes, and the executable code.
The headers contain some signature fields:
e.g. executables always have the value MZ (0x5A4D) in the DOS header and PE (0x4550) in the PE header.
These values let us distinguish executables from non-executables.
Now moving on to non-executables:
Consider a .jpg file: various tools can generate one. When creating a .jpg file, the tool writes a signature at the start of the file (the bytes FF D8, which read as 0xD8FF as a little-endian 16-bit value) followed by the image data. When opening a .jpg file, software reads the signature and, if it is valid, draws the image from the data that follows.
Similarly, .pdf, .mp3, and other formats have their own signatures.
Plain text (.txt) files do not have any signature. The data starts at the first offset of the file.
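The signature idea above can be sketched as a small lookup table in Python. This is only an illustration: the table is far from exhaustive, the labels are made up for this example, and guess_type and guess_file_type are hypothetical names. The byte sequences themselves are the well-known leading bytes of each format (an MP3 only starts with "ID3" when it carries an ID3v2 tag).

```python
# Leading byte signatures for a few common formats (not exhaustive).
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"ID3", "mp3 (ID3v2 tag)"),
    (b"MZ", "dos/pe executable"),
]


def guess_type(data):
    """Return the label of the first signature that `data` starts with, or None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


def guess_file_type(path):
    # 16 bytes is enough to cover every signature in the table above.
    with open(path, "rb") as handle:
        return guess_type(handle.read(16))
```

Because plain text has no signature, guess_type returns None for it, and a text file that happens to begin with the letters "MZ" would be misreported. That is why this kind of check is a guess rather than proof.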
The header information can be read as follows:
CreateFile(...)//ReadMode
CreateFileMapping(...)
MapViewOfFile(...)
Once the file view is mapped, the header information can be retrieved using the structures below, defined in winnt.h:
IMAGE_DOS_HEADER
IMAGE_NT_HEADERS
First match the e_magic field of IMAGE_DOS_HEADER against MZ (0x5A4D); if it matches, follow the e_lfanew field to the NT headers and match the Signature field of IMAGE_NT_HEADERS against PE (0x4550).
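The same two-step check can be sketched without the Win32 API, here in Python on raw bytes. This is an illustration, not the winnt.h structures themselves: is_pe is a hypothetical name, and the code relies only on the documented layout in which e_lfanew sits at offset 0x3C of the DOS header and holds the file offset of the NT headers.

```python
import struct


def is_pe(data):
    """Return True if `data` starts with a DOS header that points at a PE signature."""
    # The DOS header is 64 bytes long; e_magic is its first field.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit file offset of the NT headers, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The NT headers begin with the 4-byte signature "PE\0\0" (0x00004550).
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

To use it, read the whole file (or at least enough of it to reach the NT headers) in binary mode and pass the bytes to is_pe. A file that starts with MZ but has no PE signature, such as an old DOS executable, returns False.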