开发者

Is it possible to read a pdf file as a txt?

开发者 https://www.devze.com 2022-12-29 01:56 出处:网络
I need to find a certain key in a pdf file. As far as I know the only way to do that is to interpret a pdf as txt file. I want to do this in PHP without installing a addon/framework/etc.

I need to find a certain key in a pdf file. As far as I know the only way to do that is to interpret a pdf as txt file. I want to do this in PHP without installing a addon/framework/etc.

Thanks开发者_开发技巧


You can certainly open a PDF file as text. PDF file format is actually a collection of objects. There is a header in the first line that tells you the version. You would then go to the bottom to find the offset to the start of the xref table that tells where all the objects are located. The contents of individual objects in the file, like graphics, are often binary and compressed. The 1.7 specification can be found here.


I found this function, hope it helps.

http://community.livejournal.com/php/295413.html


You can't just open the file as it is a binary dump of objects used to create the PDF display, including encoding, fonts, text, images. I wrote an blog post explaining how text is stored at http://pdf.jpedal.org/java-pdf-blog/bid/27187/Understanding-the-PDF-file-format-text-streams


Thank you all for your help. I owe you this piece of code:

// Proceed if file exists
if(file_exists($sourcePath)){
    $pdfFile = fopen($sourcePath,"rb");
    $data = fread($pdfFile, filesize($sourcePath));
    fclose($pdfFile);

    // Check if file is encrypted or not
    if(stripos($data,$searchFor)){ // $searchFor = "/Encrypt"
        $counterEncrypted++;    
    }else{
        $counterNotEncrpyted++;
    }
}else{
    $counterNotExisting++;
}
0

精彩评论

暂无评论...
验证码 换一张
取 消