I am trying to parse a word document file. I upload the using PHP then I am trying to get contents using file_get_contents(); function but the problem is when its displayed in front end a lots of garbage code in there like
Æ�Ѐ¤d�¤d�[$\$gd®l±����„h¤d�¤d�[$\$^„hgd®l±���
&�F�¤d�¤d�[$\$g开发者_StackOverflow中文版d3¡���gd3¡����„,¤d�¤d�[$\$^„,gd(E����¤d�¤d�[$\$gdÿ/��<��C��D��I��Å������O��P��‚��¡��¢��¬����®��Ù��ã��ó��ô�����
So my question is how can I clean up this text?
Maybe give this a shot? http://www.phpclasses.org/package/3553-PHP-Edit-Microsoft-Word-documents-using-COM-objects.html
Word documents (like docx and doc) are not straight text files - they are actually proprietary file types that do not just have the text from byte 0 - this is how they have fancy formatting and fonts. .docx files are actually archives (.zip files) that contain a myriad of XML and styles.
Your best bet is to use a text input form, or find code online that allows you to extract just the text. Or, download the doc files to your own computer and use your own copy of MS word to open it.
精彩评论