In my program I need to read character by character from a PDF file and put every word into a database. I wasn't sure whether that is possible, so I decided to convert the PDF file to an MS Word file with a converter and then read from that file.
Now I still don't know how to read character by character from an MS Word file. I'm using C++/MFC in my program.
If you could give me some sample code it would help me a lot, and I'd be very thankful.
Check out IFilter. http://msdn.microsoft.com/en-us/library/ms691105%28v=vs.85%29.aspx
It's a COM interface for extracting text from files (each file extension has its own DLL, and COM returns the one that matches the file you need to read).
An example in C#: http://www.codeproject.com/KB/cs/IFilter.aspx, or http://www.codeproject.com/KB/string/pdf2text.aspx (I've used it in native C++, but I don't have a code example...).
Note that for PDF you might need to download the PDF IFilter: http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611
Good Luck!
If you can convert the source file and you only need the characters, then make it a plain text file and read it using std::ifstream.
To get more sophisticated information from an MS Word file, you should use Office Automation. There are good links in the answers to the following question:
Creating, opening and printing a word file from C++