
Need to search for Social Security numbers in thousands of documents (.doc, .docx, .pdf) in C#

Developer https://www.devze.com 2023-02-01 23:35 Source: Internet

What is the best way to access the documents (opening them and reading only the text) so that searching is faster? I have already tried using the Microsoft Office Word object to open the files and get the text, by creating a Word application and opening each file with it. I can't use threading either: if I create only one Word application, that doesn't help with threading, and if I create a Word application in each thread, the system can't handle it. How do you suggest I proceed?

Thanks in advance
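One possible way around the Word-automation bottleneck, offered as an assumption rather than something from the thread: .docx files are ZIP packages whose body text conventionally lives in `word/document.xml`, so they can be opened with `System.IO.Compression` and scanned in parallel without any Word instance. The `DocxSsnScanner` class, its method names, and the `NNN-NN-NNNN` pattern are hypothetical choices for this sketch; .doc and .pdf files are not handled and would still need a separate text extractor.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Xml.Linq;

// Hypothetical sketch: find SSN-shaped strings in .docx files without Word automation.
// Assumption: only .docx is handled; .doc and .pdf need another text extractor.
public static class DocxSsnScanner
{
    // WordprocessingML namespace of the <w:p> (paragraph) and <w:t> (text) elements.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed format NNN-NN-NNNN; this does not validate area or group numbers.
    private static readonly Regex SsnPattern =
        new Regex(@"\b\d{3}-\d{2}-\d{4}\b", RegexOptions.Compiled);

    // Returns the SSN-like strings found in one .docx file.
    public static List<string> FindInDocx(string path)
    {
        var hits = new List<string>();
        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            // Assumption: the main document part is at its conventional location.
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) return hits;

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);
                // Word can split one number across several <w:t> runs,
                // so the runs of each paragraph are joined before matching.
                foreach (XElement paragraph in doc.Descendants(W + "p"))
                {
                    string text = string.Concat(
                        paragraph.Descendants(W + "t").Select(t => t.Value));
                    foreach (Match m in SsnPattern.Matches(text))
                        hits.Add(m.Value);
                }
            }
        }
        return hits;
    }

    // Scans every .docx under a folder in parallel; each file is read independently.
    public static IDictionary<string, List<string>> ScanFolder(string folder)
    {
        var results = new ConcurrentDictionary<string, List<string>>();
        var files = Directory.EnumerateFiles(folder, "*.docx", SearchOption.AllDirectories);
        Parallel.ForEach(files, file =>
        {
            try
            {
                List<string> hits = FindInDocx(file);
                if (hits.Count > 0) results[file] = hits;
            }
            catch (InvalidDataException)
            {
                // Not a valid ZIP (e.g. a "~$" Word lock file): skip it.
            }
            catch (IOException)
            {
                // File locked or unreadable: skip it.
            }
        });
        return results;
    }
}
```

Because every file is parsed on its own, `Parallel.ForEach` can use all cores without the per-thread cost of a Word process.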


Ah... go back to reading the documentation of your operating system. For quite some time (i.e. many, many years) it has had an indexing and search system that a lot of things can hook into, provided you install the proper filters (downloadable from Microsoft, Adobe, etc.).

This builds a full-text index that you can then query through an API. That is a LOT more efficient when you repeatedly search a large number of documents.
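To illustrate why an index pays off for repeated searches, here is a toy inverted index in C#. It is not the operating system's index and not the Windows Search API (on Windows that index is typically queried through the Windows Search OLE DB provider, `Search.CollatorDSO`, with SQL against `SystemIndex`). The class name `TinyInvertedIndex` and its tokenizing rule are hypothetical; the point is only that the text is read once, after which each lookup is a dictionary hit instead of a rescan of every file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical toy inverted index: build once, query many times.
// Not the Windows Search API; it only shows why an index beats rescanning.
public sealed class TinyInvertedIndex
{
    // token -> ids of the documents that contain it (case-insensitive keys)
    private readonly Dictionary<string, HashSet<string>> _postings =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Assumed tokenizer: runs of letters, digits and hyphens, so "123-45-6789" stays one token.
    private static readonly Regex Token = new Regex(@"[\p{L}\p{N}-]+", RegexOptions.Compiled);

    // One-time cost per document: record every token it contains.
    public void Add(string docId, string text)
    {
        foreach (Match m in Token.Matches(text))
        {
            if (!_postings.TryGetValue(m.Value, out var docs))
            {
                docs = new HashSet<string>();
                _postings[m.Value] = docs;
            }
            docs.Add(docId);
        }
    }

    // Per-query cost: one dictionary lookup, no files are reopened.
    public IReadOnlyCollection<string> Search(string term)
    {
        return _postings.TryGetValue(term, out var docs)
            ? (IReadOnlyCollection<string>)docs
            : Array.Empty<string>();
    }
}
```

Usage: call `Add` once per document with its extracted text, then call `Search("123-45-6789")` as often as needed.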

