PDF Text search C# [closed]

Reposted on https://www.devze.com, 2023-02-08 05:46 (source: web)
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.

I have a requirement to read a PDF file and search it for some text. I need to display which pages the text appears on and the number of occurrences. I can already extract the PDF's text, but I also need to know the page number.

Thanks


You can use Docotic.Pdf for this (I work for Bit Miracle).

Here is a sample showing how to search for text in a PDF:

using System;
using BitMiracle.Docotic.Pdf;

// Dispose the document when done so that its resources are released.
using (PdfDocument doc = new PdfDocument("file.pdf"))
{
    string textToSearch = "some text";
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        string pageText = doc.Pages[i].GetText();
        int count = 0;
        int lastStartIndex = pageText.IndexOf(textToSearch, 0, StringComparison.CurrentCultureIgnoreCase);
        while (lastStartIndex != -1)
        {
            count++;
            // Advance by one character, so overlapping matches are counted too.
            lastStartIndex = pageText.IndexOf(textToSearch, lastStartIndex + 1, StringComparison.CurrentCultureIgnoreCase);
        }

        // The loop index is 0-based, so report i + 1 as the page number.
        if (count != 0)
            Console.WriteLine("Page {0}: '{1}' found {2} times", i + 1, textToSearch, count);
    }
}

If you want a case-sensitive search, remove the third argument from the IndexOf calls.


Have you checked out iTextSharp? http://itextsharp.sourceforge.net/

EDIT: To elaborate, in the TOC I saw a section titled 15.3.3: Extracting text with PdfReaderContentParser and PdfTextExtractor.

And under PdfReaderContentParser (http://api.itextpdf.com/com/itextpdf/text/pdf/parser/PdfReaderContentParser.html) there is an option to process the PDF content per page.

So it is a somewhat roundabout way, but you can iterate through the pages, search each page's content for the word you want, and return the numbers of the pages where you found it.
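
The per-page iteration described above can be sketched independently of any PDF library. This is only a minimal sketch, not code from iTextSharp: the class PdfTextSearch and its methods CountOccurrences and FindPages are hypothetical names introduced here, and the page text is supplied through a delegate so that any extractor can be plugged in. It assumes pages are numbered from 1 and counts non-overlapping matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp or any other PDF library.
public static class PdfTextSearch
{
    // Counts non-overlapping occurrences of term in text.
    public static int CountOccurrences(string text, string term, StringComparison comparison)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(term))
            return 0;

        int count = 0;
        int index = text.IndexOf(term, 0, comparison);
        while (index != -1)
        {
            count++;
            int next = index + term.Length;
            // Culture-sensitive matches can differ in length from term,
            // so guard against stepping past the end of the text.
            if (next > text.Length)
                break;
            index = text.IndexOf(term, next, comparison);
        }
        return count;
    }

    // Maps each 1-based page number that contains term to its hit count.
    // getPageText is assumed to return the extracted text of the given page.
    public static SortedDictionary<int, int> FindPages(
        int pageCount,
        Func<int, string> getPageText,
        string term,
        StringComparison comparison = StringComparison.CurrentCultureIgnoreCase)
    {
        var hits = new SortedDictionary<int, int>();
        for (int page = 1; page <= pageCount; page++)
        {
            int count = CountOccurrences(getPageText(page), term, comparison);
            if (count > 0)
                hits[page] = count;
        }
        return hits;
    }
}
```

With iTextSharp 5.x, the delegate could plausibly be page => PdfTextExtractor.GetTextFromPage(reader, page), with reader.NumberOfPages as the page count, where reader is a PdfReader opened on the file. Check this against the version you have installed, since the API differs between releases.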
