I have a requirement to read a PDF file and search it for some text. I need to display which page the text appears on and the number of occurrences. I can already convert the PDF to text, but I also need to know the page number.
Thanks
You can use Docotic.Pdf for this (I work for Bit Miracle).
Here is a sample for how to search text in PDF:
using System;
using BitMiracle.Docotic.Pdf;

string textToSearch = "some text";
using (PdfDocument doc = new PdfDocument("file.pdf"))
{
    for (int i = 0; i < doc.Pages.Count; i++)
    {
        string pageText = doc.Pages[i].GetText();
        int count = 0;
        // Case-insensitive search; each hit restarts one character later.
        int lastStartIndex = pageText.IndexOf(textToSearch, 0, StringComparison.CurrentCultureIgnoreCase);
        while (lastStartIndex != -1)
        {
            count++;
            lastStartIndex = pageText.IndexOf(textToSearch, lastStartIndex + 1, StringComparison.CurrentCultureIgnoreCase);
        }
        // doc.Pages is zero-based, so report i + 1 as the page number.
        if (count != 0)
            Console.WriteLine("Page {0}: '{1}' found {2} times", i + 1, textToSearch, count);
    }
}
Remove the third argument of the IndexOf method if you want to perform a case-sensitive search.
Have you checked out iTextSharp? http://itextsharp.sourceforge.net/
EDIT: To elaborate, in the TOC I saw a section, 15.3.3: Extracting text with PdfReaderContentParser and PdfTextExtractor.
Under PdfReaderContentParser (http://api.itextpdf.com/com/itextpdf/text/pdf/parser/PdfReaderContentParser.html) there is an option to process the PDF content page by page.
It is a roundabout way, but you can iterate through each page, search its content for the word you want, and then return the page on which you found it.
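The per-page loop described above could be sketched roughly as follows. This is only an illustration, not the iTextSharp API itself: `PageSearch`, `CountOccurrences` and `FindPages` are hypothetical names, and the page text is supplied through a delegate so the sketch does not depend on any particular PDF library. Pages are numbered from 1, and overlapping matches are counted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of iTextSharp or any other PDF library.
public static class PageSearch
{
    // Counts occurrences of term in text, moving one character past each hit,
    // so overlapping matches are counted as well.
    public static int CountOccurrences(string text, string term, StringComparison comparison)
    {
        if (string.IsNullOrEmpty(term))
            throw new ArgumentException("Search term must not be empty.", nameof(term));

        int count = 0;
        int index = text.IndexOf(term, comparison);
        while (index != -1)
        {
            count++;
            index = text.IndexOf(term, index + 1, comparison);
        }
        return count;
    }

    // Returns a map from 1-based page number to match count,
    // containing only the pages with at least one match.
    // getPageText is assumed to return the extracted text of the given page.
    public static SortedDictionary<int, int> FindPages(
        int pageCount, Func<int, string> getPageText, string term)
    {
        var hits = new SortedDictionary<int, int>();
        for (int page = 1; page <= pageCount; page++)
        {
            string text = getPageText(page) ?? string.Empty;
            int count = CountOccurrences(text, term, StringComparison.OrdinalIgnoreCase);
            if (count > 0)
                hits[page] = count;
        }
        return hits;
    }
}
```

With iTextSharp 5.x, I would expect the delegate to be something like `page => PdfTextExtractor.GetTextFromPage(reader, page)` and the page count to come from `reader.NumberOfPages`, where `reader` is a `PdfReader` opened on the file. Treat that wiring as an assumption and check it against the version you have installed.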