Parsing an image in order to get the information out of it

Developer (开发者) https://www.devze.com 2023-03-06 03:13 Source: the web

Related topics: ocr perl

For several days I have been thinking about a three-part job: a. getting, b. parsing, and c. storing a number of pages.

Two days ago I thought that getting the pages would be the major task. That is not the case: I now think the parsing will be the heroic part, because each of the pages to be parsed is a PNG image.

So the question is: once I have fetched them all, how do I parse them? That seems to be the real issue. I guess there are some Perl modules out there that can help with this.

I think this job can only be done with some OCR embedded. Question: is there a Perl module that can be used to support this task?

BTW: see the result-pages.

BTW: since I believe all 790 result pages can be found within the range Id=0 to Id=100000, I thought I could fetch them with a loop over URLs like these:

http://www.foundationfinder.ch/ShowDetails.php?Id=11233&InterfaceLanguage=&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=927&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=949&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=20011&InterfaceLanguage=1&Type=Html
http://www.foundationfinder.ch/ShowDetails.php?Id=10579&InterfaceLanguage=1&Type=Html

I thought I could go the Perl way, but I am not very sure. I was trying to use LWP::UserAgent on the same URLs (see above) with different query arguments, and I am wondering whether LWP::UserAgent provides a way to loop through the query arguments. I am not sure that LWP::UserAgent has a method for that. I have sometimes heard that it is easier to use Mechanize. But is it really easier?
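The loop over Id values described here can be sketched independently of any particular HTTP client: build the query string from the Id, request the page, save it, move on. The sketch below uses Python's standard library rather than LWP::UserAgent or Mechanize, so it only illustrates the shape of the loop. The parameter names (Id, InterfaceLanguage, Type) are taken from the URLs in the question; the function names, the `pages` output directory, and the one-second delay are assumptions made for illustration.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

BASE_URL = "http://www.foundationfinder.ch/ShowDetails.php"


def details_url(record_id, interface_language=1, page_type="Html"):
    """Build the detail-page URL for one Id, mirroring the URLs in the question."""
    query = urlencode(
        {"Id": record_id, "InterfaceLanguage": interface_language, "Type": page_type}
    )
    return f"{BASE_URL}?{query}"


def fetch_range(first_id, last_id, out_dir="pages", delay_seconds=1.0):
    """Request every Id in [first_id, last_id] and save each response body.

    Ids that fail (HTTP error, timeout, connection problem) are reported and
    skipped. Returns the list of Ids that were saved. The output directory
    name and the delay are hypothetical choices, not taken from the question.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    saved = []
    for record_id in range(first_id, last_id + 1):
        try:
            with urllib.request.urlopen(details_url(record_id), timeout=30) as response:
                body = response.read()
        except OSError as err:  # URLError, HTTPError and timeouts are all OSError
            print(f"Id {record_id}: skipped ({err})")
            continue
        (out / f"page_{record_id}.html").write_bytes(body)
        saved.append(record_id)
        time.sleep(delay_seconds)  # pause between requests to go easy on the server
    return saved
```

Calling `fetch_range(920, 930)` would request eleven pages. For the full range of 0 to 100000 mentioned in the question, most Ids would presumably return nothing useful, so some check on the response body would be needed before saving; what that check looks like depends on what the site returns for an unused Id.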

But to be frank, the first task, getting all the pages, is not very difficult compared with the parsing. How can the parsing be done?

Any ideas or suggestions? I look forward to hearing from you.

zero

I would suggest using Image::OCR::Tesseract

I've had good experience with Tesseract in the past using C++.

See this for further info.
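As a rough illustration of the OCR step, here is a minimal sketch that runs the Tesseract command-line program on each downloaded PNG and stores the recognized text next to it. It does not use the Image::OCR::Tesseract Perl module suggested above; it calls the same engine directly from Python. It assumes that the `tesseract` executable is installed and on the PATH, that the images sit in a hypothetical `pages` directory, and that the English language data (`eng`) is adequate; pages in another language would need the matching language code.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(png_path, lang="eng"):
    """Return the argument list for one Tesseract call that prints text to stdout."""
    return ["tesseract", str(png_path), "stdout", "-l", lang]


def ocr_png(png_path, lang="eng"):
    """Run Tesseract on one PNG and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(png_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # "pages" is a hypothetical directory holding the downloaded PNG images.
    for png in sorted(Path("pages").glob("*.png")):
        png.with_suffix(".txt").write_text(ocr_png(png), encoding="utf-8")
```

The raw OCR text will still need parsing into fields before it can be stored, and recognition quality depends heavily on the resolution and contrast of the images, so it is worth checking a handful of results by hand first.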