I am doing some OCR stuff and screen scraping. I end up with lots of files that look like this.
All I need to do is some very basic OCR in C# on these files. I've been pulling my hair out trying to get different libraries to work (Tessnet2, Puma, MODI) and have had lots of different problems getting them to even run from within C#.
What do you guys recommend for something this simple?
Thanks!
OCR programs are not designed to read low-resolution screenshots. Even some of the best commercial OCR engines have trouble reading them.
Tesseract needs good, clean images to get decent results even under normal circumstances. There could be several reasons why you are getting poor results; common problems include colored backgrounds, text zoning errors, small characters, and artefacts. If you post some sample images and the output you get, we may be better able to explain the results.
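To illustrate the small-character and colored-background problems, here is a minimal sketch of a preprocessing step you could try before handing a screenshot to any OCR engine: enlarge the image, then convert it to grayscale. It assumes System.Drawing is available (as on the .NET Framework). The class name ScreenshotPreprocessor, the scale factor, and the file names in the usage note are made up for illustration and are not tuned recommendations. Whether this helps at all depends on the engine and on your images.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

// Hypothetical helper, not part of any OCR library.
public static class ScreenshotPreprocessor
{
    // Enlarges the image by an integer factor with bicubic interpolation,
    // so that small screen fonts cover more pixels per character.
    public static Bitmap Upscale(Bitmap source, int factor)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (factor < 1) throw new ArgumentOutOfRangeException("factor");

        // A 24bpp target keeps SetPixel usable later (it fails on indexed formats).
        var result = new Bitmap(source.Width * factor, source.Height * factor,
                                PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.PixelOffsetMode = PixelOffsetMode.HighQuality;
            g.DrawImage(source, 0, 0, result.Width, result.Height);
        }
        return result;
    }

    // Replaces every pixel with its luminance, in place.
    // GetPixel/SetPixel is slow but simple; fine for a handful of small images.
    public static void ToGrayscale(Bitmap image)
    {
        if (image == null) throw new ArgumentNullException("image");

        for (int y = 0; y < image.Height; y++)
        {
            for (int x = 0; x < image.Width; x++)
            {
                Color c = image.GetPixel(x, y);
                // Integer approximation of the usual 0.299/0.587/0.114 weights.
                int gray = (299 * c.R + 587 * c.G + 114 * c.B) / 1000;
                image.SetPixel(x, y, Color.FromArgb(gray, gray, gray));
            }
        }
    }
}
```

A possible way to use it, with placeholder file names: load the screenshot with `new Bitmap("screenshot.png")`, call `ScreenshotPreprocessor.Upscale(shot, 3)`, pass the result to `ToGrayscale`, and save it with `Save("screenshot_ocr.png", ImageFormat.Png)` before running OCR on the saved file. Dispose both bitmaps when done.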
Apparently Tesseract will get much better results if you train it using the fonts that you want to read.
There's a web-based API for OCR that you can try. Here's a C# example of how to use it: http://snipt.org/lOgh/ (you'll first need to register for an API key at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml - look for the "Sign Up Free" button).
Disclaimer: WiseTrend is my company's customer.