I'm using the javaocr framework from SourceForge. I'm trying to scan letters from an image and train the system to recognize them.
I get this exception when loading the trainer:
java.io.IOException: Expected to decode 26 characters but actually decoded 33 characters in training: /Developer/MAckan/bin/LETTERS/trainLetters.PNG
    at net.sourceforge.javaocr.ocrPlugins.mseOCR.TrainingImageLoader.load(TrainingImageLoader.java:111)
My code is like this:
loader.load(this,ClassLoader.getSystemResource("LETTERS/trainLetters.PNG").getPath(), new CharacterRange('A', 'Z'), images);
Another question is how to get it to train Scandinavian letters. If I enter a range A-Ö it expects 150 characters.
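A note on where the 150 probably comes from: it is consistent with the range being counted over consecutive character codes. 'A' is U+0041 (65) and 'Ö' is U+00D6 (214), so 214 - 65 + 1 = 150. That is an inference from the numbers, not something checked against the javaocr source. A minimal sketch of the arithmetic in plain Java (the class `RangeSize` and the method `size` are hypothetical names, and no javaocr classes are used):

```java
// Sketch only: plain Java arithmetic on char values, no javaocr involved.
class RangeSize {
    // Number of characters in an inclusive range of char values.
    static int size(char first, char last) {
        return last - first + 1;
    }

    public static void main(String[] args) {
        // 'A' = 65, 'Z' = 90
        System.out.println(size('A', 'Z'));      // 26
        // '\u00D6' is 'Ö' = 214; the range also covers punctuation,
        // lowercase letters and Latin-1 symbols between 'Z' and 'Ö'.
        System.out.println(size('A', '\u00D6')); // 150
    }
}
```

If that is what is happening, a single range from A to Ö would ask for every character in between, not just A-Z plus Å, Ä and Ö.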
Then, when scanning, I try to scan one line of the image at a time:
scanner.addTrainingImages(images);
final CharacterRange[] cr = new CharacterRange[1];
cr[0] = new CharacterRange('A', 'Z');
// get the first line of letters
final int x1 = 0;
final int y1 = 130;
final int x2 = 640;
final int y2 = 170;
for (int i = 0; i < 15; i++) {
    final String text = scanner.scan(boardImage, x1, y1 + (i * 40), x2,
            y2 + (i * 40), cr);
    System.out.println("scanned " + text);
}
And I actually get output, but not the output I expect... Does anyone have experience with the javaocr framework?
Update: Solved the training issue. The training image was missing a couple of characters, and Scandinavian letters are not supported (?). Still getting strange output.
Update2: Solved the entire issue by writing my own comparison instead. I did some manipulation of the images (reduced colors and transparency), compared them pixel by pixel against images of the alphabet, and returned a diff for each. The lowest diff "wins". It works for this particular case, but I am still interested in getting OCR running.
Thanks.
/A
Well, you won't like my answer, but here it is: javaocr is kind of crappy and very poorly documented. I tried some of the code from the demo source on PNG files other than those supplied, and it didn't recognize much.
Here's a library that actually worked: http://asprise.com/product/ocr/download.php?lang=java. It's not free, however; if you look at the license prices it's REALLY not free, but there you go.
Option 2 would be to try out Google's brand new online OCR service: http://googlesystem.blogspot.com/2009/09/google-docs-ocr.html. I haven't tried it myself, but you should at least get better support than with javaocr...
Solved the entire issue by writing my own comparison instead. I did some manipulation of the images (reduced colors and transparency), compared them pixel by pixel against images of the alphabet, and returned a diff for each. The lowest diff "wins". It works for this particular case, but I am still interested in getting OCR running.
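For anyone wanting to try the same approach, here is a rough sketch of the "lowest diff wins" idea. It is not my exact code: the names `PixelDiffMatcher`, `diff`, `bestMatch` and `solid` are made up for illustration, it assumes the glyph and the alphabet images have already been cut out and scaled to the same size, and it leaves out the color and transparency reduction step.

```java
import java.awt.image.BufferedImage;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: template matching by summed per-channel pixel difference.
class PixelDiffMatcher {
    // Sum of absolute differences over the B, G, R and alpha channels.
    // Assumes both images have identical dimensions.
    static long diff(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            throw new IllegalArgumentException("images must have the same size");
        }
        long total = 0;
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                int p = a.getRGB(x, y);
                int q = b.getRGB(x, y);
                for (int shift = 0; shift <= 24; shift += 8) {
                    total += Math.abs(((p >> shift) & 0xFF) - ((q >> shift) & 0xFF));
                }
            }
        }
        return total;
    }

    // Returns the character whose template has the lowest diff, or '?' if there are no templates.
    static char bestMatch(BufferedImage glyph, Map<Character, BufferedImage> templates) {
        char best = '?';
        long bestDiff = Long.MAX_VALUE;
        for (Map.Entry<Character, BufferedImage> e : templates.entrySet()) {
            long d = diff(glyph, e.getValue());
            if (d < bestDiff) {
                bestDiff = d;
                best = e.getKey();
            }
        }
        return best;
    }

    // Demo helper: an image filled with a single ARGB color.
    static BufferedImage solid(int w, int h, int argb) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_ARGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                img.setRGB(x, y, argb);
            }
        }
        return img;
    }

    public static void main(String[] args) {
        // Stand-in templates; real ones would be images of each letter.
        Map<Character, BufferedImage> templates = new LinkedHashMap<>();
        templates.put('A', solid(4, 4, 0xFF000000)); // black
        templates.put('B', solid(4, 4, 0xFFFFFFFF)); // white
        BufferedImage glyph = solid(4, 4, 0xFF202020); // dark grey, closer to black
        System.out.println(bestMatch(glyph, templates)); // A
    }
}
```

This only works when the letters in the scanned image look almost exactly like the templates (same font, size and position), which happened to be true in my case.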
Thanks everyone for contributing.
/A