Classifying type samples from image files

Developer https://www.devze.com 2022-12-20 08:59 Source: Internet

Which approach would you suggest for automatically classifying type found in images? The samples are likely large, with black text on a white background.

The categories, an extended version of the VOX-ATypI classification system, are defined here with some examples of each (Google Books link): http://bit.ly/9Mnu7P

My initial thoughts on this were to train the system with lots of single character samples from each category, but I'm wondering if there's a better way that would eliminate the need to do the comparison one letter at a time.

First, you need to extract features for classification. Typefaces are generally distinguished by the thickness of their strokes, the presence of serifs, and the "circularity" of character parts. Possible features therefore include:

The fraction of black pixels within a fixed-size area.
The same fraction after applying morphological erosion a few times (and/or with different masks).
The mean compactness of a character: perimeter^2 / area.
The number of connected components in a character after erosion.
The elongation and other image moments, as well as the orientation.
etc.
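
To make the list above more concrete, here is a rough sketch of how these measurements might be computed. It assumes each character has already been segmented into a small binary bitmap (a list of rows, 1 = ink, 0 = background). The function names, the cross-shaped erosion mask, and the choice to measure the perimeter as the number of exposed pixel edges are all illustrative assumptions, not something fixed by the approach itself.

```python
import math

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumed)

def _ink(img, r, c):
    """Pixel value, treating everything outside the bitmap as background."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0

def ink_fraction(img):
    """Fraction of ink pixels in a fixed-size binary bitmap."""
    return sum(map(sum, img)) / (len(img) * len(img[0]))

def erode(img):
    """One erosion pass with a 3x3 cross-shaped mask."""
    return [[int(img[r][c] and all(_ink(img, r + dr, c + dc) for dr, dc in NEIGHBOURS))
             for c in range(len(img[0]))] for r in range(len(img))]

def compactness(img):
    """perimeter^2 / area, with the perimeter counted as exposed pixel edges."""
    area = sum(map(sum, img))
    perimeter = sum(1 for r, row in enumerate(img) for c, v in enumerate(row) if v
                    for dr, dc in NEIGHBOURS if not _ink(img, r + dr, c + dc))
    return perimeter ** 2 / area if area else 0.0

def count_components(img):
    """Number of 4-connected ink components, found by flood fill."""
    seen, count = set(), 0
    for r, row in enumerate(img):
        for c, v in enumerate(row):
            if v and (r, c) not in seen:
                count += 1
                seen.add((r, c))
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for dy, dx in NEIGHBOURS:
                        n = (y + dy, x + dx)
                        if _ink(img, *n) and n not in seen:
                            seen.add(n)
                            stack.append(n)
    return count

def orientation_and_elongation(img):
    """Principal-axis angle (radians) and axis ratio from second-order central moments."""
    pts = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    n = len(pts)
    mean_r = sum(p[0] for p in pts) / n
    mean_c = sum(p[1] for p in pts) / n
    mu20 = sum((p[1] - mean_c) ** 2 for p in pts) / n  # spread along columns
    mu02 = sum((p[0] - mean_r) ** 2 for p in pts) / n  # spread along rows
    mu11 = sum((p[1] - mean_c) * (p[0] - mean_r) for p in pts) / n
    angle = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    spread = math.sqrt(4 * mu11 ** 2 + (mu20 - mu02) ** 2)
    major, minor = mu20 + mu02 + spread, mu20 + mu02 - spread
    elongation = math.sqrt(major / minor) if minor > 0 else float("inf")
    return angle, elongation

def glyph_features(img, erosions=1):
    """Collect the measurements into one feature vector for a single character."""
    eroded = img
    for _ in range(erosions):
        eroded = erode(eroded)
    angle, elongation = orientation_and_elongation(img)
    return [ink_fraction(img), ink_fraction(eroded), compactness(img),
            count_components(eroded), elongation, angle]
```

Calling glyph_features on each segmented character gives one fixed-length vector per character. On real scans a library such as OpenCV would likely be faster for the erosion and moment steps; the pure-Python version is only meant to show what is being measured.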

I see two options here: either compute mean features over all characters, or try to classify the letters first and then classify the font based on some specific letters (so you train a separate classifier for each letter). It's hard to say which one is better in your case.
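
The two options could be sketched roughly as follows. This assumes per-character feature vectors are already available, and for the second option that some earlier step has recognised which letter each glyph is. The function names are hypothetical, and combining the per-letter results by majority vote is only one possible assumption about how to merge them.

```python
from collections import Counter

def mean_features(glyph_vectors):
    """Option 1: average the per-character vectors into one vector per sample."""
    n = len(glyph_vectors)
    return [sum(column) / n for column in zip(*glyph_vectors)]

def vote_per_letter(glyphs, classifiers):
    """Option 2: classify each recognised letter with its own model, then vote.

    glyphs      -- list of (letter, feature_vector) pairs
    classifiers -- dict mapping a letter to a callable: feature_vector -> category
    """
    votes = Counter(classifiers[letter](vector)
                    for letter, vector in glyphs if letter in classifiers)
    # Letters without a trained classifier are skipped; no votes means no answer.
    return votes.most_common(1)[0][0] if votes else None
```

With the first option a single classifier sees one averaged vector per image; with the second, only the letters that have a trained model (for example the ones that differ most between categories) contribute to the decision.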

As for a specific learning algorithm, Random Forest seems to be a good place to start. There's an implementation in the OpenCV library.