I need to feed an OCR engine TIFF files with CCITT Group 4 (CCITT4) compression, but our scanner outputs TIFF files with JPEG compression. I want to convert these files in C# using System.Drawing.Imaging.
The converted images contain a lot of noise. How can I reduce it?
My code:
List<byte[]> fRet = new List<byte[]>();
ImageCodecInfo fImageCodecInfo = GetEncoderInfo("image/tiff");
EncoderParameters fEncoderParameters = new EncoderParameters(3);
fEncoderParameters.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Compression, (long)EncoderValue.CompressionCCITT4);
fEncoderParameters.Param[1] = new EncoderParameter(System.Drawing.Imaging.Encoder.ScanMethod, (int)EncoderValue.ScanMethodNonInterlaced);
fEncoderParameters.Param[2] = new EncoderParameter(System.Drawing.Imaging.Encoder.RenderMethod, (int)EncoderValue.RenderNonProgressive);
//
Image fOrgTiff = Image.FromStream(pInputTiff);
Guid objGuid = fOrgTiff.FrameDimensionsList[0];
FrameDimension objDimension = new FrameDimension(objGuid);
int frameCount = fOrgTiff.GetFrameCount(objDimension);
for (int i = 0; i < frameCount; i++)
{
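fOrgTiff.SelectActiveFrame(objDimension, i);
using (MemoryStream ms = new MemoryStream())
{
// Encode the active frame as a single-page CCITT4 TIFF.
fOrgTiff.Save(ms, fImageCodecInfo, fEncoderParameters);
// ToArray copies only the bytes written; GetBuffer would also return unused capacity.
fRet.Add(ms.ToArray());
}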
}
return fRet;
As Brannon said, CCITT4 is a bilevel (black/white) format, so your image is automatically binarized when it is saved, which is the likely source of the noise. The documentation says: "The Ccitt3, Ccitt4, and Rle require that the PixelFormat value be set to BlackWhite. Setting the PixelFormat to any other value resets the Compression property value to Default."
You can try to reduce the noise by binarizing the image yourself with a better threshold before saving it. Open-source imaging libraries such as AForge.NET or EmguCV provide thresholding algorithms you can look at.
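To illustrate the idea, here is a minimal sketch of doing the binarization yourself before saving, so that you control the threshold rather than leaving it to the encoder. The `Binarizer` class, the `ToBilevel` method and the fixed `threshold` parameter are hypothetical names introduced for this example, not part of System.Drawing. A single global threshold is the simplest possible choice; the libraries mentioned above offer more robust methods.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

// Hypothetical helper: converts an image to 1 bpp with a fixed luminance threshold.
public static class Binarizer
{
    public static Bitmap ToBilevel(Image source, byte threshold)
    {
        int width = source.Width, height = source.Height;
        Rectangle rect = new Rectangle(0, 0, width, height);

        // Copy the active frame into a known 24 bpp layout (bytes are B, G, R).
        using (Bitmap rgb = new Bitmap(width, height, PixelFormat.Format24bppRgb))
        {
            using (Graphics gfx = Graphics.FromImage(rgb))
            {
                gfx.DrawImage(source, 0, 0, width, height);
            }

            Bitmap result = new Bitmap(width, height, PixelFormat.Format1bppIndexed);
            // Keep the scan resolution; OCR engines usually depend on the DPI.
            result.SetResolution(source.HorizontalResolution, source.VerticalResolution);

            // Make the palette explicit: index 0 = black, index 1 = white.
            ColorPalette palette = result.Palette;
            palette.Entries[0] = Color.Black;
            palette.Entries[1] = Color.White;
            result.Palette = palette;

            BitmapData src = rgb.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
            BitmapData dst = result.LockBits(rect, ImageLockMode.WriteOnly, PixelFormat.Format1bppIndexed);
            try
            {
                byte[] srcRow = new byte[src.Stride];
                byte[] dstRow = new byte[dst.Stride];
                for (int y = 0; y < height; y++)
                {
                    Marshal.Copy(IntPtr.Add(src.Scan0, y * src.Stride), srcRow, 0, srcRow.Length);
                    Array.Clear(dstRow, 0, dstRow.Length);
                    for (int x = 0; x < width; x++)
                    {
                        int b = srcRow[3 * x], g = srcRow[3 * x + 1], r = srcRow[3 * x + 2];
                        // Integer approximation of perceived brightness (0..255).
                        int luma = (299 * r + 587 * g + 114 * b) / 1000;
                        if (luma >= threshold)
                        {
                            // 1 bpp rows are packed most significant bit first.
                            dstRow[x >> 3] |= (byte)(0x80 >> (x & 7));
                        }
                    }
                    Marshal.Copy(dstRow, 0, IntPtr.Add(dst.Scan0, y * dst.Stride), dstRow.Length);
                }
            }
            finally
            {
                rgb.UnlockBits(src);
                result.UnlockBits(dst);
            }
            return result;
        }
    }
}
```

In the loop from the question, you would then call something like `using (Bitmap bw = Binarizer.ToBilevel(fOrgTiff, 128))` after `SelectActiveFrame` and save `bw` instead of `fOrgTiff`. The value 128 is only a starting point; the right threshold depends on your scans, and speckle from JPEG artifacts may need a different value or an adaptive method.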