
Free C# OCR library

Posted: Mon Oct 22, 2018 4:10 am
by sheafox
Does anyone know a good free C# OCR library?

Re: Free C# OCR library

Posted: Tue Oct 23, 2018 11:16 am
by odklizec
Hi,

I don't have a use for an OCR library myself, but a quick Google search returned this:
http://www.pixel-technology.com/freeware/tessnet2/

Re: Free C# OCR library

Posted: Thu Dec 13, 2018 10:12 am
by semate
Hi!

I have the Tesseract OCR library running with Ranorex.
I ended up using the package below:
Tesseract2.PNG

Make sure the libs are included in the Ranorex project.
TesseractLibs.PNG
My code looks like this:

Code:

    //---------------------------------------------------------------------
    // Requires: using System; using System.Diagnostics; using System.Drawing; using Tesseract;
    /// <summary>
    /// Reads graphical text from a bitmap with the Tesseract OCR module.
    /// </summary>
    [UserCodeMethod]
    public static string OCRRead(Bitmap bmp, string whitelist, string enginePath)
    {
        try
        {
            // The engine, Pix and Page objects wrap native resources, so dispose them after use.
            using (TesseractEngine engine = new TesseractEngine(enginePath, "eng", Tesseract.EngineMode.Default))
            using (Tesseract.Pix px = PixConverter.ToPix(bmp))
            {
                engine.DefaultPageSegMode = Tesseract.PageSegMode.Auto;
                //engine.SetVariable("classify_bln_numeric_mode", 0);
                if (!string.IsNullOrEmpty(whitelist))
                {
                    // Restrict recognition to the characters in the whitelist.
                    engine.SetVariable("tessedit_char_whitelist", whitelist);
                }
                using (Tesseract.Page pg = engine.Process(px))
                {
                    return pg.GetText();
                }
            }
        }
        catch (Exception ex)
        {
            Debug.WriteLine("EnginePath: " + enginePath);
            Debug.WriteLine("Whitelist: " + whitelist);
            throw new ExceptionOcrImage(ex.ToString(), bmp);
        }
    }
And an example call:

Code:

Bitmap bmp;   // assign a bitmap here before the call, e.g. from a screenshot
string whitelist = "0123456789:._-/| ";
string tesseractFile = @"D:\tesseract\DataFiles\tessdata";
string ocrDatetime = OCRRead(bmp, whitelist, tesseractFile);
Make sure you have the training data file available in the tessdata folder. If I remember right, the tessdata folder was mandatory.
I downloaded the files eng.traineddata and deu.traineddata from https://github.com/tesseract-ocr/tessdata. Make sure you use the version that matches your Tesseract library (3.0.4 in my case).
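
A quick check before creating the engine can catch a missing training data file early. This is only a minimal sketch: the class and method names are made up for illustration, the folder is the one from the example call above, and it assumes the files follow the usual `<language>.traineddata` naming.

```csharp
using System;
using System.IO;

static class TessdataCheck
{
    // Hypothetical helper: returns the language codes whose
    // <language>.traineddata file is missing from tessdataDir.
    public static string[] MissingLanguages(string tessdataDir, params string[] languages)
    {
        return Array.FindAll(languages,
            lang => !File.Exists(Path.Combine(tessdataDir, lang + ".traineddata")));
    }

    static void Main()
    {
        // Folder taken from the example call; adjust it to your setup.
        string tessdataDir = @"D:\tesseract\DataFiles\tessdata";

        foreach (string lang in MissingLanguages(tessdataDir, "eng", "deu"))
        {
            Console.WriteLine("Missing: " + lang + ".traineddata");
        }
    }
}
```

If the loop prints nothing, both files are where the check expects them.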

As for the accuracy of the text detection, I have to say that it works best with large text. Small text can be challenging, and some characters and spaces are not always detected correctly, even if I filter the colors so that only white text on a black background remains. That may differ from case to case, though. It should also be possible to train the engine yourself, but I haven't looked into that yet.
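
For reference, color filtering of that kind could look roughly like the sketch below. It is not the exact filter used above, just one simple way to do it with System.Drawing: every pixel brighter than a threshold becomes white and everything else becomes black. The class name, the method name and the threshold value are assumptions, and it presumes the text is brighter than its background.

```csharp
using System.Drawing;

static class OcrPreprocessing
{
    // Hypothetical helper: returns a copy of the image in which pixels brighter
    // than the threshold become white and all other pixels become black.
    // threshold is a brightness value between 0 and 1.
    public static Bitmap ToWhiteOnBlack(Bitmap source, float threshold)
    {
        Bitmap result = new Bitmap(source.Width, source.Height);
        for (int y = 0; y < source.Height; y++)
        {
            for (int x = 0; x < source.Width; x++)
            {
                // GetBrightness() returns the HSL lightness of the pixel (0 = black, 1 = white).
                bool isText = source.GetPixel(x, y).GetBrightness() > threshold;
                result.SetPixel(x, y, isText ? Color.White : Color.Black);
            }
        }
        return result;
    }
}
```

The filtered bitmap can then be passed to OCRRead in place of the original, for example `OCRRead(OcrPreprocessing.ToWhiteOnBlack(bmp, 0.5f), whitelist, tesseractFile)`. The value 0.5 is only a starting point and will likely need tuning per screenshot. GetPixel and SetPixel are slow on large images, which is acceptable for small screenshot regions.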

Hope that helps!

Re: Free C# OCR library

Posted: Thu Dec 13, 2018 10:14 am
by semate
It messed up the pictures in my earlier post.

Libs picture should be:
TesseractLibs.PNG