Issue With Page Segment Mode

Hi Team,

I am using Tesseract version 4.0.0.0 in C#.

The code I am using is shown further down.

I am processing a scanned PDF image. Instead of processing the whole image, I reduce the image height to img.height / 3. In that case I get the exception below in page1.GetIterator().
The error does not occur for every file, only for a few. When I do not divide the height by 3 and pass the complete image, it works for all the images.

System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.'

Any help would be appreciated.

// Create the engine, loading the English data from the "tessdata" folder
// next to the executing assembly.
_engine = new TesseractEngine(
    Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location) +
    "\\tessdata", "eng", EngineMode.Default);

// Note: this using block disposes the engine when it exits.
using (var ocr = Engine)
{
    using (var page1 = ocr.Process(img1, PageSegMode.SparseText))
    {
        using (var iterator = page1.GetIterator())
        {
            iterator.Begin();
            do
            {
                // Record the text and bounding box of each recognized word.
                processWord = iterator.GetText(PageIteratorLevel.Word);
                iterator.TryGetBoundingBox(PageIteratorLevel.Word, out Rect bounds);
                OCRWords = new OCRObjects();
                OCRWords.index = _index;
                OCRWords.key = processWord;
                OCRWords.bounds = bounds;
                ocrObjects.Add(OCRWords);
                _index++;
            } while (iterator.Next(PageIteratorLevel.Word));
        }
    }
} // End of OCR iteration.

Modifying the image is probably what is confusing Tesseract.

Instead of modifying the image height, I suggest you use the Process() overload that takes a region. That is, instead of calling this:

using (var page1 = ocr.Process(img1, PageSegMode.SparseText))

try calling this:

Rect region = new Rect(0, 0, img1.Width, img1.Height / 3);
using (var page1 = ocr.Process(img1, region, PageSegMode.SparseText))

with the unmodified image. This way Tesseract should scan only the top third of the image, which appears to be the portion you want.
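If you later need the other parts of the page as well, the same region idea extends to horizontal bands. The following is only a minimal sketch under my own assumptions: RegionHelper and SplitIntoBands are hypothetical names, not part of the Tesseract wrapper, and the helper works on plain integers so it can be checked without the library. It computes bands that stay inside the image bounds and together cover the full height.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the Tesseract wrapper).
public static class RegionHelper
{
    // Splits an image of the given size into `bands` horizontal strips.
    // Each tuple is (X, Y, Width, Height), with the origin at the top-left.
    public static List<(int X, int Y, int Width, int Height)> SplitIntoBands(
        int imageWidth, int imageHeight, int bands)
    {
        if (imageWidth <= 0)
            throw new ArgumentOutOfRangeException(nameof(imageWidth));
        if (imageHeight <= 0)
            throw new ArgumentOutOfRangeException(nameof(imageHeight));
        if (bands <= 0 || bands > imageHeight)
            throw new ArgumentOutOfRangeException(nameof(bands));

        var result = new List<(int X, int Y, int Width, int Height)>(bands);
        int baseHeight = imageHeight / bands;
        int y = 0;
        for (int i = 0; i < bands; i++)
        {
            // The last band absorbs the remainder left by integer division,
            // so the bands never extend past the bottom edge of the image.
            int height = (i == bands - 1) ? imageHeight - y : baseHeight;
            result.Add((0, y, imageWidth, height));
            y += height;
        }
        return result;
    }
}
```

Each tuple b could then be turned into new Rect(b.X, b.Y, b.Width, b.Height) and passed to ocr.Process(img1, region, PageSegMode.SparseText) as above. The first band for three bands is the same top-third region as in my example.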

Please follow up to let us know how it went. For good karma, mark a reply as the answer if it helped!
