tesseract - Pytesseract does not recognize text from image in Python

I am working on a Django application in which I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract, it returns an empty string.

I have already installed tesseract and tesseract-OCR.
I have already tried many times, on the assumption that it just fails occasionally.

asked Mar 13 at 6:05 by Mohit Prajapat, edited Mar 13 at 13:37 by furas

Please include the code in the question. – Subir Chowdhury Commented Mar 13 at 6:12
@SubirChowdhury that is the code. For OP, just to confirm, have you followed all of the steps in the installation guide? github/madmaze/pytesseract?tab=readme-ov-file#installation – Max Commented Mar 13 at 6:42
tesseract can sometimes have problems when the text is too small or too big. It may have other problems as well. See the documentation: Improving the quality of the output | tessdoc – furas Commented Mar 13 at 12:01
The real problem is not selenium but only tesseract, so I removed the selenium tags. – furas Commented Mar 13 at 13:38

1 Answer

Tesseract can sometimes have problems when the text is too small or too big, and it may have other problems as well. See the documentation: Improving the quality of the output | tessdoc

If I resize your image to 200%, tesseract can read the text.

I used the external program ImageMagick for this, but you could use the Python module Pillow
(or Wand, which also uses ImageMagick):

$ convert captcha.png -scale 200% captcha-200p.png
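
The same resize could be done from Python. This is only a minimal sketch, assuming Pillow and pytesseract are installed and the captcha is saved as captcha.png; the helper names upscale and read_captcha are made up for illustration and are not part of either library.

```python
from PIL import Image


def upscale(image, factor=2):
    """Return a copy of `image` enlarged by `factor` in both dimensions."""
    width, height = image.size
    return image.resize((int(width * factor), int(height * factor)))


def read_captcha(path, factor=2):
    """Upscale the image at `path` and pass it to Tesseract (hypothetical helper)."""
    # Imported here so that upscale() stays usable without pytesseract installed.
    import pytesseract

    with Image.open(path) as image:
        return pytesseract.image_to_string(upscale(image, factor)).strip()
```

With both packages installed, print(read_captcha("captcha.png")) would show whatever text Tesseract finds in the enlarged image. A factor of 2 matches the 200% used above, but other images may need a different value.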

The file command can show some information about the files:

$ file ca*

captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
captcha.png:      PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced
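
If the file command is not available, roughly the same information can be read with Pillow. Again just a sketch, assuming Pillow is installed; describe_image is a hypothetical helper name.

```python
from PIL import Image


def describe_image(path):
    """Return (width, height, mode) for the image at `path`."""
    with Image.open(path) as image:
        width, height = image.size
        return width, height, image.mode
```

For the original captcha this should report a size of 150 x 30 and the mode RGBA, in line with the file output above.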

It is strange that you don't get any error message, because when I run tesseract with only the input image, it prints a usage message:

$ tesseract captcha-200p.png

Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.

It needs an output name without an extension (it appends .txt) to write the result to a file:

$ tesseract captcha-200p.png output

Estimating resolution as 308

$ cat output.txt

81+20=?

Or it needs - as the output name to send the result to stdout, so that it is shown on screen or can be redirected to another program:

$ tesseract captcha-200p.png -

Estimating resolution as 308
81+20=?
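
The stdout form is convenient when calling the tesseract program directly from Python instead of going through pytesseract. A minimal sketch, assuming the tesseract executable is on PATH; the function names are hypothetical, and only the -l option shown in the usage message above is used.

```python
import subprocess


def build_tesseract_command(image_path, lang=None):
    """Build the argument list for `tesseract IMAGE -` (text goes to stdout)."""
    command = ["tesseract", str(image_path), "-"]
    if lang:
        command += ["-l", lang]
    return command


def ocr_to_stdout(image_path, lang=None):
    """Run tesseract and return the recognised text without surrounding whitespace."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # keep stdout and stderr separate
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    # Only stdout is returned; anything tesseract writes to stderr is ignored.
    return result.stdout.strip()
```

Calling ocr_to_stdout("captcha-200p.png") runs the same command as above and returns its stdout as a string.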

Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)
