I am working on a Django application where I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract, it returns an empty string.
- I have already installed tesseract and tesseract-OCR.
- I have already tried many times, assuming that it sometimes just fails.
1 Answer
tesseract may sometimes have problems when the text is too small or too big. It may have other problems as well. See the documentation: Improving the quality of the output | tessdoc
If I resize your image to 200%, then tesseract can read the text.
I used the external program ImageMagick for this, but you could use the Python module pillow (or Wand, which also uses ImageMagick).
$ convert captcha.png -scale 200% captcha-200p.png
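The same resize could probably be done with pillow instead of ImageMagick. This is only a minimal sketch: the helper names `scaled_size` and `upscale_file` are made up for illustration, and the LANCZOS filter is my assumption (ImageMagick's -scale resamples differently, so the pixels will not be identical).

```python
def scaled_size(width, height, percent=200):
    """Return (width, height) scaled by `percent`, rounded to whole pixels."""
    return (round(width * percent / 100), round(height * percent / 100))


def upscale_file(src_path, dst_path, percent=200):
    """Save an enlarged copy of the image at src_path to dst_path."""
    # Pillow is imported here so scaled_size() stays usable without it.
    from PIL import Image

    with Image.open(src_path) as img:
        new_size = scaled_size(img.width, img.height, percent)
        # LANCZOS is an assumed filter choice, not what ImageMagick -scale uses.
        img.resize(new_size, Image.LANCZOS).save(dst_path)
    return new_size
```

Calling `upscale_file("captcha.png", "captcha-200p.png")` should write a 300 x 60 file for the 150 x 30 captcha from the question.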
The file command can show some information about the files:
$ file ca*
captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
captcha.png: PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced
It is strange that you don't get any error message, because when I run tesseract with only the input image, it prints its usage message:
$ tesseract captcha-200p.png
Usage:
tesseract --help | --help-extra | --version
tesseract --list-langs
tesseract imagename outputbase [options...] [configfile...]
OCR options:
-l LANG[+LANG] Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.
Single options:
--help Show this help message.
--help-extra Show extra help for advanced users.
--version Show version information.
--list-langs List available languages for tesseract engine.
It needs an output name without an extension (it adds .txt itself) to write the result to a file:
$ tesseract captcha-200p.png output
Estimating resolution as 308
$ cat output.txt
81+20=?
or it needs - to send the output to stdout, so it is shown on screen or can be redirected to another program:
$ tesseract captcha-200p.png -
Estimating resolution as 308
81+20=?
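From Python, the resize and the OCR step could be combined, so no second file is needed. This is a rough sketch, not tested against your Django code: `read_captcha` and `normalize` are hypothetical helper names, the resize filter is my assumption, and `config` is simply passed through to tesseract (you could try `--psm 7`, which treats the image as a single text line, but I did not need it on the command line).

```python
def normalize(text):
    """Drop all whitespace, including the trailing newline tesseract appends."""
    return "".join(text.split())


def read_captcha(path, factor=2, config=""):
    """Upscale the image at `path` and return the text tesseract finds."""
    # Third-party imports are kept local so normalize() works without them.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        size = (img.width * factor, img.height * factor)
        # LANCZOS is an assumed filter choice for the enlargement.
        enlarged = img.resize(size, Image.LANCZOS)
    # `config` is handed to the tesseract command line unchanged.
    return normalize(pytesseract.image_to_string(enlarged, config=config))
```

Usage would be `read_captcha("captcha.png")`. Note that `normalize` also removes spaces inside the text, which is fine for a captcha like this one but not for normal sentences.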
Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)
tesseract may sometimes have problems when the text is too small or too big. It may have other problems. See the documentation: Improving the quality of the output | tessdoc – furas, commented Mar 13 at 12:01
selenium – furas, commented Mar 13 at 13:38