
tesseract - Pytesseract does not recognize text from an image in Python - Stack Overflow


I am working on a Django application in which I need to solve a captcha. I already save the captcha to a temporary file, but when I try to read it with pytesseract, it returns nothing but an empty string.

  • I have already installed tesseract and tesseract-OCR.
  • I have already tried many times, on the assumption that it sometimes just doesn't work.

Asked Mar 13 at 6:05 by Mohit Prajapat; edited Mar 13 at 13:37 by furas. 4 comments:
  • Please include the code in the question. – Subir Chowdhury Commented Mar 13 at 6:12
  • @SubirChowdhury that is the code. For OP, just to confirm, have you followed all of the steps in the installation guide? github/madmaze/pytesseract?tab=readme-ov-file#installation – Max Commented Mar 13 at 6:42
  • tesseract may sometimes have problems when the text is too small or too big, and it may have other problems as well. See the documentation: Improving the quality of the output | tessdoc – furas Commented Mar 13 at 12:01
  • The real problem is not selenium but only tesseract, so I removed the selenium tags. – furas Commented Mar 13 at 13:38

1 Answer


tesseract may sometimes have problems when the text is too small or too big, and it may have other problems as well. See the documentation: Improving the quality of the output | tessdoc


If I resize your image to 200%, tesseract can read the text.

I used the external program ImageMagick for this, but you could use the Python module pillow
(or Wand, which also uses ImageMagick).

$ convert captcha.png -scale 200% captcha-200p.png
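
For the pillow route mentioned above, a minimal sketch might look like the following. The helper names scaled_size and upscale_image are hypothetical, and the LANCZOS filter is an assumption (ImageMagick's -scale uses a different resampling method), so the pixels will not match the convert output exactly.

```python
def scaled_size(size, factor=2.0):
    """Return (width, height) multiplied by factor, rounded to whole pixels."""
    width, height = size
    return (round(width * factor), round(height * factor))


def upscale_image(src, dst, factor=2.0):
    """Save an enlarged copy of src at dst and return the new (width, height)."""
    # Imported here so that scaled_size() stays usable without Pillow installed.
    from PIL import Image

    with Image.open(src) as img:
        # LANCZOS is an assumed filter choice; any Pillow resampling filter works.
        enlarged = img.resize(scaled_size(img.size, factor), Image.LANCZOS)
        enlarged.save(dst)
        return enlarged.size
```

With the file names from the command above, the call would be upscale_image("captcha.png", "captcha-200p.png").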

The file command shows some information about the files:

$ file ca*

captcha-200p.png: PNG image data, 300 x 60, 8-bit grayscale, non-interlaced
captcha.png:      PNG image data, 150 x 30, 8-bit/color RGBA, non-interlaced

It is strange that you don't get any error message, because when I run tesseract with only the input image, it prints a message explaining how to use it:

$ tesseract captcha-200p.png

Usage:
  tesseract --help | --help-extra | --version
  tesseract --list-langs
  tesseract imagename outputbase [options...] [configfile...]

OCR options:
  -l LANG[+LANG]        Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
  --help                Show this help message.
  --help-extra          Show extra help for advanced users.
  --version             Show version information.
  --list-langs          List available languages for tesseract engine.

It needs an output name without an extension (it adds .txt itself) to write the result to a file:

$ tesseract captcha-200p.png output

Estimating resolution as 308

$ cat output.txt

81+20=?

Alternatively, pass - as the output name to send the result to stdout, so it is shown on screen or can be redirected to another program:

$ tesseract captcha-200p.png -

Estimating resolution as 308
81+20=?

Tested on: Linux Mint 22 (based on Ubuntu 24.04), tesseract 5.3.4 (leptonica-1.82.0)
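
The same steps can probably be done from Python without the command line. The sketch below is not part of the test above: it assumes pillow and pytesseract are installed and that the tesseract binary is on the PATH. The names clean_ocr_text and read_captcha are hypothetical, and the config parameter is only passed through to tesseract (for example "--psm 7", which treats the image as a single text line, might be worth trying, but that is an assumption and not something verified here).

```python
def clean_ocr_text(raw):
    """Strip the trailing newline / form feed that tesseract may append."""
    return raw.strip()


def read_captcha(path, factor=2, config=""):
    """Enlarge the image at path in memory and return the text tesseract finds."""
    # Imported lazily so clean_ocr_text() works even without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        # Same idea as the 200% resize: small text is enlarged before OCR.
        new_size = (img.width * factor, img.height * factor)
        enlarged = img.resize(new_size, Image.LANCZOS)
    # config is forwarded unchanged as extra tesseract command-line options.
    raw = pytesseract.image_to_string(enlarged, config=config)
    return clean_ocr_text(raw)
```

Calling read_captcha("captcha.png") resizes in memory, so no temporary captcha-200p.png file is needed. If it still returns an empty string, a different factor may be worth trying.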
