
javascript - Recognize letters using smartphone camera - Stack Overflow


I want to develop a web app that can recognize text using the smartphone camera. I have seen that many solutions exist for recognizing text in a picture or video stream, but all of them require building a native app. Instead, I want to recognize text from a small website in which I can do the following:

  1. Register myself
  2. Access the smartphone camera
  3. Recognize text in a shot
  4. Show the recognized letters in a simple label
  5. Save these letters in a remote database associated with my account

Does anyone know a way to recognize letters without taking a picture and without building a native app? On the web I found Tesseract OCR, but I'm not sure whether I can use it in an HTML5, CSS3 and JavaScript page. Has anyone used this library? In which mobile browsers does it work (Safari for iOS, the stock browser for Android, and Internet Explorer for Windows Phone 7/8)?

asked Mar 20, 2014 at 16:50 by lucgian841

3 Answers


For the benefit of users just now coming across this question, there's a JS port of Tesseract (the library mentioned in the question itself) on GitHub: https://github.com/naptha/tesseract.js/ So to answer that part of the question: yes, you can actually use Tesseract in your browser-side projects!
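Roughly, using it looks something like the sketch below. This assumes the tesseract.js v2-style API (`Tesseract.recognize(image, lang, options)`); the element ids are made up for illustration, and the API has changed between versions, so check the README of the version you actually load.

    // Minimal sketch: recognise text from an <img> or <canvas> with tesseract.js.
    // Assumes the v2-style API; verify against the README of the version you load.
    const image = document.getElementById('snapshot');   // hypothetical <img> or <canvas>

    Tesseract.recognize(image, 'eng', {
      logger: m => console.log(m)                         // progress messages
    }).then(({ data: { text } }) => {
      document.getElementById('result').textContent = text;  // show it in a label
    }).catch(err => console.error(err));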

Also available (via the same developer!) are https://github.com/antimatter15/ocrad.js/ (mentioned in a previous answer) and https://github.com/antimatter15/gocr.js/ - either of these may work for your purposes, but something to keep in mind is that neither uses anything near as advanced as Tesseract does, in terms of ability to actually recognize text. So you'll be sacrificing a bit (or sometimes even a huge amount) of quality in exchange for smaller scripts.

This is a pretty tough problem. I've had a shot at it before and got it working on a pretty basic level. The real difficulty is making it versatile.

There's likely a library out there, so maybe that's your best approach. However, in the absence of a library here's what I think is the best approach (I'm only going to outline it).

1) You will need, in some sense, to take a picture. I'm fairly sure there are ways to get an ongoing input from the camera, but even then you can't send all of it back to your server, so you probably want to at least grab individual frames from the stream.
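A rough sketch of that frame-grabbing, using the standard `navigator.mediaDevices.getUserMedia` API and a canvas (the element ids here are made up for illustration):

    // Sketch: attach the rear camera to a <video> element and grab frames as ImageData.
    const video = document.getElementById('preview');
    const canvas = document.getElementById('grabber');

    navigator.mediaDevices.getUserMedia({ video: { facingMode: 'environment' } })
      .then(stream => { video.srcObject = stream; return video.play(); })
      .catch(err => console.error('Camera access failed:', err));

    function grabFrame() {
      canvas.width = video.videoWidth;
      canvas.height = video.videoHeight;
      const ctx = canvas.getContext('2d');
      ctx.drawImage(video, 0, 0);
      return ctx.getImageData(0, 0, canvas.width, canvas.height);  // raw RGBA pixels
    }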

2) Letter recognition doesn't (necessarily) require colour. Device side, I would recommend converting the image to black and white, or even to an array of integers whose values represent the brightness at different points in the image. You would probably want to express brightness relative to the overall range of brightness in the image. What I mean is: find the brightest pixel and call it 100, call the darkest 0, and let all other numbers from 1 to 99 represent evenly spaced brightnesses between the max and the min.
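As a sketch of that normalisation, building on the `ImageData` returned by the previous snippet:

    // Sketch: convert RGBA ImageData to a flat array of brightness values scaled 0-100,
    // relative to the darkest and brightest pixels actually present in the frame.
    function toBrightnessArray(imageData) {
      const { data, width, height } = imageData;
      const grey = new Array(width * height);
      let min = 255, max = 0;

      for (let i = 0; i < width * height; i++) {
        // simple average of R, G, B as the grey level
        const g = (data[i * 4] + data[i * 4 + 1] + data[i * 4 + 2]) / 3;
        grey[i] = g;
        if (g < min) min = g;
        if (g > max) max = g;
      }

      const range = max - min || 1;  // avoid division by zero on flat images
      return grey.map(g => Math.round(((g - min) / range) * 100));
    }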

3) Now you've got a somewhat smaller representation of the image to send back and process, so send it to your server!
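A minimal sketch of the upload, assuming a hypothetical `/frames` endpoint on your server that accepts JSON:

    // Sketch: post the brightness array (plus dimensions) to an assumed '/frames' endpoint.
    function uploadFrame(brightness, width, height) {
      return fetch('/frames', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ width, height, pixels: brightness })
      });
    }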

4) Ok, so now the tricky bit: we need to process that image. Firstly, we need to separate out all the letters. The problem is that the letters and the background could be any colour, and there could easily be other objects in the image, so we need to work out which objects are letters and where they are. The way I solved this was to look for the largest groups of similar-brightness components in the image. What I mean is: count the number of pixels falling between each pair of brightness thresholds, and it's pretty likely that the paper is the biggest contributor, with the letters being the second. Not definitely, just probably.
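A sketch of that counting, binning the 0-100 brightness values from earlier into a coarse histogram and guessing that the second-largest bin is the letters:

    // Sketch: histogram the brightness values into 10 bins; the most common bin is
    // probably the paper, so take the second-most-common bin as the letter brightness.
    function guessLetterBrightness(brightness, bins = 10) {
      const counts = new Array(bins).fill(0);
      for (const b of brightness) {
        counts[Math.min(bins - 1, Math.floor(b / (100 / bins)))]++;
      }
      const order = counts
        .map((count, bin) => ({ bin, count }))
        .sort((a, b) => b.count - a.count);
      // Centre of the second-largest bin, back on the 0-100 scale.
      return (order[1].bin + 0.5) * (100 / bins);
    }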

5) Go through the image and extract each object. You can do this by visiting each pixel; if it's the colour your code thinks the letters are, check all its neighbouring pixels, then all of their neighbouring pixels, and so on until you find no more bordering pixels of a similar colour. That connected region is one letter.
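That is essentially a flood fill / connected-component search. A sketch, using the brightness array and the guessed letter brightness from above (the tolerance value is an arbitrary assumption):

    // Sketch: collect connected regions of pixels whose brightness is close to the
    // guessed letter brightness. Each region is returned as a list of [x, y] points.
    function extractObjects(brightness, width, height, letterLevel, tolerance = 10) {
      const visited = new Uint8Array(width * height);
      const isLetter = i => Math.abs(brightness[i] - letterLevel) <= tolerance;
      const objects = [];

      for (let start = 0; start < width * height; start++) {
        if (visited[start] || !isLetter(start)) continue;

        const pixels = [];
        const stack = [start];
        visited[start] = 1;

        while (stack.length) {
          const i = stack.pop();
          const x = i % width, y = Math.floor(i / width);
          pixels.push([x, y]);

          // 4-connected neighbours
          for (const [dx, dy] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
            const nx = x + dx, ny = y + dy;
            if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
            const n = ny * width + nx;
            if (!visited[n] && isLetter(n)) { visited[n] = 1; stack.push(n); }
          }
        }
        objects.push(pixels);
      }
      return objects;
    }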

6) So by this point we have an array of numbers representing the original image, and some idea of which objects may or may not be letters based on how much of the image is covered by objects of the same brightness. The actual identification of objects comes next, but I'd recommend using a similar technique to the one that follows to make sure that what your code thinks are letters actually are letters. Essentially, take a few objects from each of the candidate sets and run your individual-letter identification algorithm on them. The set that really contains letters can be identified because the algorithm will (should) report that each object is much more likely to be one particular letter than any other.

Another check you can do is the size of objects in each set. Letters should all be a pretty uniform size, or a couple of pretty uniform sizes.
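A sketch of that size check, using the bounding-box heights of the objects returned by the flood fill above (the spread threshold is an arbitrary assumption):

    // Sketch: letters should have roughly uniform bounding-box heights; a large
    // relative spread suggests the candidate set is not actually text.
    function looksLikeUniformLetters(objects, maxRelativeSpread = 0.3) {
      const heights = objects.map(pixels => {
        const ys = pixels.map(([, y]) => y);
        return Math.max(...ys) - Math.min(...ys) + 1;
      });
      const mean = heights.reduce((a, b) => a + b, 0) / heights.length;
      const spread = Math.sqrt(
        heights.reduce((a, h) => a + (h - mean) ** 2, 0) / heights.length
      );
      return spread / mean <= maxRelativeSpread;
    }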

7) Right, so we have a set of objects (hopefully with some coordinates attached so that you don't lose track of where in the image they came from) that are probably letters. How do we recognize them? There are two main approaches to optical character recognition: matrix matching and feature extraction. Feature extraction involves looking for loops, lines and other features of the letters. This is very hard to program, so we'll stick with matrix matching.

Take each object in turn and compare it to an object representing each letter in the alphabet. You should try to align the two images and stretch/shrink one to fit the other (think: if the camera were on a slant, the objects wouldn't match too well; if the camera is closer, the objects are bigger), and then subtract all the pixels in one from all the pixels in the other. The letter with the smallest leftover difference is likely to be the correct one. Likely. This technique falls down if you have to deal with dramatically different fonts. You could compare against a whole load of fonts, but that would take a lot of computing power.
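A sketch of that matrix matching, assuming each extracted glyph has already been rescaled onto a small fixed-size binary grid, and assuming a hypothetical `templates` lookup holding the same kind of grid for each letter of a reference font:

    // Sketch: nearest-template matching. Both the unknown glyph and each template are
    // assumed to be Uint8Array grids of the same fixed size (e.g. 16x16, values 0/1).
    function matchLetter(glyph, templates /* e.g. { A: Uint8Array, B: ... } */) {
      let best = null;
      for (const [letter, template] of Object.entries(templates)) {
        let diff = 0;
        for (let i = 0; i < glyph.length; i++) {
          diff += Math.abs(glyph[i] - template[i]);  // pixel-by-pixel difference
        }
        if (best === null || diff < best.diff) best = { letter, diff };
      }
      return best;  // { letter, diff } - the smallest leftover difference wins
    }

Rescaling each object's bounding box down to the template grid (even crude nearest-neighbour sampling) is what handles the "camera is closer, so the letters are bigger" case.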

You can also do some fancy eigenvector analysis for image recognition, but I'm not entirely sure that this is appropriate in this case.

Now take all of the most likely letters and use the coordinates from their associated objects to reconstruct the text.
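A sketch of that reconstruction: group the recognised letters into rows by their y coordinates, then read each row left to right (the row tolerance is an arbitrary assumption):

    // Sketch: each entry is { letter, x, y }, where x/y comes from the object's position.
    // Letters whose y values are within rowTolerance pixels are treated as one line.
    function reconstructText(letters, rowTolerance = 10) {
      const sorted = [...letters].sort((a, b) => a.y - b.y || a.x - b.x);
      const rows = [];

      for (const l of sorted) {
        const row = rows.find(r => Math.abs(r[0].y - l.y) <= rowTolerance);
        if (row) row.push(l); else rows.push([l]);
      }

      return rows
        .map(row => row.sort((a, b) => a.x - b.x).map(l => l.letter).join(''))
        .join('\n');
    }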

In short, this is very tricky. You're probably best off using a library someone has built, but even then it won't be accurate a lot of the time.

Sorry if this isn't quite the answer you wanted. Thanks if you read this far. I just find it a very interesting problem.

Have you tried this? It's a JavaScript library:

OCRAD
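For reference, a minimal sketch of its use (OCRAD.js exposes a global `OCRAD` function that takes a canvas, context, or ImageData and returns the recognised text; check the project's README for the exact signature of the version you load):

    // Sketch: run OCRAD.js on a canvas holding the camera frame.
    // Assumes ocrad.js has been loaded via a <script> tag, exposing a global OCRAD().
    const frameCanvas = document.getElementById('grabber');  // canvas with the frame drawn on it
    const recognised = OCRAD(frameCanvas);                   // recognised text as a string
    console.log(recognised);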
