OCR

OCR: Text Recognition

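OCR (Optical Character Recognition) means recognizing characters by optical means. In 1929 Tauschek first proposed OCR in Germany and applied for a patent on it. The technique only became practical after the advent of computers: characters are scanned and recognized optically, then converted into the computer's internal character codes.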

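Contents

OCR: Text Recognition
1. Generating a Simulated Dataset at Random
2. Multi-Output Network
3. Training
4. Further Thoughts
5. Source Code
6. Related Links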

1. Generating a Simulated Dataset at Random

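Commonly used CAPTCHA generation libraries include captcha and gvcode. The captcha library can generate both image CAPTCHAs and audio CAPTCHAs. To make the CAPTCHAs harder, you can supply additional fonts to the captcha library.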

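Generating a CAPTCHA with the captcha library takes two steps:
(1) Instantiate the ImageCaptcha class of the captcha module (captcha.image), specifying the image size and the fonts.

ImageCaptcha(width=160, height=60, fonts=None, font_sizes=None) accepts 4 arguments:
1. width: width of the generated image, 160 pixels by default;
2. height: height of the generated image, 60 pixels by default;
3. fonts: paths of the font files used to render the characters. By default the DroidSansMono font bundled with the module is used. If you pass in your own font files, one of them is picked at random for rendering;
4. font_sizes: controls the font size. Like fonts, it takes a list or tuple, and a value is picked at random.

(2) Call the generate_image method of the ImageCaptcha object with a string to generate the CAPTCHA.

generate_image() takes one string argument, renders a CAPTCHA containing that string, and returns a PIL Image object.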
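The two steps above can be sketched as follows. This is a minimal sketch rather than the author's generation script: `random_label`, `captcha_filename` and `generate_captcha` are hypothetical helper names, and the `<index>_<label>.jpg` file-name pattern is an assumption inferred from the way `read_path()` in the source-code section extracts the label from the part after the underscore. `generate_captcha` needs the third-party captcha package, which is imported lazily so the other helpers work without it.

```python
import random
import string

CHARSET = string.digits   # 10 classes: '0' to '9'
N_LEN = 6                 # characters per CAPTCHA


def random_label(n_len=N_LEN, charset=CHARSET):
    """Draw a random label such as '039796'."""
    return ''.join(random.choice(charset) for _ in range(n_len))


def captcha_filename(index, label):
    """Assumed naming pattern (not confirmed by the article): '<index>_<label>.jpg'."""
    return '%05d_%s.jpg' % (index, label)


def generate_captcha(label, width=210, height=80):
    """Render `label` as a PIL Image; requires the third-party `captcha` package."""
    from captcha.image import ImageCaptcha  # lazy import of the third-party library
    generator = ImageCaptcha(width=width, height=height)
    return generator.generate_image(label)
```

Calling `generate_captcha(label).save('img/' + captcha_filename(i, label))` in a loop, with a fresh `random_label()` each time, would produce a dataset in the layout the rest of the article expects, provided the naming assumption holds.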

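10,000 CAPTCHA images are generated at random. Each image is 80 × 210 (height × width) and contains 6 characters drawn from 10 classes, the digits 0 to 9, as shown in the figure below:

Structure of the training data fed to the network:
train_x: (batch, 80, 210, 3).
train_y: [array1, array2, array3, array4, array5, array6], where each array has shape (batch, n_class).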
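To make the layout of train_y concrete, here is a small sketch that encodes label strings into six one-hot matrices, one per character position. It uses nested Python lists purely for illustration (the training code in the source-code section builds NumPy arrays), and `encode_labels` is a hypothetical helper name.

```python
import string

CHARSET = string.digits
N_LEN, N_CLASS = 6, len(CHARSET)


def encode_labels(labels):
    """Return N_LEN one-hot matrices, each with shape (batch, N_CLASS)."""
    batch = len(labels)
    y = [[[0] * N_CLASS for _ in range(batch)] for _ in range(N_LEN)]
    for i, label in enumerate(labels):
        for j, char in enumerate(label):
            # Matrix j holds the j-th character of every label in the batch.
            y[j][i][CHARSET.index(char)] = 1
    return y


train_y = encode_labels(['039796', '120045'])
print(len(train_y), len(train_y[0]), len(train_y[0][0]))  # 6 2 10
print(train_y[1][0])  # one-hot row for '3', the second character of '039796'
```

Matrix j is the target of output branch j, which is the only place where the character order enters training.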

2. Multi-Output Network

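The structure of a convolutional neural network is not unique, but the goal is usually the same: dimensionality reduction, that is, representing the meaning of the raw pixels as faithfully as possible with lower-dimensional data. The multi-class output layer treats each character of the CAPTCHA as its own classification task. Since there are 6 characters, the features extracted by the convolutional network are classified 6 times, and each classification result predicts one character.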

(1) Model input: Input((80, 210, 3)).

(2) First convolution round: shape -> (None, 38, 103, 32)

The following four layers are applied twice in a row (see the inner loop of get_model() in the source code):

Conv2D(32, (3, 1), activation='relu')
BatchNormalization()
Conv2D(32, (1, 3), activation='relu')
BatchNormalization()

They are followed by a strided convolution for downsampling:

Conv2D(32, 2, strides=2, activation='relu')
BatchNormalization()

(3) Second convolution round: shape -> (None, 17, 49, 64)

Same structure as the first round, with 64 filters:

Conv2D(64, (3, 1), activation='relu')
BatchNormalization()
Conv2D(64, (1, 3), activation='relu')
BatchNormalization()

Conv2D(64, 2, strides=2, activation='relu')
BatchNormalization()

(4) Third convolution round: shape -> (None, 6, 22, 128)

Conv2D(128, (3, 1), activation='relu')
BatchNormalization()
Conv2D(128, (1, 3), activation='relu')
BatchNormalization()

Conv2D(128, 2, strides=2, activation='relu')
BatchNormalization()

(5) Fourth convolution round: shape -> (None, 1, 9, 256)

Conv2D(256, (3, 1), activation='relu')
BatchNormalization()
Conv2D(256, (1, 3), activation='relu')
BatchNormalization()

Conv2D(256, 2, strides=2, activation='relu')
BatchNormalization()

(6) Multiple outputs: shape -> 6 × (None, n_class), i.e. 6 × (None, 10) for the digits 0 to 9

Six convolutions with identical structure produce the six output branches:
out_branch = Conv2D(n_class, (1, 9))(x)
out_branch = Reshape((n_class,))(out_branch)
out_branch = Activation('softmax')(out_branch)
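The shapes quoted for each round can be checked with a few lines of arithmetic. The sketch below assumes, as in get_model() in the source-code section, that every convolution uses 'valid' padding and that the (3, 1)/(1, 3) pair is applied twice per round before the stride-2 downsampling convolution. `valid_conv` and `trace_shapes` are hypothetical helper names.

```python
def valid_conv(size, kernel, stride=1):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    return (size - kernel) // stride + 1


def trace_shapes(height=80, width=210, rounds=4, pairs_per_round=2):
    """Return the (height, width, channels) shape after each round."""
    shapes = []
    for i in range(rounds):
        for _ in range(pairs_per_round):
            height = valid_conv(height, 3)        # Conv2D(.., (3, 1)) shrinks height by 2
            width = valid_conv(width, 3)          # Conv2D(.., (1, 3)) shrinks width by 2
        height = valid_conv(height, 2, stride=2)  # Conv2D(.., 2, strides=2)
        width = valid_conv(width, 2, stride=2)
        shapes.append((height, width, 32 * 2 ** i))
    return shapes


for shape in trace_shapes():
    print(shape)
# (38, 103, 32)
# (17, 49, 64)
# (6, 22, 128)
# (1, 9, 256)
```

The final (1, 9, 256) feature map is exactly the size of the (1, 9) kernel used by each output branch, so every branch sees the whole feature map at once and reduces it to n_class scores.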

3. Training

The training set has 8,000 images, the validation set 1,000 images, and the test set 1,000 images.

The model was trained for 10 epochs with the optimizer adam = Adam(lr=1e-3, amsgrad=True). The validation loss (val_loss) plateaued at around 1.98 and could not be reduced further.
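When reading this number, keep in mind that for a model with several outputs Keras reports the total loss as the sum of the per-output losses (with the default loss weights). A val_loss of 1.98 therefore corresponds to roughly 0.33 per branch on average. The sketch below illustrates the sum with made-up probabilities; `categorical_crossentropy` and `total_loss` are hypothetical helpers written in plain Python, not the Keras implementation.

```python
import math


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross-entropy of one sample for one output branch."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


def total_loss(targets, predictions):
    """Sum of the per-branch losses for one sample (all loss weights equal to 1)."""
    return sum(categorical_crossentropy(t, p)
               for t, p in zip(targets, predictions))


# Hypothetical sample: every branch gives probability 0.72 to the true digit
# and spreads the remainder evenly over the other nine digits.
target = [1] + [0] * 9
probs = [0.72] + [0.28 / 9] * 9
loss = total_loss([target] * 6, [probs] * 6)
print(round(loss, 2))  # 1.97
```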

On the test set, per-character accuracy reaches 91.4% and whole-sequence accuracy reaches 64.9%.
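The two figures can be related with a quick calculation. If the errors at the six positions were statistically independent, the expected sequence accuracy would be the per-character accuracy raised to the sixth power. The sketch below computes that baseline; the independence assumption is ours and is not claimed by the article.

```python
def sequence_accuracy_if_independent(char_accuracy, n_len=6):
    """Expected whole-sequence accuracy when per-position errors are independent."""
    return char_accuracy ** n_len


expected = sequence_accuracy_if_independent(0.914)
print(round(expected, 3))  # 0.583
```

The measured 64.9% is higher than this 58.3% baseline, which suggests that the errors are not independent: misrecognized characters tend to cluster in the same difficult images.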

4. Further Thoughts

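Q1: The 6 output branches have the same structure and are independent of each other, so why do they end up in sequence order?

After the convolutional network, the model has a feature map of size (1, 9), which is then classified by 6 identically structured branches, one per character in the image. The model reads the CAPTCHA in the correct order, which seems to imply a sequential relationship among the 6 branches. Where does this come from?

In fact there is no sequential relationship among the 6 branches. Their ability to recognize characters at different positions comes from the weights of the output layer. During training, each branch can be regarded as an independent classifier that adjusts its weights according to the label it is given, until it recognizes that particular label in the sample. Because the labels fed to the branches are ordered, the outputs are ordered as well.

Each classifier in the output layer performs one convolution. Its role is to pick out the relevant region from the information of the whole image, raise the weight of the features found there, and then classify.

So OCR with a fixed number of characters and a fixed image size is actually easy to handle: apply the convolutional classifier several times. For license plate recognition, this means the traditional pipeline takes an unnecessary detour. Once the plate has been detected, there is no need to cut it into small pieces and classify each piece separately. The image can be classified directly, because the model's weights and fitting capacity are enough to perform the segmentation for us automatically.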

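Q2: Grad-CAM, visualizing a deep network with heat maps.

Grad-CAM is a gradient-based localization method for visualizing deep networks, proposed in 2017 by researchers at the Georgia Institute of Technology and other institutions. It explains the basis of a deep model's classification in the form of a heat map, marking in the original image the key pixels that the convolutional model attends to while classifying.

The basic idea of Grad-CAM is to compute, for each class, a weight for every feature map of the last convolutional layer, take the weighted sum of the feature maps, and finally rescale the result to the size of the original image as a heat map.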
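The weighted sum can be sketched on toy data. Note the assumption: this sketch follows the visualization code in the source-code section, which uses the kernel of a branch's Conv2D layer directly as the weights and sums over channels at each of the 9 positions. The Grad-CAM paper instead derives one weight per feature map from spatially averaged gradients. `class_activation_map` and `normalize` are hypothetical helper names, and all numbers are made up.

```python
def class_activation_map(features, kernel, class_index):
    """Per-position evidence for one class.

    features: [width][channels] activations of the last shared layer
    kernel:   [width][channels][n_class] weights of one output branch
    """
    cam = []
    for f_pos, k_pos in zip(features, kernel):
        cam.append(sum(f * k[class_index] for f, k in zip(f_pos, k_pos)))
    return cam


def normalize(cam):
    """Scale by the maximum and clip negatives, mirroring the heat-map step."""
    peak = max(cam)
    if peak <= 0:
        return [0.0 for _ in cam]
    return [max(v, 0.0) / peak for v in cam]


# Toy example: 3 positions, 2 channels, 2 classes.
features = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
kernel = [
    [[0.2, -0.1], [0.0, 0.3]],
    [[1.0, 0.0], [0.5, -0.5]],
    [[0.0, 0.4], [-0.2, 0.1]],
]
cam = class_activation_map(features, kernel, class_index=0)
heat = normalize(cam)
print(cam)
print(heat)
```

Summing `cam` over all positions gives the branch's pre-softmax score for that class (ignoring the bias), so the map shows how much each horizontal position contributes to the prediction. In the toy data the middle position dominates.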

Given the following input image:

the weight heat maps of the 6 classification branches are, in order:

For details, see the paper "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization".

5. Source Code

Main script:

import numpy as np
import cv2
from get_data import make_data
from train import SequenceData
from train import train_network
from train import load_network_then_train
from predict import predict_sequence

if __name__ == "__main__":
    train_x, train_y, val_x, val_y, test_x, test_y = make_data()
    train_generator = SequenceData(train_x, train_y, 32)
    val_generator = SequenceData(val_x, val_y, 32)
    # Uncomment one of the following calls to train from scratch or to resume training.
    # train_network(train_generator, val_generator, epoch=50)
    # load_network_then_train(train_generator, val_generator, epoch=30,
    #                         input_name='/home/archer/8_XFD_CODE/OCR2/Logs/epoch008-loss0.123-val_loss4.745.h5',
    #                         output_name='second_weights.hdf5')
    predict_sequence(test_x, test_y)

Reading the dataset (get_data.py):

import os


def read_path():
    """Collect the image paths and labels found in the img/ directory."""
    data_x = []
    data_y = []
    filename = os.listdir('img')
    filename.sort()
    for name in filename:
        img_path = 'img/' + name
        data_x.append(img_path)
        # The label is the part of the file name between '_' and the extension.
        obj1 = name.split('.')
        obj2 = obj1[0].split('_')
        obj3 = obj2[1]
        data_y.append(obj3)
    return data_x, data_y


def make_data():
    """Split the sorted file list into 8000 / 1000 / 1000 train / val / test samples."""
    data_x, data_y = read_path()
    print('all image quantity : ', len(data_y))    # 10000
    train_x = data_x[:8000]
    train_y = data_y[:8000]
    val_x = data_x[8000:9000]
    val_y = data_y[8000:9000]
    test_x = data_x[9000:]
    test_y = data_y[9000:]
    return train_x, train_y, val_x, val_y, test_x, test_y

Network structure (ocr_model.py):

import string
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     Reshape, Activation)


# Number of character classes: the 10 digits
def get_model():
    char_class = string.digits
    width, height, n_len, n_class = 210, 80, 6, len(char_class)
    input_tensor = Input((height, width, 3))
    x = input_tensor
    for i in range(4):
        for j in range(2):
            # Two 'valid' convolutions: (3, 1) followed by (1, 3)
            x = Conv2D(32 * 2 ** i, (3, 1), activation='relu')(x)
            x = BatchNormalization()(x)
            x = Conv2D(32 * 2 ** i, (1, 3), activation='relu')(x)
            x = BatchNormalization()(x)
        # Downsampling with a stride-2 convolution
        x = Conv2D(32 * 2 ** i, 2, strides=2, activation='relu')(x)
        x = BatchNormalization()(x)
    # Output shape at this point: (None, 1, 9, 256)
    out = []
    # 6 independent output branches
    for i in range(n_len):
        out_branch = Conv2D(n_class, (1, 9))(x)
        out_branch = Reshape((n_class,))(out_branch)
        out_branch = Activation('softmax', name='c%d' % (i + 1))(out_branch)
        out.append(out_branch)
    model = Model(inputs=input_tensor, outputs=out)
    model.summary()
    return model

Training (train.py):

import math
import string
import cv2
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint
from ocr_model import get_model

char_class = string.digits
width, height, n_len, n_class = 210, 80, 6, len(char_class)
char_list = list(char_class)
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


class SequenceData(Sequence):
    def __init__(self, data_x, data_y, batch_size):
        self.batch_size = batch_size
        self.data_x = data_x
        self.data_y = data_y
        self.indexes = np.arange(len(self.data_x))

    def __len__(self):
        # Number of full batches per epoch; a trailing partial batch is dropped.
        return math.floor(len(self.data_x) / float(self.batch_size))

    def on_epoch_end(self):
        np.random.shuffle(self.indexes)

    def __getitem__(self, idx):
        batch_index = self.indexes[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_x = [self.data_x[k] for k in batch_index]
        batch_y = [self.data_y[k] for k in batch_index]
        x = np.zeros((self.batch_size, height, width, 3))
        # n_len arrays of shape (batch_size, n_class)
        y = [np.zeros((self.batch_size, n_class)) for k in range(n_len)]
        for i in range(self.batch_size):
            img = cv2.imread(batch_x[i])
            img1 = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)
            img2 = img1 / 255
            x[i, :, :, :] = img2
            for j in range(n_len):
                char = batch_y[i][j]
                char_index = char_class.find(char)
                y[j][i, char_index] = 1
        return x, y


# Create the model, train it, and save the weights
def train_network(train_generator, validation_generator, epoch):
    model = get_model()
    # learning_rate replaces the deprecated lr argument of the original code
    adam = Adam(learning_rate=1e-3, amsgrad=True)
    log_dir = "Logs/"
    # A checkpoint is written after every epoch (the default save frequency)
    checkpoint = ModelCheckpoint(log_dir + 'epoch{epoch:03d}-train_loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                                 monitor='val_loss', save_weights_only=True, save_best_only=False)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    # fit() accepts a Sequence directly; fit_generator() is deprecated since TensorFlow 2.1
    model.fit(train_generator,
              steps_per_epoch=len(train_generator),
              epochs=epoch,
              validation_data=validation_generator,
              validation_steps=len(validation_generator),
              callbacks=[checkpoint])
    model.save_weights('first_weights.hdf5')


# Load previously saved weights and continue training with a smaller learning rate
def load_network_then_train(train_generator, validation_generator, epoch, input_name, output_name):
    model = get_model()
    model.load_weights(input_name)
    print('Total number of layers: ', len(model.layers))
    adam = Adam(learning_rate=1e-4, amsgrad=True)
    log_dir = "Logs/"
    checkpoint = ModelCheckpoint(log_dir + 'epoch{epoch:03d}-train_loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
                                 monitor='val_loss', save_weights_only=True, save_best_only=False)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.fit(train_generator,
              steps_per_epoch=len(train_generator),
              epochs=epoch,
              validation_data=validation_generator,
              validation_steps=len(validation_generator),
              callbacks=[checkpoint])
    model.save_weights(output_name)

Prediction (predict.py):

import string
import cv2
import numpy as np
from ocr_model import get_model

char_class = string.digits
width, height, n_len, n_class = 210, 80, 6, len(char_class)
char_list = list(char_class)
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


def predict_sequence(test_x, test_y):
    predict_model = get_model()
    predict_model.load_weights('best_val_loss1.982.h5')
    acc_count = 0     # number of correctly recognized sequences
    char_count = 0    # number of correctly recognized characters
    for i in range(len(test_x)):
        img = cv2.imread(test_x[i])
        img1 = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)
        img2 = img1 / 255
        img3 = img2[np.newaxis, :, :, :]
        y_pre = predict_model.predict(img3)
        y_pre1 = np.array(y_pre)    # (n_len, 1, n_class)
        y_pre2 = ''
        for j in range(n_len):
            vector = y_pre1[j, 0, :]
            index = int(np.argmax(vector))
            char = char_list[index]
            if char == test_y[i][j]:
                char_count = char_count + 1
            y_pre2 = y_pre2 + char
        if y_pre2 == test_y[i]:
            acc_count = acc_count + 1
        # print(y_pre2)
        # cv2.namedWindow("img2")
        # cv2.imshow("img2", img2)
        # cv2.waitKey(0)
    print('sequence recognition accuracy : ', acc_count / len(test_x))
    print('char recognition accuracy : ', char_count / (len(test_x) * n_len))

Heat-map visualization of the deep network:

import string
import cv2
import numpy as np
from tensorflow.keras.models import Model
from ocr_model import get_model

char_class = string.digits
width, height, n_len, n_class = 210, 80, 6, len(char_class)
char_list = list(char_class)
# ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


# Look up a layer by name
def get_output_layer(model1, layer_name):
    layer_dict = dict([(layer.name, layer) for layer in model1.layers])
    layer = layer_dict[layer_name]
    return layer


def visualize_region(new_model, img, conv_name, k):
    aug_img = img[np.newaxis, :, :, :]
    y_pre = new_model.predict(aug_img)
    add_result = y_pre[-1]    # (1, 1, 9, 256)
    original_result = y_pre[:-1]    # 6 arrays of shape (1, n_class)
    # Take one classifier branch of the output layer and extract its weights
    weights_layer = get_output_layer(new_model, conv_name)
    class_weights = weights_layer.get_weights()[0]    # (1, 9, 256, n_class)
    # From these weights, keep only those of the predicted label
    predict_index = np.argmax(np.array(original_result), axis=2)[:, 0]    # (6,)
    class_weights_w = class_weights[:, :, :, predict_index[k]]    # (1, 9, 256)
    # Multiply the classifier weights by the convolutional features, then sum over channels
    cam = np.sum(add_result[0] * class_weights_w, axis=-1)    # (1, 9)
    cam /= np.max(cam)
    cam = cv2.resize(cam, (width, height))
    # Clip before the uint8 conversion so that negative values do not wrap around
    heatmap = cv2.applyColorMap(np.uint8(255 * np.clip(cam, 0, 1)), cv2.COLORMAP_JET)
    heatmap[np.where(cam < 0)] = 0
    cv2.imwrite('heatmap' + str(k) + '.jpg', heatmap)


def plot_heatmap():
    model = get_model()
    # Layer names are auto-generated by Keras; they assume get_model() is called once in a fresh session
    final_layer = get_output_layer(model, 'batch_normalization_19')  # (None, 1, 9, 256)
    # Expose the last shared feature map as an extra output, without mutating model.outputs
    out = list(model.outputs) + [final_layer.output]
    new_model = Model(inputs=model.input, outputs=out, name='new_model')
    new_model.load_weights('best_val_loss1.982.h5')
    img = cv2.imread('039796.jpg')
    img1 = cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)
    img2 = img1 / 255
    # Visualize classifier branches 1 to 6 in turn
    visualize_region(new_model, img2, 'conv2d_20', 0)
    visualize_region(new_model, img2, 'conv2d_21', 1)
    visualize_region(new_model, img2, 'conv2d_22', 2)
    visualize_region(new_model, img2, 'conv2d_23', 3)
    visualize_region(new_model, img2, 'conv2d_24', 4)
    visualize_region(new_model, img2, 'conv2d_25', 5)


plot_heatmap()

6. Related Links

If the code does not run, or you would like to use the dataset I built myself, you can download the project from the following link:

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始
