python - PaddleOCR OCR analyzes Left-to-Right instead of Right-to-Left for Arabic- How to process RTL languages correctly?

I am using PaddleOCR with the Arabic language model (lang='ar') to perform OCR on Arabic images. While PaddleOCR correctly recognizes the Arabic characters, it processes the text in a Left-to-Right (LTR) order, which is incorrect for Arabic, a Right-to-Left (RTL) language. This results in the words and sentences being in reverse order.

I have reviewed the paddleocr --help output to see if there are any options to explicitly set the text direction or handle RTL languages like Arabic.

My question is:

Is there a specific option in PaddleOCR, possibly using ocr_order_method or another parameter, to correctly handle Right-to-Left languages like Arabic and ensure the output text is in the correct RTL order?

If there isn't a built-in option, what are the recommended workarounds to post-process the OCR output to reorder the text correctly for RTL languages in Python?

Any guidance or solutions on how to get PaddleOCR to output Arabic text in the correct Right-to-Left order would be greatly appreciated.

I tried to use the following code:

from paddleocr import PaddleOCR, draw_ocr

ocr = PaddleOCR(use_angle_cls=True, lang='Ar') 
img_path = 'image5.jpg'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        print(line)

# draw result
from PIL import Image
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./doc/fonts/arabic.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

python - PaddleOCR OCR analyzes Left-to-Right instead of Right-to-Left for Arabic- How to process RTL languages correctly? - Sta

与本文相关的文章

评论列表(0)