I am using PaddleOCR with the Arabic language model (lang='ar') to perform OCR on Arabic images. While PaddleOCR correctly recognizes the Arabic characters, it processes the text in a Left-to-Right (LTR) order, which is incorrect for Arabic, a Right-to-Left (RTL) language. This results in the words and sentences being in reverse order.
I have reviewed the paddleocr --help output to see if there are any options to explicitly set the text direction or handle RTL languages like Arabic.
My question is:
Is there a specific option in PaddleOCR, possibly using ocr_order_method or another parameter, to correctly handle Right-to-Left languages like Arabic and ensure the output text is in the correct RTL order?
If there isn't a built-in option, what are the recommended workarounds to post-process the OCR output to reorder the text correctly for RTL languages in Python?
Any guidance or solutions on how to get PaddleOCR to output Arabic text in the correct Right-to-Left order would be greatly appreciated.
I tried to use the following code:
from paddleocr import PaddleOCR, draw_ocr
ocr = PaddleOCR(use_angle_cls=True, lang='Ar')
img_path = 'image5.jpg'
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
res = result[idx]
for line in res:
print(line)
# draw result
from PIL import Image
result = result[0]
image = Image.open(img_path).convert('RGB')
boxes = [line[0] for line in result]
txts = [line[1][0] for line in result]
scores = [line[1][1] for line in result]
im_show = draw_ocr(image, boxes, txts, scores, font_path='./doc/fonts/arabic.ttf')
im_show = Image.fromarray(im_show)
im_show.save('result.jpg')