I'm trying to follow the solution detailed at this question to prepare a dataset to train a CRNN for HTR (Handwritten Text Recognition). I'm using eScriptorium to adjust text segmentation and transcription, exporting in ALTO format (one XML with text region coordinates for each image) and parsing the ALTO XML to grab the text image regions and export them individually to create a dataset.
The problem I'm finding is that I have the region defined at eScriptorium, like this:
But when I apply this code from the selected solution for the above linked question:
# Initialize mask
mask = np.zeros((img.shape[0], img.shape[1]))
# Create mask that defines the polygon of points
cv2.fillConvexPoly(mask, pts, 1)
mask = mask > 0 # To convert to Boolean
# Create output image (untranslated)
out = np.zeros_like(img)
out[mask] = img[mask]
and display the image, I get some parts of the text region filled:
As you can see, some areas that should be inside the mask are filled and, therefore, the image pixels in them are not copied. I've made sure the pixels that make the polygon are correctly parsed and handed to OpenCV to build the mask. I can't find the reason why those areas are filled and I wonder if anyone got into a similar problem and managed to find out the reason or how to avoid it.
TIA
asked 2 days ago by Ricardo Palomares Martínez, edited yesterday by Christoph Rackwitz

1 Answer
You called cv.fillConvexPoly(). Your polygon is not convex. The algorithm assumed it to be convex and took some shortcuts to simplify the drawing code, so it came out wrong.
Use cv.fillPoly() instead. That will draw non-convex polygons correctly.
As you point out, the function signatures are not drop-in compatible. fillPoly() works on a list of polygons, while fillConvexPoly() just takes a single polygon.
cv.fillConvexPoly(img, points, color)
# would be replaced with
cv.fillPoly(img, [points], color) # list of one polygon
Each polygon should be a numpy array of shape (N, 1, 2), and it probably needs to be of an integer dtype too, although I'm not sure about that now and it might support floating point dtype in the future.
dtype of np.uint8 when you create the mask, e.g. mask = np.zeros((h,w), dtype=np.uint8) – Mark Setchell, commented 2 days ago