I'm working on implementing image segmentation using my own custom TFLite model, following the code example from MediaPipe. Here's my code:
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# base_options points at my custom model (path shortened here)
base_options = python.BaseOptions(model_asset_path='my_model.tflite')

options = vision.ImageSegmenterOptions(
    base_options=base_options,
    running_mode=mp.tasks.vision.RunningMode.IMAGE,
    output_confidence_masks=True,
    output_category_mask=False
)
mp_image = mp.Image.create_from_file(image_path)
with vision.ImageSegmenter.create_from_options(options) as segmenter:
    segmentation_result = segmenter.segment(mp_image)
    output_mask = segmentation_result.confidence_masks[0]
I've encountered two issues with the above code:
The model has two outputs:
Output 0: Name = Identity0, Shape = [1, 1], Type = numpy.float32
Output 1: Name = Identity1, Shape = [1, x, y, z], Type = numpy.float32 (where x * y * z == image_width * image_height * image_channels, with image_channels = 1)
How can I retrieve both outputs instead of just one?
The confidence_masks values are almost identical (min/max = 0.0701157/0.070115715), which seems unusual. The original image contains a person, and the output is correct when using my custom TFLite model with tf.lite.Interpreter.get_tensor().
I know that many frameworks support models with multiple inputs and outputs, so I'm confused about what I might be missing. Here are my specific questions:
- Do I need to add special metadata to the TFLite model file?
- How should I modify the original MediaPipe code to handle multiple outputs?
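For comparison, here is roughly how I read both outputs with `tf.lite.Interpreter`, which works correctly. This sketch builds a tiny stand-in two-output model in memory (a scalar score plus a mask, mirroring the Identity0 / Identity1 layout above) so it is self-contained; with my real model I pass `model_path=` instead of `model_content=`:

```python
import numpy as np
import tensorflow as tf

# Stand-in model with two outputs, mimicking the custom model's layout:
# a [1, 1] scalar and a [1, 4, 4, 1] mask.
class TwoOutput(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 4, 4, 1], tf.float32)])
    def __call__(self, x):
        score = tf.reduce_mean(x, axis=[1, 2, 3])[:, None]  # shape (1, 1)
        mask = tf.sigmoid(x)                                # shape (1, 4, 4, 1)
        return {"score": score, "mask": mask}

module = TwoOutput()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [module.__call__.get_concrete_function()], module)
tflite_bytes = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
interpreter.set_tensor(interpreter.get_input_details()[0]["index"],
                       np.zeros((1, 4, 4, 1), dtype=np.float32))
interpreter.invoke()

# get_output_details() lists every output; get_tensor() reads each one.
outputs = [interpreter.get_tensor(d["index"])
           for d in interpreter.get_output_details()]
for d, t in zip(interpreter.get_output_details(), outputs):
    print(d["name"], t.shape)
```

With the interpreter API every output tensor is reachable, which is why I expected the same from MediaPipe.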
1 Answer
Why do you set output_category_mask=False while expecting two outputs? You are specifically asking the task to return only one output.
Please check the documentation and source code.
output_confidence_masks:
Whether to output confidence masks.
output_category_mask:
Whether to output a category mask.