I am trying to run the mlx-community/gemma-3-4b-it-4bit model with the mlx-vlm library to do multi-image inference, but I get the traceback below and cannot figure out how to solve it. Both single-image and multi-image inference fail with the same error.
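For context, here is a minimal sketch of how I call the model, simplified from my phi3_mlx.py wrapper. The image path and prompt are placeholders; the helper usage follows the mlx-vlm README, so exact signatures may vary with your installed version.

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
from PIL import Image

# Load the quantized Gemma 3 model together with its processor.
model_path = "mlx-community/gemma-3-4b-it-4bit"
model, processor = load(model_path)
config = load_config(model_path)

# Placeholder inputs; my real script loads camera frames from the robot.
images = [Image.open("frame_0.png")]
prompt = "Describe what you see in the image."

# Build a chat-formatted prompt with the right number of image tokens,
# then generate. This generate call is what raises the TypeError.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))
prediction = generate(model, processor, formatted_prompt, images, verbose=False)

Running this produces the following traceback: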
Traceback (most recent call last):
  File "/Users/Administrator/Documents/create/controllers/VLM_on_Robotics/main.py", line 52, in <module>
    prediction, comp_time = phi3.generate(prompt, [images_PIL[0]])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Administrator/Documents/create/controllers/VLM_on_Robotics/Llava_Phi3/phi3_mlx.py", line 60, in generate
    prediction = generate(self.model, self.processor, formatted_prompt, images, verbose=False)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/mlx_vlm/utils.py", line 1117, in generate
    for response in stream_generate(model, processor, prompt, image, **kwargs):
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/mlx_vlm/utils.py", line 1018, in stream_generate
    inputs = prepare_inputs(
             ^^^^^^^^^^^^^^^
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/mlx_vlm/utils.py", line 814, in prepare_inputs
    inputs = processor(
             ^^^^^^^^^^
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2877, in __call__
    encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2987, in _call_one
    return self.encode_plus(
           ^^^^^^^^^^^^^^^^^
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3063, in encode_plus
    return self._encode_plus(
           ^^^^^^^^^^^^^^^^^^
  File "/Users/Administrator/Documents/create/venv/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 613, in _encode_plus
    batched_output = self._batch_encode_plus(
                     ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: PreTrainedTokenizerFast._batch_encode_plus() got an unexpected keyword argument 'images'
Since the traceback complains about an unexpected keyword argument, I tried removing the images argument and running text-only inference, and the script works. Does this mean there is a bug in how Gemma 3 is implemented, such that vision tasks are not supported?
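For completeness, this is a sketch of the text-only variant that does run, i.e. the same call with the images argument dropped:

# Text-only sketch: same wrapper call with the images argument removed.
# This runs without error, so only the vision path seems affected.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=0)
prediction = generate(model, processor, formatted_prompt, verbose=False)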
- Please provide enough code so others can better understand or reproduce the problem. – Community Bot
1 Answer
Problem solved: the issue is related to the transformers dependency.
See: https://github.com/Blaizzy/mlx-vlm/issues/274
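The symptom appears to be that AutoProcessor falls back to a plain tokenizer when the installed transformers version does not know the Gemma 3 processor, so the tokenizer receives the images keyword it cannot handle. A quick diagnostic sketch (model id taken from the question; the interpretation follows the linked issue):

# Diagnostic sketch: check whether transformers returns a full multimodal
# processor or just a tokenizer for this model.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("mlx-community/gemma-3-4b-it-4bit")
print(type(processor).__name__)
# If this prints a tokenizer class rather than a processor class, the
# installed transformers predates Gemma 3 support; upgrading transformers
# (pip install -U transformers) should resolve the TypeError.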