I’m working through an example of an encoder–decoder seq2seq model in TensorFlow with a Luong-style (dot-product) attention layer via tf.keras.layers.Attention. I followed a tutorial/book example that uses a TextVectorization layer with output_sequence_length=max_length. When I keep max_length, everything works fine. However, if I remove it because I want to handle variable-length input sequences, the model throws an INVALID_ARGUMENT: required broadcastable shapes error during training.
This makes me wonder why my char-level RNN model didn’t require a fixed max_length, but my attention-based seq2seq suddenly does. The error stack trace points to a shape mismatch in sparse_categorical_crossentropy/weighted_loss/Mul.
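To make concrete what I mean by variable-length output: without output_sequence_length, TextVectorization pads each call only to the longest sequence in that call, so the time dimension changes from batch to batch. A small standalone check (made-up sentences, vocabulary size chosen arbitrarily):
import tensorflow as tf

demo_vec = tf.keras.layers.TextVectorization(max_tokens=100)
demo_vec.adapt(["me gustan los perros", "te gustan los gatos"])

# Each call is padded only to the longest sequence in that call:
print(demo_vec(tf.constant(["me gustan los perros"])).shape)              # (1, 4)
print(demo_vec(tf.constant(["me gustan los perros y los gatos"])).shape)  # (1, 7)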
import tensorflow as tf
# Sample data
sentences_en = ["I love dogs", "You love cats", "We like soccer"]
sentences_es = ["me gustan los perros", "te gustan los gatos", "nos gusta el futbol"]
# TextVectorization without max_length
vocab_size = 1000
text_vec_layer_en = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size
    # NOTE: Removed output_sequence_length to allow variable-length inputs
)
text_vec_layer_es = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size
    # NOTE: Also removed output_sequence_length here
)
text_vec_layer_en.adapt(sentences_en)
text_vec_layer_es.adapt(["startofseq " + s + " endofseq" for s in sentences_es])
# Model Inputs
encoder_inputs = tf.keras.layers.Input(shape=(), dtype=tf.string)
decoder_inputs = tf.keras.layers.Input(shape=(), dtype=tf.string)
encoder_input_ids = text_vec_layer_en(encoder_inputs)
decoder_input_ids = text_vec_layer_es(decoder_inputs)
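# Without output_sequence_length, these id tensors are padded only to the
# longest sentence in the current batch, so their time dimension varies from
# batch to batch (and the encoder and decoder lengths need not match).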
embed_size = 128
embedding_en = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
embedding_es = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
encoder_embeds = embedding_en(encoder_input_ids)
decoder_embeds = embedding_es(decoder_input_ids)
# Bidirectional Encoder
encoder = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(256, return_sequences=True, return_state=True)
)
encoder_outputs, forward_h, forward_c, backward_h, backward_c = encoder(encoder_embeds)
encoder_state_h = tf.concat([forward_h, backward_h], axis=-1)
encoder_state_c = tf.concat([forward_c, backward_c], axis=-1)
encoder_state = [encoder_state_h, encoder_state_c]
# Decoder
decoder = tf.keras.layers.LSTM(512, return_sequences=True)
decoder_outputs = decoder(decoder_embeds, initial_state=encoder_state)
# Attention
attention_layer = tf.keras.layers.Attention()
attention_outputs = attention_layer([decoder_outputs, encoder_outputs])
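# Expected shapes at this point (T_enc / T_dec are the per-batch padded lengths):
#   encoder_outputs:   (batch, T_enc, 512)  # 256 units x 2 directions
#   decoder_outputs:   (batch, T_dec, 512)
#   attention_outputs: (batch, T_dec, 512)  # keeps the query's time dimension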
# Final Dense Output
output_layer = tf.keras.layers.Dense(vocab_size, activation='softmax')
y_proba = output_layer(attention_outputs)
model = tf.keras.Model(inputs=[encoder_inputs, decoder_inputs], outputs=[y_proba])
model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam', metrics=['accuracy'])
# Attempted training
# (Simplifying example: ignoring actual train split, just trying to run a dummy training step)
x_en = tf.constant(sentences_en)
x_es = tf.constant(["startofseq " + s + " endofseq" for s in sentences_es])
y_dummy = text_vec_layer_es(x_es)  # targets, padded once to the longest sentence in the whole set
model.fit([x_en, x_es], y_dummy, epochs=1)
Error:
INVALID_ARGUMENT: required broadcastable shapes
[[node gradient_tape/sparse_categorical_crossentropy/weighted_loss/Mul]]
...
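One thing I checked: a plain forward pass on a variable-length batch seems to run fine (below, a single made-up pair), which suggests the mismatch happens in the loss computation rather than inside the attention layer itself:
probe = model([tf.constant(["I love dogs"]),
               tf.constant(["startofseq me gustan los perros endofseq"])])
print(probe.shape)  # (1, T_dec, vocab_size), with T_dec set by this particular batch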
For comparison, here's the char-level (sequence-to-sequence) RNN model, which doesn't need a max_length in its TextVectorization layer.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=n_tokens, output_dim=16),
    tf.keras.layers.GRU(128, return_sequences=True),
    tf.keras.layers.Dense(n_tokens, activation="softmax")
])
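Roughly how I built the char-RNN training data (shifted windows over the same tensor of token ids), which I think is why its input and target shapes always line up regardless of length; a simplified sketch with made-up ids:
def to_inputs_and_targets(window):
    return window[:, :-1], window[:, 1:]   # targets are the inputs shifted by one step

ids = tf.constant([[5, 2, 9, 7, 1, 3]])    # made-up token ids
x_chars, y_chars = to_inputs_and_targets(ids)
print(x_chars.shape, y_chars.shape)        # (1, 5) (1, 5) -- always the same time dimension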
Why do char-level RNNs or simple seq2seq models sometimes work without specifying max_length, but this attention-based model does not? Is it primarily because attention has to be computed over all input and output time steps, so max_length needs to be known ahead of time?
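For reference, the variant that does train for me is simply putting the fixed output length back on both vectorization layers (50 is just the value from the book example, nothing special about it):
max_length = 50
text_vec_layer_en = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size, output_sequence_length=max_length)
text_vec_layer_es = tf.keras.layers.TextVectorization(
    max_tokens=vocab_size, output_sequence_length=max_length)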