We have 300 stores in a retail demand forecasting problem and 1 year of daily demand data, shorter for some stores. I am using a GRU so that it can also capture the extreme cases, but the problem is how to feed the data to the model. Let's say batch_size is 32, sequence_length is 7 and num_features is 40. So I have 4-dimensional data (store_num, batch_size, seq_length, num_features), and when I feed this data in, the model cannot learn the relations. When forecasting, it gives the same number in most cases, or values within a very narrow range.
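For reference, recurrent layers in the common frameworks (for example Keras `GRU`, or PyTorch `nn.GRU` with `batch_first=True`) expect a 3D input of shape (batch, time steps, features), so a 4D array cannot be passed to them directly. A minimal NumPy sketch of folding the store axis into the batch axis, assuming random placeholder values and the sizes quoted above:

```python
import numpy as np

# Sizes from the question: 300 stores, 32 windows per store,
# 7 time steps per window, 40 features per time step.
n_stores, batch_size, seq_len, n_features = 300, 32, 7, 40

# Placeholder data standing in for the real 4D array.
rng = np.random.default_rng(0)
x_4d = rng.random((n_stores, batch_size, seq_len, n_features), dtype=np.float32)

# Merge (store, batch) into one sample axis. The last two axes
# (seq_len, n_features) are untouched, so each window keeps its time order.
x_3d = x_4d.reshape(-1, seq_len, n_features)

print(x_3d.shape)  # (9600, 7, 40)
```

With NumPy's default row-major order, sample `store * batch_size + b` of `x_3d` is exactly window `b` of that store; only the grouping of windows changes, not the order of days inside them.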
If I instead give 3D data (batch_size, seq_length, num_features), with the windows from all stores stacked along the first axis, I feel it is wrong because the order of the time series will change.
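The concern about time order can be checked with a sliding-window sketch: each store's history is cut into overlapping 7-day windows on its own, and only then are the windows of all stores stacked. The `make_windows` helper, column 0 as the demand target, the one-day horizon and the random store data are all assumptions for illustration.

```python
import numpy as np


def make_windows(series, seq_len, horizon=1, target_col=0):
    """Cut one store's (T, n_features) history into overlapping windows.

    Returns X with shape (n_windows, seq_len, n_features) and y with shape
    (n_windows, horizon), where y holds the next `horizon` values of
    `target_col` (column 0 is assumed to be demand in this sketch).
    """
    n_windows = len(series) - seq_len - horizon + 1
    if n_windows <= 0:
        # History too short for a single window: return empty arrays.
        return (np.empty((0, seq_len, series.shape[1])),
                np.empty((0, horizon)))
    X = np.stack([series[i:i + seq_len] for i in range(n_windows)])
    y = np.stack([series[i + seq_len:i + seq_len + horizon, target_col]
                  for i in range(n_windows)])
    return X, y


rng = np.random.default_rng(0)
seq_len, n_features = 7, 40
# Hypothetical stores with unequal history lengths (days x features).
stores = {sid: rng.random((n_days, n_features))
          for sid, n_days in [(0, 365), (1, 200), (2, 5)]}

X_parts, y_parts = [], []
for sid, series in stores.items():
    X_s, y_s = make_windows(series, seq_len)
    X_parts.append(X_s)
    y_parts.append(y_s)

X = np.concatenate(X_parts)  # (n_samples, seq_len, n_features): 3D GRU input
y = np.concatenate(y_parts)  # (n_samples, horizon)
print(X.shape, y.shape)      # (551, 7, 40) (551, 1)

# Shuffling reorders whole windows across stores; the 7 days inside each
# window keep their chronological order.
perm = rng.permutation(len(X))
X_shuf, y_shuf = X[perm], y[perm]
```

Stores with short histories simply contribute fewer windows (none, for the 5-day store here), so they do not need padding to a common length.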
Can I train it with the 4D data after some modifications? If I should continue with 3D data, how should I sort (preprocess) the training data fed into the model?
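One possible preprocessing order, sketched under assumptions: split each store by date first (the last `val_days` rows for validation), then standardise each store with statistics from its training rows only, and cut windows afterwards. The `split_and_scale` and `unscale` helpers, `val_days=28` and per-store standardisation are hypothetical choices for illustration, not something established in the question.

```python
import numpy as np


def split_and_scale(series, val_days, target_col=0):
    """Chronological split plus per-store standardisation (sketch).

    series: (T, n_features) array for ONE store, rows sorted by date.
    The last `val_days` rows are held out for validation. Mean and std are
    computed on the training rows only, so nothing from the validation
    period leaks into the scaling.
    """
    train, val = series[:-val_days], series[-val_days:]
    mean = train.mean(axis=0)
    std = train.std(axis=0) + 1e-8  # guard against constant features
    target_stats = (mean[target_col], std[target_col])
    return (train - mean) / std, (val - mean) / std, target_stats


def unscale(pred, target_stats):
    """Map scaled target predictions back to the store's original units."""
    mu, sigma = target_stats
    return pred * sigma + mu


rng = np.random.default_rng(0)
# Hypothetical store: 365 days x 40 features, demand (column 0) around 100.
series = rng.normal(100.0, 15.0, size=(365, 40))
train_s, val_s, stats = split_and_scale(series, val_days=28)
print(train_s.shape, val_s.shape)  # (337, 40) (28, 40)
```

Forecasts for a store would be mapped back with that store's `unscale` statistics. When windowing the validation part, the last `seq_len` training rows would need to be prepended so that the first validation day has a full 7-day history.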