I trained a simple TensorFlow model containing some LSTM and Dense feed-forward layers. After training, I am quantising the model and converting it to the TFLite format for edge deployment. Here is the relevant part of the code.
...
# Size of the trained Keras model: sum of the byte sizes of its trainable weights
model_size: int = sum(weight.numpy().nbytes for weight in model.trainable_weights)
print(f'Model size: {model_size / 1024:.2f} KB')

# Convert to TFLite with float16 quantisation, optimising for size
tf_lite_converter = tf.lite.TFLiteConverter.from_keras_model(model=model)
tf_lite_converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
tf_lite_converter.target_spec.supported_types = [tf.float16]
tflite_model: bytes = tf_lite_converter.convert()
print(f'Size of the tf lite model is {len(tflite_model) / 1024:.2f} KB')
As tflite_model is just a byte array, I simply divide its length by 1024 to get its in-memory size in KB.
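In case the measurement itself is the problem, here is a minimal sketch of the same measurement that counts every variable rather than only the trainable ones (assuming the same model object as above):

# Sketch: sum the byte sizes of *all* weight tensors (trainable and
# non-trainable), in case non-trainable variables are being missed above.
all_weights_size: int = sum(w.numpy().nbytes for w in model.weights)
print(f'All weights: {all_weights_size / 1024:.2f} KB')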
Original model size: 33.4 KB
After compression and quantisation: 55 KB
So how is that possible, or even beneficial, if the TFLite converter increases the size in memory? Or am I measuring the in-memory sizes wrongly? Any clue how to get a fair comparison?
Note
I already compared the sizes on disk (as I persist the models), and yes, TFLite shows a clear compression benefit. But is that where the benefit is to be found?
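For reference, the kind of on-disk comparison I mean is roughly this (the file names are placeholders, and a recent TF/Keras version is assumed for the native .keras format):

import os

# Persist both models and compare the resulting file sizes.
model.save('model.keras')           # placeholder path, native Keras format
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)           # placeholder path, TFLite flatbuffer

print(f"Keras file on disk:  {os.path.getsize('model.keras') / 1024:.2f} KB")
print(f"TFLite file on disk: {os.path.getsize('model.tflite') / 1024:.2f} KB")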