I have a Qwen2.5-0.5 tflite model and I would like to test it in Python (not just the encode/decode aspect but the model generation abilities) and C or C++ before deploying on edge and then deploy it in C or C++
I cannot seem to find any documentation on the matter.
The goal is to associate the tflite model with the tokenizer.json accessible on the hugging face page of the model. I think I have to map text input into vectors to encode and give my input to the model and then vectors to text to decode.
My issue is that I don’t know how or if it is even how I’m supposed to proceed. I’ve read what seems like the entire tensorflow/tflite docs and can’t find any answer.
If someone could give me documentation and example code that would be amazing.
Also my second issue is that I am creating a helpful assistant on edge, so I will have to use RAG with my tflite (a context file), where the model will retrieve informations based on the user’s query and generate an answer with it. For that I think I will have to use embeddings ? And I am honestly completely lost.
Thank you for your help