I am working in LangFlow and have this basic design:
- Chat Input connected to Agent (Input).
- Ollama (Llama3, Tool Model Enabled) connected to Agent (Language Model).
- Agent (Response) connected to Chat Output.
When I test it in the Playground and ask a basic question, it takes almost two minutes to respond.
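To narrow down whether the delay comes from Ollama itself or from the LangFlow agent layer, I can time a direct call to the same model outside LangFlow. A minimal sketch using the `ollama` Python client (assuming the server is on its default `localhost:11434`; the prompt here is just a placeholder):

```python
import time

import ollama  # pip install ollama

# Time a single chat completion against the same model LangFlow uses.
start = time.perf_counter()
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
elapsed = time.perf_counter() - start

print(f"Response: {response['message']['content']}")
print(f"Elapsed: {elapsed:.1f}s")
```

If this direct call is fast, the model is running on the GPU and the slowdown is somewhere in the LangFlow/agent layer; if it is equally slow, the Ollama server is likely falling back to CPU.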
I have gotten Ollama (model Llama3) to work with my system's GPU (an NVIDIA 4060) in VS Code, but I haven't figured out how to apply the CUDA settings in LangFlow.
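From what I understand, GPU offload is decided by the Ollama server, not by LangFlow: the Ollama component in LangFlow just sends requests to the server's base URL, so the CUDA setup that worked in VS Code should apply as long as LangFlow points at that same server. One thing worth checking is the `num_gpu` option (the number of model layers offloaded to the GPU). A hedged sketch of forcing full offload on a direct request, to see if it changes the latency (99 is an arbitrary "offload everything" value, not a required setting):

```python
import ollama  # pip install ollama

# Ask the Ollama server to place as many model layers as possible on the GPU.
# num_gpu is the layer count to offload; a large value offloads the whole model.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Say hi."}],
    options={"num_gpu": 99},
)
print(response["message"]["content"])
```

While this runs, `nvidia-smi` (or `ollama ps`) should show the model loaded on the 4060; if it reports no GPU usage, the Ollama server itself was started without CUDA support, and no setting inside LangFlow will fix that.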