I have recently been learning about LLMs and am trying to build a simple chatbot that multiple clients can connect to and chat with. I wrote the simple Python server below, but I noticed that when multiple clients are connected and one client is receiving a stream, the other clients are blocked.
I asked ChatGPT/Copilot to fix the code, tried `asyncio.create_task` and `asyncio.to_thread`, and tried using FastAPI WebSockets, but none of them worked. It seems that while `yield` is streaming the response, everything else is blocked.
Could someone please help? I must be missing something obvious. I am open to anything (e.g. using a Python framework, a different language, a complete rewrite, etc.).
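For reference, this is roughly the `asyncio.to_thread` variant I tried (reconstructed from memory, so the exact shape is illustrative, not my real code):

```python
import asyncio

async def get_data_from_model_threaded():
    # Open the stream on a worker thread instead of the event loop.
    stream = await asyncio.to_thread(
        lambda: client.chat.completions.create(
            model="grok-3-beta",
            messages=[
                {"role": "user", "content": "Write a 5-sentence bedtime story about a unicorn."}
            ],
            stream=True,
        )
    )
    it = iter(stream)
    sentinel = object()
    while True:
        # Pull each chunk on a worker thread as well.
        chunk = await asyncio.to_thread(next, it, sentinel)
        if chunk is sentinel:
            break
        if chunk.choices[0].delta.content is not None:
            yield chunk.choices[0].delta.content
```

Here is my current, simplest version, which still shows the blocking: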
```python
import asyncio

import websockets
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI()


async def get_data_from_model():
    # Ask the model for a completion and stream it back chunk by chunk.
    stream = client.chat.completions.create(
        model="grok-3-beta",
        messages=[
            {
                "role": "user",
                "content": "Write a 5-sentence bedtime story about a unicorn.",
            }
        ],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            yield chunk.choices[0].delta.content


async def handle_client(websocket):
    try:
        while True:
            # The prompt is hardcoded above for now; user_message is unused.
            user_message = await websocket.recv()

            async def send_stream():
                async for chunk in get_data_from_model():
                    await websocket.send(chunk)

            # Run send_stream in the background for concurrent streaming
            asyncio.create_task(send_stream())
    except Exception as e:
        print(f"Connection error: {e}")


async def start(port=8765):
    async with websockets.serve(handle_client, "0.0.0.0", port):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(start(port=8000))
```
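In case it helps reproduce the issue, this is the kind of test client I run in two terminals (the URL matches the server above; otherwise it is illustrative):

```python
import asyncio

import websockets


async def main():
    # Connect to the server started above (it listens on port 8000).
    async with websockets.connect("ws://localhost:8000") as ws:
        await ws.send("hello")
        # Print streamed chunks as they arrive. With two of these running,
        # one client stalls until the other's stream finishes.
        while True:
            print(await ws.recv(), end="", flush=True)


asyncio.run(main())
```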