I have recently been learning about LLMs and am trying to build a simple chatbot that multiple clients can connect to and chat with. I wrote the simple Python server below, but I noticed that when multiple clients are connected and one client is receiving a stream, the other clients are blocked.
I asked ChatGPT/Copilot to fix the code, tried `asyncio.create_task` and `asyncio.to_thread`, and tried using FastAPI WebSockets, but none of them worked. It seems that while `yield` is streaming the response, everything else is blocked.
Could someone please help? I must be missing something obvious. I am open to anything (e.g. using a Python framework, a different language, a complete rewrite, etc.).
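For reference, this is roughly the `asyncio.to_thread` variant I tried (reconstructed from memory, so the exact shape is illustrative, not my real code):

```python
import asyncio

async def get_data_from_model_threaded():
    # Open the stream on a worker thread instead of the event loop.
    stream = await asyncio.to_thread(
        lambda: client.chat.completions.create(
            model="grok-3-beta",
            messages=[
                {"role": "user", "content": "Write a 5-sentence bedtime story about a unicorn."}
            ],
            stream=True,
        )
    )
    it = iter(stream)
    sentinel = object()
    while True:
        # Pull each chunk on a worker thread as well.
        chunk = await asyncio.to_thread(next, it, sentinel)
        if chunk is sentinel:
            break
        if chunk.choices[0].delta.content is not None:
            yield chunk.choices[0].delta.content
```

Here is my current, simplest version, which still shows the blocking: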
```python
import asyncio

import websockets
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI()


async def get_data_from_model():
    # Ask the model for a completion and stream it back chunk by chunk.
    stream = client.chat.completions.create(
        model="grok-3-beta",
        messages=[
            {
                "role": "user",
                "content": "Write a 5-sentence bedtime story about a unicorn.",
            }
        ],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            yield chunk.choices[0].delta.content


async def handle_client(websocket):
    try:
        while True:
            # The prompt is hardcoded above for now; user_message is unused.
            user_message = await websocket.recv()

            async def send_stream():
                async for chunk in get_data_from_model():
                    await websocket.send(chunk)

            # Run send_stream in the background for concurrent streaming
            asyncio.create_task(send_stream())
    except Exception as e:
        print(f"Connection error: {e}")


async def start(port=8765):
    async with websockets.serve(handle_client, "0.0.0.0", port):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(start(port=8000))
```
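In case it helps reproduce the issue, this is the kind of test client I run in two terminals (the URL matches the server above; otherwise it is illustrative):

```python
import asyncio

import websockets


async def main():
    # Connect to the server started above (it listens on port 8000).
    async with websockets.connect("ws://localhost:8000") as ws:
        await ws.send("hello")
        # Print streamed chunks as they arrive. With two of these running,
        # one client stalls until the other's stream finishes.
        while True:
            print(await ws.recv(), end="", flush=True)


asyncio.run(main())
```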