python - How to have a (FastAPI) GKE deployment handle multiple requests?

I have a FastAPI deployment in GKE with an endpoint, /execute, that reads and parses a file, something like this:

import re

from fastapi import FastAPI

app = FastAPI()

@app.post("/execute")
def execute(filepath: str):
    # Count the lines in the file that contain "Hello"
    res = 0
    with open(filepath, "r") as fo:
        for line in fo:  # stream the file line by line
            if re.search("Hello", line):
                res += 1
    return {"message": f"Number of Hello lines = {res}."}
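One detail that matters here: FastAPI runs path operations declared with plain `def` (as `execute` is) in an external threadpool, so even a single server process can work on several requests at the same time. The scanning logic itself is ordinary Python. Below is a minimal sketch of it pulled into a standalone function so it can be checked without a server or cluster; `count_hello_lines` is a hypothetical name, not part of the original app.

```python
import os
import re
import tempfile


def count_hello_lines(filepath: str) -> int:
    """Count the lines in `filepath` that contain the substring "Hello".

    Same logic as the /execute endpoint, without the web framework.
    """
    res = 0
    with open(filepath, "r") as fo:
        # Iterating the file object reads one line at a time instead of
        # loading every line into memory with readlines()
        for line in fo:
            if re.search("Hello", line):
                res += 1
    return res


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("Hello world\nbye\nSay Hello\n")
    print(count_hello_lines(tmp.name))  # → 2
    os.remove(tmp.name)
```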

The GKE deployment has 10 pods with a load balancer and service exposing the deployment.

Now I would like to send 100 different file paths to this deployment. As I see it, I have the following options, each with related questions:

Option 1: send all 100 requests at the same time without waiting for the responses, using threading, asyncio with aiohttp, or something hacky like this:

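import requests

for filepath in filepaths:
    try:
        # Send the actual path as a query parameter (the original URL sent the literal text "filepath")
        requests.post("http://127.0.0.1:8000/execute", params={"filepath": filepath}, timeout=0.0000000001)
    except requests.exceptions.Timeout:
        # Timeout covers both ConnectTimeout and ReadTimeout; the response is ignored
        pass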
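If the goal is only to have all 100 requests in flight at once, the timeout trick is not needed: ordinary threads can send them concurrently and still collect every response. A minimal sketch using only the standard library, assuming the same local URL as the snippet above (in GKE it would be the Service's external address); `post_filepath` and `send_all_at_once` are hypothetical helper names, and `send` is injectable so the logic can be tried without a server.

```python
import json
import threading
import urllib.parse
import urllib.request

# Assumption: same URL as the question's snippet; replace with the Service address
BASE_URL = "http://127.0.0.1:8000/execute"


def post_filepath(filepath):
    """Blocking POST to /execute, passing `filepath` as a query parameter."""
    query = urllib.parse.urlencode({"filepath": filepath})
    req = urllib.request.Request(f"{BASE_URL}?{query}", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


def send_all_at_once(filepaths, send=post_filepath):
    """Start one thread per file path so every request is in flight together."""
    results = [None] * len(filepaths)

    def worker(i, fp):
        try:
            results[i] = send(fp)
        except Exception as exc:  # record the failure and keep the other requests going
            results[i] = exc

    threads = [threading.Thread(target=worker, args=(i, fp)) for i, fp in enumerate(filepaths)]
    for t in threads:
        t.start()
    # Unlike the timeout trick, join() waits until every response has arrived
    for t in threads:
        t.join()
    return results
```

Results come back in the same order as `filepaths`, with any exception stored in place of the failed request's response.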


In this case, what does the GKE load balancer do when it receives 100 requests: does it deliver 10 requests to each pod at the same time (in which case I would need to make sure each pod has enough resources to handle all of its incoming requests simultaneously), or does it have a queuing system that delivers a request to a pod only when that pod is free?
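A toy model can help reason about this. Assume the Service forwards each new connection to a backend pod chosen uniformly at random (kube-proxy's default iptables mode behaves roughly like this; load balancer setups such as container-native load balancing can behave differently) and that nothing in front of the pods holds requests back. Then 100 simultaneous requests do not split into exactly 10 per pod. The sketch below only simulates that assumption; `simulate_random_routing` is a hypothetical helper and says nothing about any real cluster.

```python
import random
from collections import Counter


def simulate_random_routing(n_requests=100, n_pods=10, seed=0):
    """Return how many requests land on each pod when every request
    independently picks a pod uniformly at random.

    Illustrative model only, not GKE's actual routing code.
    """
    rng = random.Random(seed)
    picks = Counter(rng.randrange(n_pods) for _ in range(n_requests))
    # Counter returns 0 for pods that received nothing
    return [picks[pod] for pod in range(n_pods)]


if __name__ == "__main__":
    counts = simulate_random_routing()
    print(counts)
    print("busiest pod received", max(counts), "requests")
```

Under this model, each pod has to be sized for its busiest share, which is at least 10 and usually more.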

Option 2: send 10 requests at a time, so that no pod works on more than one request at any given time. That way, resource usage in each pod stays predictable and the pod does not crash. But how do I accomplish this in Python? And do I need to change anything in my FastAPI application or GKE deployment configuration?
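On the client side, "10 at a time" maps directly onto a thread pool with 10 workers: the pool starts the next request as soon as any of the 10 finishes, rather than waiting for a whole batch. A minimal sketch, assuming the same local URL as the question; `post_filepath`, `send_bounded` and `max_in_flight` are hypothetical names. One caveat, assuming per-connection random routing: this caps the total number of requests in flight, not the number per pod, so two of the 10 can still land on the same pod.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: same URL as the question's snippet; replace with the Service address
BASE_URL = "http://127.0.0.1:8000/execute"


def post_filepath(filepath):
    """Blocking POST to /execute, passing `filepath` as a query parameter."""
    query = urllib.parse.urlencode({"filepath": filepath})
    req = urllib.request.Request(f"{BASE_URL}?{query}", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())


def send_bounded(filepaths, send=post_filepath, max_in_flight=10):
    """Send every path, but never more than `max_in_flight` requests at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # map() returns results in input order; it re-raises the first
        # exception from `send` when that result is reached
        return list(pool.map(send, filepaths))
```

Calling `send_bounded(filepaths)` with the 100 paths keeps at most 10 requests outstanding from this client.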

Any help would be greatly appreciated!

python - How to have a (FastAPI) GKE deployment handle multiple requests? - Stack Overflow