I have a FastAPI deployment in GKE with an endpoint, /execute,
that reads and parses a file, something like below:
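    import re

    from fastapi import FastAPI

    app = FastAPI()

    # Route decorator for /execute; the GET method is an assumption.
    @app.get("/execute")
    def execute(
        filepath: str
    ):
        res = 0
        with open(filepath, "r") as fo:
            # Count the lines that contain "Hello".
            for line in fo:
                if re.search("Hello", line):
                    res += 1
        return {"message": f"Number of Hello lines = {res}."}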
The GKE deployment has 10 pods with a load balancer and service exposing the deployment.
Now, I would like to send 100 different file paths to this deployment. In my mind, I have the following options, and related questions:
- Send all 100 requests at the same time and not wait for a response, either using threading or something hacky like this:
    for filepath in filepaths:
        try:
            ...  # fire the request to /execute with a very short timeout
        except requests.exceptions.ReadTimeout:
            pass
In this case, what does the GKE load balancer do when it receives 100 requests? Does it deliver 10 requests to each pod at the same time (in which case I would need to make sure each pod has enough resources to handle all of its incoming requests at once), or does it have a queuing system that delivers a request to a pod only when the pod is available?
- Send 10 requests at a time, so that no pod is working on more than 1 request at any given time. That way, I can have predictable resource usage in a pod and not crash it. But how do I accomplish this in Python? And do I need to change anything in my FastAPI application or GKE deployment configuration?
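For the second option, the client side could look roughly like the sketch below, which uses a thread pool from the standard library. `BASE_URL`, `call_execute`, and `send_bounded` are made-up names for illustration, and the sketch assumes /execute is a GET route that takes `filepath` as a query parameter. Note that it only caps the total number of in-flight requests at 10; whether that means one request per pod still depends on how the load balancer spreads them.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical address of the service in front of the pods; replace with the real one.
BASE_URL = "http://my-service.example/execute"


def call_execute(filepath: str) -> dict:
    """Call /execute for one file path and return the parsed JSON body.

    Assumes a GET route with a `filepath` query parameter.
    """
    query = urllib.parse.urlencode({"filepath": filepath})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=60) as resp:
        return json.load(resp)


def send_bounded(filepaths, send_one=call_execute, max_in_flight=10) -> list:
    """Run send_one over all file paths with at most max_in_flight calls at once.

    Results come back in the same order as filepaths. If any call raises,
    the exception is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(send_one, filepaths))
```

Usage would be something like `results = send_bounded(filepaths)`. Setting `max_in_flight=100` corresponds to the first option, except that it waits for the responses.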
Any help would be greatly appreciated!