
docker - Custom Container on Vertex AI Returns "405 Method Not Allowed" for predict Endpoint - Stack Overflow


I'm encountering an issue when deploying my custom container on Vertex AI. Locally my Flask server (running via Gunicorn) works perfectly—both the /predict and /health endpoints respond as expected. However, when Vertex AI calls the prediction API, I always receive a 405 Method Not Allowed error.

My Setup

  • Container: I use a custom Docker container that exposes port 8080.
  • Model Upload: I upload my model to Vertex AI with the following flags:
    • --container-predict-route=/predict
    • --container-health-route=/health
  • Prediction Call: I call the prediction API using the Google Cloud AI Platform client library.

Observations

  • Vertex AI PredictionService sends requests to a URL like:
    /v1/endpoints/<ENDPOINT_ID>/deployedModels/<DEPLOYED_MODEL_ID>:predict
    but my server returns 405.
  • If I perform a GET request to the endpoint (for example, from a terminal), I receive a valid response.
    However, when calling /predict (or even /rawPredict, as described in the Vertex AI rawPredict docs), I still get a 405.
  • The server appears to be running, since I receive a log entry every 10 seconds like:
    "GET /v1/endpoints/<ENDPOINT>/deployedModels/<MODEL> HTTP/1.1" 200 OK
  • I've added multiple route definitions (including catch-all routes) to handle URLs such as /v1/endpoints/<endpoint_id>/deployedModels/<model_id>:predict, but the error persists (a local reproduction of the requests involved is sketched below).
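
For reference, here is a minimal sketch of how the relevant requests could be reproduced against the container running locally (port 8080 comes from the Dockerfile below; the endpoint/model IDs and the GCS URI are placeholders, not real values):

# Sketch only: reproduce the request shapes locally against the running container.
import requests

BASE = "http://localhost:8080"
payload = {"instances": [{"content": "gs://my-bucket/sample.jpg"}]}  # placeholder GCS URI

# The route configured via --container-predict-route:
r1 = requests.post(f"{BASE}/predict", json=payload)
print(r1.status_code, r1.text)

# The Vertex-style path (colon suffix included), with placeholder IDs:
vertex_path = "/v1/endpoints/1234567890/deployedModels/9876543210:predict"
r2 = requests.post(f"{BASE}{vertex_path}", json=payload)
print(r2.status_code, r2.text)

# The same path with GET, matching the recurring 200 log entries:
r3 = requests.get(f"{BASE}/v1/endpoints/1234567890/deployedModels/9876543210")
print(r3.status_code, r3.text)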

Below is my code:


Dockerfile:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    curl \
    python3-dev \
    python3-pip \
    python3-setuptools && \
    rm -rf /var/lib/apt/lists/*
RUN ln -sf /usr/bin/python3 /usr/bin/python
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir "torch>=1.12.0" "torchvision>=0.13.0" && \
    if [ -f requirements.txt ]; then pip install --no-cache-dir -r requirements.txt; fi
COPY . .
EXPOSE 8080
CMD ["gunicorn", "-w", "1", "-b", "0.0.0.0:8080", "main:app"]

Function to call Vertex AI API (call_vertex_ai):

def call_vertex_ai(gcs_uri: str, additional_args: dict):
    client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
    client = aiplatform.gapic.PredictionServiceClient(
        client_options=client_options)

    instance = predict.instance.ImageClassificationPredictionInstance(
        content=gcs_uri  # GCS path for image file
    ).to_value()
    instances = [instance]

    parameters = predict.params.ImageClassificationPredictionParams(
        confidence_threshold=additional_args.get("threshold", 0.5),
    ).to_value()

    endpoint = client.endpoint_path(
        project=PROJECT_ID, location=REGION, endpoint=ENDPOINT_ID
    )

    response = client.predict(
        endpoint=endpoint, instances=instances, parameters=parameters)

    return response.predictions

main.py (Vertex AI prediction server):

... some imports ... 
app = Flask(__name__)

def load_model():
    ...

load_model()


def handle_predict():
    # ... preprocessing and inference code ...
    detections = [{
        "bbox": bbox.tolist() if isinstance(bbox, np.ndarray) else bbox,
        "class": class_name,
        "score": float(score),
    } for bbox, class_name, score in zip(draw_boxes, pred_classes, scores)]

    return jsonify({"predictions": detections})

@app.post("/predict")
def predict():
    return handle_predict()

@app.route("/health", methods=["GET"])
def health():
    return jsonify({"status": "healthy"})


@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<path:deployed_model_path>", methods=["POST"])
def predict_deployed_model(endpoint_id, deployed_model_path):
    if not deployed_model_path.endswith(":predict"):
        return "Not Found", 404
    return handle_predict()

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<deployed_model_id>:predict", methods=["POST"])
def predict_deployed_model_direct(endpoint_id, deployed_model_id):
    return handle_predict()

@app.route("/v1/endpoints/<endpoint_id>/deployedModels/<deployed_model_id>:rawPredict", methods=["POST"])
def raw_predict_deployed_model(endpoint_id, deployed_model_id):
    return handle_predict()

@app.before_request
def log_request_info():
    logger.info(f"Received request: {request.method} {request.url}")
    logger.info(f"Headers: {dict(request.headers)}")
    logger.info(f"Body: {request.get_data().decode('utf-8')}")

deploy.sh (simplified):

gcloud builds submit \
  --tag "${IMAGE_NAME}:latest" \
  --gcs-source-staging-dir="gs://$BUCKET_NAME/source" \
  --gcs-log-dir="gs://$BUCKET_NAME/logs"

LATEST_IMAGE="${IMAGE_NAME}:latest"

gcloud ai models upload \
  --region="${REGION}" \
  --display-name="weldpredict-model" \
  --container-image-uri="${LATEST_IMAGE}" \
  --container-ports=8080 \
  --container-predict-route=/predict \
  --container-health-route=/health

ENDPOINT_ID=$(gcloud ai endpoints list --region="${REGION}" --format="value(ENDPOINT_ID)")

DEPLOYED_MODEL_ID=$(gcloud ai endpoints describe "${ENDPOINT_ID}" --region="${REGION}" --format="value(deployedModels.id)")
gcloud ai endpoints undeploy-model "${ENDPOINT_ID}" --deployed-model-id="${DEPLOYED_MODEL_ID}" --region="${REGION}" --quiet

gcloud ai endpoints deploy-model "${ENDPOINT_ID}" \
    --model="${MODEL_ID}" \
    --region="${REGION}" \
    --display-name="weldpredict-deployment" \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --min-replica-count=1 \
    --max-replica-count=1 \
    --traffic-split=0=100

Issue Summary:

  • Problem: When calling Vertex AI predictions, I receive a 405 "Method Not Allowed" error.

  • Observation: Locally, my Flask server correctly handles /predict and /health, and GET requests to the endpoint return a 200 OK. However, when I call /predict (or /rawPredict) on Vertex AI, I get a 405.

  • Setup: I deploy my custom container on Vertex AI with the --container-predict-route=/predict flag, yet Vertex AI sends requests (e.g., /v1/endpoints/<ENDPOINT_ID>/deployedModels/<DEPLOYED_MODEL_ID>:predict) that are not matched by my routes.

  • Attempts: I have added multiple route definitions—including catch-all routes—to handle URLs like /v1/endpoints/<endpoint_id>/deployedModels/<model_id>:predict but still encounter the 405 error.

  • Additional Info:

    • A log entry appears every 10 seconds:

      GET /v1/endpoints/<ENDPOINT>/deployedModels/<MODEL> HTTP/1.1" 200 OK
      
    • However, when calling /predict or /rawPredict the server returns 405.

Request:
I need help understanding why Vertex AI's prediction requests are not being handled as expected by my Flask server and how to properly configure my container or routes to resolve the 405 error.

Any guidance or suggestions would be greatly appreciated.

Thanks in advance!


asked Mar 4 at 1:50 by Giulio Manuzzi

1 Answer

Based on the public documentation for custom containers: if your use case requires libraries that aren't included in the prebuilt containers, or you need custom data transformations as part of the prediction request, you can use a custom container that you build and push to Artifact Registry. While custom containers allow for greater customization, the container must run an HTTP server; specifically, it must listen for and respond to liveness checks, health checks, and prediction requests. In most cases, using a prebuilt container, if possible, is the recommended and simpler option. For an example of using a custom container, see the notebook "PyTorch Image Classification Single GPU using Vertex Training with Custom Container".
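
For illustration, a minimal sketch of a server that follows this contract could read its routes and port from the environment variables Vertex AI sets inside custom prediction containers (AIP_PREDICT_ROUTE, AIP_HEALTH_ROUTE, AIP_HTTP_PORT). This is a sketch based on that serving contract, not a verified drop-in fix for the question above:

# Sketch: register the predict/health routes from the Vertex AI serving
# contract env vars instead of hard-coding paths. Fallback values are the
# routes configured explicitly in the question.
import os
from flask import Flask, jsonify, request

app = Flask(__name__)

PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")

@app.route(PREDICT_ROUTE, methods=["POST"])
def predict_route():
    body = request.get_json(force=True, silent=True) or {}
    instances = body.get("instances", [])
    # ... run inference on `instances` here ...
    return jsonify({"predictions": []})

@app.route(HEALTH_ROUTE, methods=["GET"])
def health_route():
    return jsonify({"status": "healthy"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))

Registering the route from AIP_PREDICT_ROUTE avoids hard-coding either /predict or the longer /v1/endpoints/.../deployedModels/...:predict form, so the handler matches whichever path the platform actually forwards to the container.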
