
kubernetes - OOM kills pod when setting the resource limits - Stack Overflow


Below is the StatefulSet that I use. If I run it in minikube (started with 2000m CPU and 4Gi of memory) without resources.limits, it runs fine. But if I specify resources.limits equal to the total resources that minikube can provide, the pod either does not work, or I get an error like: Unable to connect to the server: net/http: TLS handshake timeout. Why does this happen if, logically, the pod should be bounded by the same resources even without specifying resources.limits?

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 1
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: sevabek/cassandra:latest
          ports:
            - containerPort: 9042
          volumeMounts:
            - mountPath: /var/lib/cassandra
              name: cassandra-storage

          livenessProbe:
            exec:
              command:
                - cqlsh
                - -e
                - "SELECT release_version FROM system.local;"
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 2

          resources:
            requests:
              memory: "3500Mi"
              cpu: "1700m"
            limits:
              memory: "4Gi"
              cpu: "2000m"

  volumeClaimTemplates:
    - metadata:
        name: cassandra-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 3Gi


asked Mar 3 at 14:04 by Steffano Aravico
  • Run the pod without the limit, then run "kubectl top pod" to see what the resource consumption is. – Patrick W Commented Mar 3 at 17:37
  • @PatrickW I did as you said and found that this pod consumes only 3650Mi of memory and less than 500m of CPU. Then why is there such a problem with the limit? – Steffano Aravico Commented Mar 4 at 17:40
  • The pod is running higher than the requested amount, which makes it a top candidate for an OOM kill if the node is running low on memory. Also, if your pod's memory usage spikes during startup and hits the 4Gi limit, the pod will be killed. – Patrick W Commented Mar 28 at 13:49
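Following Patrick W's suggestion in the comments, a minimal sketch of how to check the actual consumption and the kill reason is shown below. The pod name cassandra-0 is an assumption based on the StatefulSet above with one replica, and kubectl top requires the metrics-server addon to be enabled in minikube:

$ kubectl top pod cassandra-0       # current CPU and memory usage of the pod
$ kubectl describe pod cassandra-0  # an OOM-killed container shows "Reason: OOMKilled" under Last State

If describe reports OOMKilled, the container really did cross the 4Gi memory limit at some point, even if its steady-state usage sits around 3650Mi.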

1 Answer


I suspect the container is using more memory than you anticipated because you've configured the liveness probe to run cqlsh:

          livenessProbe:
            exec:
              command:
                - cqlsh
                - -e
                - "SELECT release_version FROM system.local;"
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 2

cqlsh is a full-fledged Python application, so it consumes a significant amount of resources each time it runs. Using it just to check that Cassandra is "alive" every 30 seconds is a little excessive.

Cassandra is considered operational if it is listening for client connections on the CQL port (9042 by default). If something goes wrong for whatever reason (a disk failure, for example), Cassandra will automatically stop accepting connections and shut down the CQL port.

Instead of running a CQL SELECT statement through cqlsh, I would suggest a low-level TCP check with a Linux utility like netstat:

$ netstat -ltn | grep 9042

If you use a lightweight liveness probe, the Cassandra containers should use significantly fewer resources. Cheers!
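As one possible shape for such a probe, here is a minimal sketch that swaps the exec'd cqlsh check for Kubernetes' built-in tcpSocket probe against the CQL port, keeping the original timings. Note that this only verifies the port accepts TCP connections, not that queries succeed:

          livenessProbe:
            tcpSocket:
              port: 9042
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 2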
