
kubernetes - OOM kills pod when setting the resource limits - Stack Overflow


Below is the StatefulSet that I use. If I run it in minikube (started with 2000m CPU and 4Gi of memory) without resources.limits, it runs fine. But if I specify resources.limits equal to the total resources that minikube can provide, the pod either does not work, or I get an error like: Unable to connect to the server: net/http: TLS handshake timeout. Why does this happen if, logically, the pod should be bounded by the same resources even without specifying resources.limits?

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 1
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: sevabek/cassandra:latest
          ports:
            - containerPort: 9042
          volumeMounts:
            - mountPath: /var/lib/cassandra
              name: cassandra-storage

          livenessProbe:
            exec:
              command:
                - cqlsh
                - -e
                - "SELECT release_version FROM system.local;"
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 2

          resources:
            requests:
              memory: "3500Mi"
              cpu: "1700m"
            limits:
              memory: "4Gi"
              cpu: "2000m"

  volumeClaimTemplates:
    - metadata:
        name: cassandra-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 3Gi


asked Mar 3 at 14:04 by Steffano Aravico
  • Run the pod without the limit, then run "kubectl top pod" to see what the resource consumption is. – Patrick W Commented Mar 3 at 17:37
  • @PatrickW I did as you said and found that this pod consumes only 3650Mi of memory and less than 500m of CPU. Then why is there such a problem with the limit? – Steffano Aravico Commented Mar 4 at 17:40
  • The pod is running higher than the requested amount, which makes it a top candidate for an OOM kill if the node is running low on memory. Also, if your pod's memory usage spikes during startup and hits the 4Gi limit, the pod will be killed. – Patrick W Commented Mar 28 at 13:49
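Following Patrick W's suggestion in the comments, a minimal sketch of how to check the actual consumption and the kill reason is shown below. The pod name cassandra-0 is an assumption based on the StatefulSet above with one replica, and kubectl top requires the metrics-server addon to be enabled in minikube:

$ kubectl top pod cassandra-0       # current CPU and memory usage of the pod
$ kubectl describe pod cassandra-0  # an OOM-killed container shows "Reason: OOMKilled" under Last State

If describe reports OOMKilled, the container really did cross the 4Gi memory limit at some point, even if its steady-state usage sits around 3650Mi.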

1 Answer


I suspect the container is using more memory than you anticipated because you've configured the liveness probe to run cqlsh:

          livenessProbe:
            exec:
              command:
                - cqlsh
                - -e
                - "SELECT release_version FROM system.local;"
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 2

cqlsh is a full-fledged Python application, so it consumes a significant amount of resources each time it runs. Using it just to check that Cassandra is "alive" every 30 seconds is a little excessive.

Cassandra is considered operational if it is listening for client connections on the CQL port (9042 by default). If something goes wrong for whatever reason (a disk failure, for example), Cassandra will automatically stop accepting connections and shut down the CQL port.

Instead of running a CQL SELECT statement through cqlsh, I would suggest a low-level TCP check with a Linux utility like netstat:

$ netstat -ltn | grep 9042

If you use a lightweight liveness probe, the Cassandra containers should use significantly fewer resources. Cheers!
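As one possible shape for such a probe, here is a minimal sketch that swaps the exec'd cqlsh check for Kubernetes' built-in tcpSocket probe against the CQL port, keeping the original timings. Note that this only verifies the port accepts TCP connections, not that queries succeed:

          livenessProbe:
            tcpSocket:
              port: 9042
            initialDelaySeconds: 120
            periodSeconds: 30
            timeoutSeconds: 10
            failureThreshold: 2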
