I am using Prometheus alerting for RabbitMQ. Below is the configuration I am using.
prometheus.yml
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 5m # Evaluate rules every 5 minutes. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:port

rule_files:
  - "alerts_rules.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["ip:port"]
alerts_rules.yml
groups:
  - name: instance_alerts
    rules:
      - alert: "Instance Down"
        expr: up == 0
        for: 30s
        # keep_firing_for: 30s
        labels:
          severity: "Critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 sec."

  - name: rabbitmq_alerts
    rules:
      - alert: "Consumer down for last 1 min"
        expr: rabbitmq_queue_consumers == 0
        for: 30s
        # keep_firing_for: 30s
        labels:
          severity: Critical
        annotations:
          summary: "shortify | '{{ $labels.queue }}' has no consumers"
          description: "The queue '{{ $labels.queue }}' in vhost '{{ $labels.vhost }}' has zero consumers for more than 30 sec. Immediate attention is required."

      - alert: "Total Messages > 10k in last 1 min"
        expr: rabbitmq_queue_messages > 10000
        for: 30s
        # keep_firing_for: 30s
        labels:
          severity: Critical
        annotations:
          summary: "'{{ $labels.queue }}' has total '{{ $value }}' messages for more than 1 min."
          description: |
            Queue {{ $labels.queue }} in RabbitMQ has total {{ $value }} messages for more than 1 min.
Even when there is no data in the queue, it still sends me alerts, even though I have set evaluation_interval: 5m (Prometheus evaluates the alert rules every 5 minutes) and for: 30s (which should make the alert fire only if the condition persists for 30 seconds). I guess for: 30s is not working for me.
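If the 5-minute evaluation interval really is the problem, I am guessing the change would look roughly like the sketch below, where rules are evaluated as often as targets are scraped (this is just my assumption, not something I have confirmed):

global:
  scrape_interval: 15s
  # Assumed change: evaluate rules every 15s instead of every 5m, so the
  # 30s "for" clause is checked against fresh samples rather than only once
  # every 5 minutes.
  evaluation_interval: 15s

But I am not sure whether that is the real cause or whether something else is going on.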
By the way, I am not using Alertmanager; I am just using Prometheus.
How can I solve this? Thank you in advance.