I have the following Alertmanager configuration, which uses group_by: ['...'] to disable alert grouping, but it is not working:
global:
  pagerduty_url:
  resolve_timeout: 5m
  smtp_from: [email protected]
  smtp_require_tls: false
  smtp_smarthost: xxx.zw.corp:25
inhibit_rules:
- equal: ['alertname']
  source_matchers:
  - severity = critical
  target_matchers:
  - severity = warning
- equal: ['namespace']
  source_matchers:
  - severity = warning
  target_matchers:
  - severity = info
- source_matchers:
  - alertname = InfoInhibitor
  target_matchers:
  - severity = info
- equal: ["host"]
  source_matchers:
  - alertname = HostDown
  target_matchers:
receivers:
- name: 'null'
- name: pd_secops #
  pagerduty_configs:
  - details:
      runbook_url: "{{ .CommonAnnotations.runbook_url }}"
    routing_key: "cd1xxxxx"
    severity: "{{ .CommonLabels.severity }}"
    send_resolved: false
- name: 'email'
  email_configs:
  - to: '{{ .CommonLabels.email }}'
    from: '[email protected]'
    smarthost: 'xxx.zw.corp:25'
    require_tls: false
route:
  group_by: ['alertname']
  group_interval: 2m
  group_wait: 1m
  receiver: 'null'
  repeat_interval: 8736h
  routes:
  - matchers:
    - alertname = Watchdog
    receiver: 'null'
  - matchers:
    - alertname = InfoInhibitor
    receiver: 'null'
  - matchers:
    - type = pagerduty
    - service = secops
    - severity =~ warning|critical
    receiver: pd_secops
    group_by: ['...']
    group_wait: 10s
    group_interval: 10s
I am able to trigger several alerts on the associated alert rule within a 5m interval, but when I check the Alertmanager UI I see only 1 alert. I have also checked the PromQL of the alert rule, and it returns as many instances as I have triggered.
My expectation is that when I trigger, say, 5 alerts within a 5m interval, I should see up to 5 separate alerts in the Alertmanager UI, but I am only getting a single alert.
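To make that expectation concrete, here is a minimal Python sketch of how I understand grouping by all labels. It is only a toy model, not Alertmanager's actual implementation: the function group_alerts is made up for illustration, and the callerIpAddress label on the alerts is a hypothetical distinguishing label (my rule does not necessarily emit it).

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Toy model of alert grouping: returns {group_key: [label sets]}.

    group_by == ["..."] is treated as "group by every label", so each
    distinct label set ends up in its own group.
    """
    groups = defaultdict(list)
    for labels in alerts:
        if group_by == ["..."]:
            # Key on the full label set.
            key = tuple(sorted(labels.items()))
        else:
            # Key only on the listed label names.
            key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Labels taken from the alert rule in this question.
base = {
    "alertname": "AzureTerraformSpnIsUsedOutsideNatGw",
    "severity": "critical",
    "type": "pagerduty",
    "service": "secops",
}

# Hypothetical case: five alerts that differ in one label.
distinct = [dict(base, callerIpAddress=f"10.0.0.{i}") for i in range(5)]
# Five firings whose label sets are exactly the same.
identical = [dict(base) for _ in range(5)]

print(len(group_alerts(distinct, ["..."])))        # 5
print(len(group_alerts(identical, ["..."])))       # 1
print(len(group_alerts(distinct, ["alertname"])))  # 1
```

In this model, group_by: ['...'] only separates alerts whose label sets actually differ; firings with identical labels still collapse into one entry.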
The associated Alert Rule looks like below:
rule {
  alert = "AzureTerraformSpnIsUsedOutsideNatGw"
  expr = <<EOT
sum(
  count_over_time(
    {stream="azure-activity-logs"}
      | json
      | identity_claims_appid = `${var.azure_terraform_xxx}`
      | callerIpAddress !~ `${join("|", [for h in var.sre_nat_gw : cidrhost(h, 0)])}`
    [10m])
) > 0
EOT
  for = "1m"
  labels = {
    severity = "critical"
    source = "azure"
    type = "pagerduty"
    service = "secops"
  }
  annotations = {
    managed_by = "mycorp/sre"
    summary = "sample summmary"
    description = "sample description"
    runbook_url = "+Runbooks#Audit-alerts"
  }
}
}
What am I missing?