How can I ungroup alerts in Mimir AlertManager - Stack Overflow

I have the following configuration (i.e. group_by: ['...']) to ungroup alerts but it is not working:

    global:
      pagerduty_url: 
      resolve_timeout: 5m
      smtp_from: [email protected]
      smtp_require_tls: false
      smtp_smarthost: xxx.zw.corp:25
    inhibit_rules:
      - equal: ['alertname']
        source_matchers:
          - severity = critical
        target_matchers:
          - severity = warning
      - equal: ['namespace']
        source_matchers:
          - severity = warning
        target_matchers:
          - severity = info
      - source_matchers:
          - alertname = InfoInhibitor
        target_matchers:
          - severity = info
      - equal: ["host"]
        source_matchers:
          - alertname = HostDown
        target_matchers:
    receivers:
      - name: 'null'
      - name: pd_secops # 
        pagerduty_configs:
          - details:
            runbook_url: "{{ .CommonAnnotations.runbook_url }}"
            routing_key: "cd1xxxxx"
            severity: "{{ .CommonLabels.severity }}"
            send_resolved: false
      - name: 'email'
        email_configs:
          - to: '{{ .CommonLabels.email }}'
            from: '[email protected]'
            smarthost: 'xxx.zw.corp:25'
            require_tls: false
    route:
      group_by: ['alertname']
      group_interval: 2m
      group_wait: 1m
      receiver: 'null'
      repeat_interval: 8736h
      routes:
        - matchers:
            - alertname = Watchdog
          receiver: 'null'
        - matchers:
            - alertname = InfoInhibitor
          receiver: 'null'
        - matchers:
            - type = pagerduty
            - service = secops
            - severity =~ warning|critical
          receiver: pd_secops
          group_by: ['...']
          group_wait: 10s
          group_interval: 10s
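
For reference, the intended effect of `group_by: ['...']` can be sketched in a few lines of Python. This is only an illustrative model, not Alertmanager's actual code: the function names `group_key` and `group_alerts` and the sample `HostDown` alerts are assumptions made up for the example. It shows that the special value `...` groups by every label, so each distinct label set ends up in its own group.

```python
from collections import defaultdict


def group_key(labels, group_by):
    """Return a hashable grouping key for one alert's label set (hypothetical helper)."""
    if group_by == ["..."]:
        # Special value: group by all labels, so every distinct label set is its own group.
        return tuple(sorted(labels.items()))
    # Otherwise only the listed labels contribute to the key.
    return tuple((name, labels.get(name, "")) for name in group_by)


def group_alerts(alerts, group_by):
    """Bucket alerts (dicts of labels) by their grouping key."""
    groups = defaultdict(list)
    for labels in alerts:
        groups[group_key(labels, group_by)].append(labels)
    return dict(groups)


# Made-up sample alerts that differ only in the "host" label.
alerts = [
    {"alertname": "HostDown", "host": "a", "severity": "critical"},
    {"alertname": "HostDown", "host": "b", "severity": "critical"},
]

by_name = group_alerts(alerts, ["alertname"])
by_all = group_alerts(alerts, ["..."])
print(len(by_name), len(by_all))
```

In this model, grouping by `alertname` puts both alerts in one group, while `...` keeps them apart because their label sets differ.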

I am able to trigger several alerts on the associated alert rule within a 5m interval, but when I check the AlertManager UI I see only 1 alert. I have also run the alert rule's query directly, and it returns as many instances as I have triggered.

My expectation is that when I trigger, say, 5 alerts within a 5m interval, I should see up to 5 separate alerts in the AlertManager UI, but I am only getting a single alert.
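
One thing that may be worth checking is how many distinct label sets the rule actually produces. Alertmanager identifies an alert by its full label set, so repeated firings with identical labels count as the same alert regardless of `group_by`. The sketch below is a simplified model of that behaviour, not Alertmanager's implementation: `fingerprint` and `active_alerts` are hypothetical helpers, and the `callerIpAddress` values are invented sample data.

```python
def fingerprint(labels):
    """Identify an alert by its complete, order-independent label set (hypothetical helper)."""
    return tuple(sorted(labels.items()))


def active_alerts(firings):
    """Keep one alert per distinct label set; a repeat firing overwrites the earlier one."""
    store = {}
    for labels in firings:
        store[fingerprint(labels)] = labels
    return list(store.values())


# Static labels taken from the alert rule in the question.
rule_labels = {
    "alertname": "AzureTerraformSpnIsUsedOutsideNatGw",
    "severity": "critical",
    "source": "azure",
    "type": "pagerduty",
    "service": "secops",
}

# Five firings that carry only the static rule labels.
same = [dict(rule_labels) for _ in range(5)]

# Five firings that also carry a distinguishing label (invented sample IPs).
distinct = [dict(rule_labels, callerIpAddress=f"10.0.0.{i}") for i in range(5)]

print(len(active_alerts(same)), len(active_alerts(distinct)))
```

Under this model, five firings with identical labels collapse into one alert, while five firings that differ in at least one label stay separate. Whether that applies here depends on the labels the rule's expression returns: a `sum(...)` without a `by (...)` clause yields a single series with no stream labels.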

The associated Alert Rule looks like below:

  rule {
    alert = "AzureTerraformSpnIsUsedOutsideNatGw"
    expr  = <<EOT
sum(
  count_over_time(
    {stream="azure-activity-logs"}
    | json
    | identity_claims_appid = `${var.azure_terraform_xxx}`
    | callerIpAddress !~ `${join("|", [for h in var.sre_nat_gw : cidrhost(h, 0)])}`
  [10m])
) > 0

EOT
    for   = "1m"
    labels = {
      severity = "critical"
      source   = "azure"
      type     = "pagerduty"
      service  = "secops"
    }
    annotations = {
      managed_by  = "mycorp/sre"
      summary     = "sample summary"
      description = "sample description"
      runbook_url = "+Runbooks#Audit-alerts"
    }
  }
}

What am I missing?