What is the correct way to create AWS CloudWatch alarms with multiple metrics and multiple math expressions using Terraform?

I am trying to create a CloudWatch alarm on Throughput utilization (%) for an AWS EFS file system using Terraform. When I set this up in the console, I found that Throughput utilization (%) is built from two separate metrics and is a math expression over two other expressions. The attached screenshot (metrics details) shows what I am trying to achieve.

Here is my code:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  #threshold                 = "80"
  threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []

  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
    id = "m1"
    metric_query {
        id          = "e1"
        expression  = "(m1/1048576)/PERIOD(m1)"
        label       = "Expression1"
        return_data = "true"
    }
    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    metric_query {
        id          = "e2"
        expression  = "m2/1048576"
        label       = "Expression2"
        return_data = "true"
    }
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}

It fails with an "Unsupported block type" error. I understand from this that a metric_query block can't be nested inside another metric_query block.

But I need two expressions first, both of which should return data, and then a third expression that does math on the results of the first two. The alarm should use the third.

The Terraform docs note that each metric_query must specify either metric or expression, not both. Even so, I can't figure out how or where to declare e1 and e2.

Can someone please give me some guidance?

Thanks in advance.

-----------EDIT 1-----------

I modified my code and can now create the alarm, but the results are still not as expected. The alarm stays in the Insufficient Data state and the metric data does not show up as expected. See the attached screenshot (metric data post alarm creation).

The modified code is as follows:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []

  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
        id          = "e1"
        expression  = "(m1/1048576)/PERIOD(m1)"
        label       = "Expression1"
        #return_data = "true"
  }

  metric_query {
        id          = "e2"
        expression  = "m2/1048576"
        label       = "Expression2"
        #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}

Asked Feb 5 at 12:05 by striker; edited Feb 6 at 18:07.
  • I did some more research and modified the code to make it work, but the results are not as expected. – striker, Feb 5 at 18:03
  • What does the downvote signify? – striker, Feb 5 at 18:03

1 Answer

The order in which CloudWatch displays your metrics doesn't change the result. What matters is the dependency order: t1 depends on e1 and e2, e1 depends on m1, and e2 depends on m2, and that stays the same however the blocks are listed.

If the alarm is stuck in Insufficient Data, you can configure it to treat missing data as good by setting the treat_missing_data property, like this:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data        = "notBreaching"

  # metric_query blocks, actions and tags are unchanged from the question
}

Also, a period of 60 seconds is very short. Try increasing it to 5 minutes (300 seconds for the period of m1 and m2) and see whether that works; you can decrease it again afterwards. A very short period makes sense if you need to react quickly, but in practice a longer period (at least 5 minutes) usually gives fewer false-positive alarms and more data to rely on.
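
The longer period suggested above could be sketched as follows. This is only a sketch under stated assumptions, not a tested configuration: `local.efs_alarm_period_seconds` is a hypothetical name that does not appear in the question, and `var.efs_name`, `var.sns_arn`, `aws_efs_file_system.efs` and `local.common_tags` are assumed to be defined elsewhere, as in the question. Two deliberate deviations from the question's code are marked in the comments: m2 uses Average rather than Sum, and `unit` is left out.

```hcl
# Hypothetical local (not in the original post): one place to change the period.
locals {
  efs_alarm_period_seconds = 300 # 5 minutes
}

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = 80
  alarm_description   = "Throughput Utilization has exceeded 80%"
  treat_missing_data  = "notBreaching"

  # Only the final expression returns data to the alarm.
  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = true
  }

  metric_query {
    id         = "e1"
    expression = "(m1/1048576)/PERIOD(m1)" # PERIOD(m1) is the period in seconds
    label      = "Expression1"
  }

  metric_query {
    id         = "e2"
    expression = "m2/1048576"
    label      = "Expression2"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = local.efs_alarm_period_seconds
      stat        = "Sum"
      # Assumption: unit is omitted so that data is returned whatever unit it was published with.
      dimensions = { FileSystemId = aws_efs_file_system.efs.id }
    }
  }

  metric_query {
    id = "m2"

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = local.efs_alarm_period_seconds
      # Assumption: Average, because this metric is already a rate and a Sum
      # over a longer period would add several samples together.
      stat       = "Average"
      dimensions = { FileSystemId = aws_efs_file_system.efs.id }
    }
  }

  alarm_actions = [var.sns_arn]
  ok_actions    = [var.sns_arn]
  tags          = local.common_tags
}
```

Keeping the period in a single local makes it easy to drop back to 60 seconds later without touching both metric blocks.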

Documentation:

  • Configuring how CloudWatch alarms treat missing data: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data

Update:

You may also want to add fill expressions to ensure that your data is always defined; they are especially useful with math expressions. In your case, since e2 is a denominator, make sure its value is 1 when the data is not defined. The numerator can be 0.

Here's the template with the FILL expression:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "t1"
    expression  = "(FILL(e1, 0) * 100)/FILL(e2, 1)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
        id          = "e1"
        expression  = "(FILL(m1, 0)/1048576)/PERIOD(m1)"
        label       = "Expression1"
        #return_data = "true"
  }

  metric_query {
        id          = "e2"
        expression  = "FILL(m2, 0)/1048576"
        label       = "Expression2"
        #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}

Update 2:

Actually, I'm not quite sure what you're trying to do when you divide by PERIOD, but if you're trying to get the average, you should use the Average statistic instead. Taking a Sum and dividing by PERIOD could lead to unexpected results. That gives the following version:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "t1"
    expression  = "(FILL(e1, 0) * 100)/FILL(e2, 1)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
        id          = "e1"
        expression  = "FILL(m1, 0)/1048576"
        label       = "Expression1"
        #return_data = "true"
  }

  metric_query {
        id          = "e2"
        expression  = "FILL(m2, 0)/1048576"
        label       = "Expression2"
        #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Average"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Average"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}
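
For comparison, the arithmetic behind the Sum-and-PERIOD expressions from the question can be worked through with plain Terraform locals. This is an illustrative sketch only: the sample values are made up rather than taken from the question, and all the local and output names are hypothetical.

```hcl
# Illustrative sample values only (assumptions, not measurements from the question).
locals {
  m1_sum_bytes        = 629145600 # MeteredIOBytes, Sum over one period (600 MiB)
  m2_bytes_per_second = 104857600 # PermittedThroughput (100 MiB/s)
  period_seconds      = 60        # what PERIOD(m1) returns for a 60-second period

  # Same formulas as the e1, e2 and t1 expressions in the alarm.
  e1 = (local.m1_sum_bytes / 1048576) / local.period_seconds # 10 MiB/s
  e2 = local.m2_bytes_per_second / 1048576                   # 100 MiB/s
  t1 = (local.e1 * 100) / local.e2                           # 10 percent
}

output "throughput_utilization_percent" {
  value = local.t1
}
```

With these sample numbers, dividing the summed bytes by the period turns a per-period total into a per-second rate, which is then compared against the permitted rate.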
