What would be the correct way to create AWS CloudWatch alarms with multiple metrics and multiple math expressions using Terraform?

I am trying to create a CloudWatch alarm for Throughput utilization (%) on an AWS EFS file system using Terraform. When I did the same in the console, I found that Throughput utilization (%) combines two separate metrics: it is a math expression built from two other expressions. The attached picture ("metrics details") shows what I am trying to achieve.

Here is my code:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  #threshold                 = "80"
  threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []

  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
    id = "m1"
    metric_query {
        id          = "e1"
        expression  = "(m1/1048576)/PERIOD(m1)"
        label       = "Expression1"
        return_data = "true"
    }
    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    metric_query {
        id          = "e2"
        expression  = "m2/1048576"
        label       = "Expression2"
        return_data = "true"
    }
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}

It gives me the error "Unsupported block type". I understand that a metric_query block can't be nested inside another metric_query block.

But here I first need two expressions that both return data, and then a third expression that does math on the results of the first two and is used as the alarm.
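The dependency chain described above (m1 feeds e1, m2 feeds e2, and t1 combines e1 and e2) can be written out as a worked formula. This is a sketch that assumes the Sum statistic for m1, a 60-second period, and an m2 data point that is a bytes-per-second value; the input numbers are hypothetical, chosen only to show the arithmetic (1048576 bytes = 1 MiB).

```latex
\begin{aligned}
e_1 &= \frac{m_1 / 1048576}{\mathrm{PERIOD}(m_1)}
     = \frac{3\,145\,728\,000 / 1048576}{60}
     = \frac{3000}{60} = 50 \ \text{MiB/s} \\
e_2 &= \frac{m_2}{1048576}
     = \frac{104\,857\,600}{1048576} = 100 \ \text{MiB/s} \\
t_1 &= \frac{e_1 \times 100}{e_2} = \frac{50 \times 100}{100} = 50\,\%
\end{aligned}
```

Only t1 is compared against the 80% threshold; e1 and e2 are intermediate results.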

I see a note in the Terraform docs saying that you must specify either metric or expression, not both. But I still can't figure out how or where to declare e1 and e2.

Can someone please guide me here?

Thanks in advance.

-----------EDIT 1-----------

I modified my code and can now create the alarm, but the results are still not as expected. The alarm stays in the Insufficient Data state and the metric data does not show up as expected (see the attached picture, "metric data post alarm creation").

The modified code is as follows:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []

  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
        id          = "e1"
        expression  = "(m1/1048576)/PERIOD(m1)"
        label       = "Expression1"
        #return_data = "true"
  }

  metric_query {
        id          = "e2"
        expression  = "m2/1048576"
        label       = "Expression2"
        #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}

Asked Feb 5 at 12:05 by striker; edited Feb 6 at 18:07.

I did some more research and modified the code to make it work, but the results are still not as expected. – striker, Feb 5 at 18:03
What does the downvote signify? – striker, Feb 5 at 18:03

1 Answer

The order in which CloudWatch displays your metrics won't change the result; what matters is the dependency order. In your case, t1 depends on e1 and e2, e1 depends on m1, and e2 depends on m2, and that won't change.
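To illustrate that block order is irrelevant, here is a sketch of the same alarm in which the queries come from two unordered maps through `dynamic "metric_query"` blocks; CloudWatch resolves e1, e2 and t1 through the ids they reference. The locals `efs_metrics` and `efs_expressions` are hypothetical names, `var.efs_name` and `aws_efs_file_system.efs` come from the question, and actions/tags are left out for brevity. Note that `unit` is also left out: the AWS PutMetricAlarm documentation warns that specifying a unit that does not match the one the metric is published with can leave the alarm in INSUFFICIENT_DATA.

```hcl
locals {
  # Hypothetical helper maps keyed by query id. Neither map encodes an
  # evaluation order; references between ids define the dependencies.
  efs_metrics = {
    m1 = "MeteredIOBytes"
    m2 = "PermittedThroughput"
  }
  efs_expressions = {
    e1 = "(m1/1048576)/PERIOD(m1)"
    e2 = "m2/1048576"
    t1 = "((e1)*100)/(e2)"
  }
}

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = 80
  alarm_description   = "Throughput Utilization has exceeded 80%"

  # One expression query per map entry. Only t1 returns data, so t1 is
  # the series the alarm compares against the threshold.
  dynamic "metric_query" {
    for_each = local.efs_expressions
    content {
      id          = metric_query.key
      expression  = metric_query.value
      return_data = metric_query.key == "t1"
    }
  }

  # One metric query per raw EFS metric (no `unit`, see the note above).
  dynamic "metric_query" {
    for_each = local.efs_metrics
    content {
      id = metric_query.key

      metric {
        metric_name = metric_query.value
        namespace   = "AWS/EFS"
        period      = 60
        stat        = "Sum"

        dimensions = {
          FileSystemId = aws_efs_file_system.efs.id
        }
      }
    }
  }
}
```

Keeping `return_data` true on exactly one query matters here: for an alarm, CloudWatch expects a single query to be designated as the result, and all other metrics and expressions to have it set to false.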

If you are seeing Insufficient Data, you can configure the alarm to treat missing data as good (not breaching) with the treat_missing_data argument, like this:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data        = "notBreaching"

  # ... metric_query blocks and the remaining arguments unchanged ...
}
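If the missing-data behavior should differ between environments, one option is to expose it as a validated input variable. This is a sketch: the variable name `treat_missing_data` is hypothetical, and the four allowed values are the ones CloudWatch accepts for this setting.

```hcl
# Hypothetical input variable; wire it into the alarm with
#   treat_missing_data = var.treat_missing_data
variable "treat_missing_data" {
  description = "How the alarm treats missing data points."
  type        = string
  default     = "notBreaching"

  validation {
    condition     = contains(["missing", "ignore", "breaching", "notBreaching"], var.treat_missing_data)
    error_message = "The treat_missing_data value must be one of: missing, ignore, breaching, notBreaching."
  }
}
```

With the default of `notBreaching`, gaps keep the alarm in OK rather than INSUFFICIENT_DATA; `missing` restores CloudWatch's own default behavior.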

Also, your period of 60 seconds is very short. Try increasing it to 5 minutes (a period of 300 seconds for m1 and m2) to see if that works; you can decrease it again afterwards. A very short period helps if you need to react quickly, but in practice a longer period (at least 5 minutes) tends to give fewer false-positive alarms and more data to rely on.
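A sketch of that change, assuming a hypothetical local `efs_metric_period` so that m1 and m2 always stay on the same period. Because e1 divides by `PERIOD(m1)`, which returns m1's period in seconds, the expression itself does not need to change when the period does. The second part is a fragment that belongs inside the `aws_cloudwatch_metric_alarm` resource.

```hcl
locals {
  # Hypothetical local shared by m1 and m2, in seconds (300 = 5 minutes).
  # Lower it later if the alarm needs to react faster.
  efs_metric_period = 300
}

# Fragment: the m1 query inside the alarm resource.
# m2 gets the same `period = local.efs_metric_period` line.
metric_query {
  id = "m1"

  metric {
    metric_name = "MeteredIOBytes"
    namespace   = "AWS/EFS"
    period      = local.efs_metric_period
    stat        = "Sum"

    dimensions = {
      FileSystemId = aws_efs_file_system.efs.id
    }
  }
}
```

Keep in mind that the alarm then looks at `evaluation_periods` × 300 seconds of data before changing state.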

Documentation:

Configuring how CloudWatch alarms treat missing data: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data

Update:

You may also want to add fillers to make sure your data is always defined; they are very useful with math expressions. In your case, because e2 is the denominator, you need to make sure that when its data is missing its value is 1 (to avoid dividing by zero), while the numerator can fall back to 0.

Here's the template with the FILL function:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "t1"
    expression  = "(FILL(e1, 0) * 100)/FILL(e2, 1)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
        id          = "e1"
        expression  = "(FILL(m1, 0)/1048576)/PERIOD(m1)"
        label       = "Expression1"
        #return_data = "true"
  }

  metric_query {
        id          = "e2"
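        expression  = "m2/1048576" # no FILL here, so FILL(e2, 1) in t1 turns gaps into 1 instead of 0
        label       = "Expression2"
        #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      # unit omitted: MeteredIOBytes is not published with the "Count" unit,
      # and a mismatched unit can leave the alarm in INSUFFICIENT_DATA

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      # unit omitted for the same reason (not published as "Count")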

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}

Update 2:

Actually, I'm not quite sure what you're trying to do when you divide by PERIOD, but if you're trying to get an average, you could use the Average statistic instead; taking the Sum and dividing by PERIOD could lead to unexpected results. That gives the following version:

resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name                = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator       = "GreaterThanThreshold"
  evaluation_periods        = "1"
  threshold                 = "80"
  #threshold_metric_id       = "t1"
  alarm_description         = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "t1"
    expression  = "(FILL(e1, 0) * 100)/FILL(e2, 1)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
        id          = "e1"
        expression  = "FILL(m1, 0)/1048576"
        label       = "Expression1"
        #return_data = "true"
  }

  metric_query {
        id          = "e2"
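        expression  = "m2/1048576" # no FILL here, so FILL(e2, 1) in t1 turns gaps into 1 instead of 0
        label       = "Expression2"
        #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Average"
      # unit omitted: MeteredIOBytes is not published with the "Count" unit,
      # and a mismatched unit can leave the alarm in INSUFFICIENT_DATA

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"
    
    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Average"
      # unit omitted for the same reason (not published as "Count")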

      dimensions = {
        FileSystemId    = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions       = [var.sns_arn]
  ok_actions          = [var.sns_arn]
  depends_on          = [aws_efs_file_system.efs]
  tags                = local.common_tags
}
