I am trying to create a CloudWatch alarm for Throughput utilization (%) on an AWS EFS file system using Terraform. When I did the same in the console, I found that Throughput utilization (%) is not a single metric. It combines two separate metrics and is a math expression over two other expressions. Please see the attached screenshot of what I am trying to achieve: [metrics details]
Here is my code:
```hcl
resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  #threshold          = "80"
  threshold_metric_id = "t1"
  alarm_description   = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []

  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
    id = "m1"

    metric_query {
      id          = "e1"
      expression  = "(m1/1048576)/PERIOD(m1)"
      label       = "Expression1"
      return_data = "true"
    }

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"

    metric_query {
      id          = "e2"
      expression  = "m2/1048576"
      label       = "Expression2"
      return_data = "true"
    }

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions = [var.sns_arn]
  ok_actions    = [var.sns_arn]
  depends_on    = [aws_efs_file_system.efs]
  tags          = local.common_tags
}
```
Terraform fails with an "Unsupported block type" error. I understand that a `metric_query` block can't be nested inside another `metric_query` block.
But here I first need two expressions that both return data, and then a third expression that does math on their results and serves as the alarm.
The Terraform docs note that you must specify either `metric` or `expression`, not both. But I still can't figure out how or where to declare `e1` and `e2`.
Can someone please guide me a little here?
Thanks in advance.
-----------EDIT 1-----------
I modified my code and can now create the alarm, but the results are still not as expected. The alarm stays in the Insufficient Data state and the metric data does not show up as expected. Please see the attached screenshot: [metric data post alarm creation]
The modified code is as follows:
```hcl
resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  threshold           = "80"
  #threshold_metric_id = "t1"
  alarm_description   = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []

  metric_query {
    id          = "t1"
    expression  = "((e1)*100)/(e2)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
    id         = "e1"
    expression = "(m1/1048576)/PERIOD(m1)"
    label      = "Expression1"
    #return_data = "true"
  }

  metric_query {
    id         = "e2"
    expression = "m2/1048576"
    label      = "Expression2"
    #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions = [var.sns_arn]
  ok_actions    = [var.sns_arn]
  depends_on    = [aws_efs_file_system.efs]
  tags          = local.common_tags
}
```
Asked Feb 5 at 12:05 by striker, edited Feb 6 at 18:07.
Comments:
- I did some more research and modified the code to make it work, but the results are not as expected. (striker, Feb 5 at 18:03)
- What does the downvote signify? (striker, Feb 5 at 18:03)
Answer

The order in which CloudWatch displays your metrics won't change the result. What matters is the dependency order: in your case, `t1` depends on `e1` and `e2`, `e1` depends on `m1`, and `e2` depends on `m2`, and that won't change.
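Written out as formulas, the chain that `t1` evaluates looks like this. The numbers in the worked example are purely illustrative (a hypothetical file system that metered 6,291,456,000 bytes in a 60-second period while its single `PermittedThroughput` datapoint was 209,715,200 bytes/second):

```latex
\begin{aligned}
e_1 &= \frac{m_1 / 1048576}{\mathrm{PERIOD}(m_1)}
     = \frac{6291456000 / 1048576}{60} = \frac{6000}{60} = 100 \ \text{MiB/s} \\
e_2 &= \frac{m_2}{1048576} = \frac{209715200}{1048576} = 200 \ \text{MiB/s} \\
t_1 &= \frac{100 \, e_1}{e_2} = \frac{100 \times 100}{200} = 50\,\%
\end{aligned}
```

The constant 1048576 is 2^20, so both `e1` and `e2` are in MiB/s and the unit cancels in the ratio, leaving a plain percentage for the alarm threshold.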
If your alarm is stuck in Insufficient Data, you can configure it to treat missing data as good with the `treat_missing_data` property, like this:
```hcl
resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  threshold           = "80"
  #threshold_metric_id = "t1"
  alarm_description   = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  # ... metric_query blocks, actions and tags unchanged from your EDIT 1 ...
}
```
Also, your period of 60 seconds is very short. You can increase it to 5 minutes, for example (a `period` of 300 seconds for `m1` and `m2`), check whether it works, and decrease it afterwards. A very short period is useful if you need to react quickly, but in my experience a longer period (at least 5 minutes) gives you fewer false-positive alarms and more data to rely on.
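A sketch of the two base queries with a 5-minute period follows; `t1`, `e1` and `e2` stay as in the question's EDIT 1. One assumption worth flagging: the `Sum` of `PermittedThroughput` only equals the permitted rate when the period holds a single datapoint. Assuming EFS publishes that metric once per minute, a 300-second `Sum` would be roughly five times the rate and understate utilization, so this sketch switches `m2` to `Average`. `m1` keeps `Sum` because `e1` already divides it by `PERIOD(m1)`. Verify the resulting values against the console graph before relying on them.

```hcl
  # Hypothetical 5-minute variant of m1/m2; t1, e1 and e2 are unchanged.
  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = 300   # 5 minutes
      stat        = "Sum" # e1 divides this by PERIOD(m1) to get a per-second rate
      # unit omitted: CloudWatch only returns datapoints whose unit matches a requested unit

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = 300       # keep equal to m1 so the two series line up
      stat        = "Average" # assumption: keeps a per-second rate when the period spans several datapoints

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }
```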
Documentation:
- Configuring how CloudWatch alarms treat missing data: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
Update:
You may also want to add fillers to make sure your data is always defined. They are very useful, especially with math expressions. In your case, since `e2` is the denominator, you need to make sure its value is `1` when the data is not defined. The numerator can default to `0`.
Here's the template with the `FILL` expression:
```hcl
resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  threshold           = "80"
  #threshold_metric_id = "t1"
  alarm_description   = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "t1"
    expression  = "(FILL(e1, 0) * 100)/FILL(e2, 1)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
    id         = "e1"
    expression = "(FILL(m1, 0)/1048576)/PERIOD(m1)"
    label      = "Expression1"
    #return_data = "true"
  }

  # m2 is left unfilled so that FILL(e2, 1) in t1 supplies the fallback
  # denominator; FILL(m2, 0) here would make e2 zero instead of missing.
  metric_query {
    id         = "e2"
    expression = "m2/1048576"
    label      = "Expression2"
    #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Sum"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions = [var.sns_arn]
  ok_actions    = [var.sns_arn]
  depends_on    = [aws_efs_file_system.efs]
  tags          = local.common_tags
}
```
Update 2:
Actually, I'm not quite sure what you're trying to do when you divide by `PERIOD`, but if you're trying to get the average, you should use the `Average` statistic instead. Doing a `Sum` and dividing by `PERIOD` could lead to unexpected results. That gives the following version:
```hcl
resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = "1"
  threshold           = "80"
  #threshold_metric_id = "t1"
  alarm_description   = "Throughput Utilization has exceeded 80%"
  #insufficient_data_actions = []
  treat_missing_data  = "notBreaching"

  metric_query {
    id          = "t1"
    expression  = "(FILL(e1, 0) * 100)/FILL(e2, 1)"
    label       = "Throughput utilization (%)"
    return_data = "true"
  }

  metric_query {
    id         = "e1"
    expression = "FILL(m1, 0)/1048576"
    label      = "Expression1"
    #return_data = "true"
  }

  # m2 is left unfilled so that FILL(e2, 1) in t1 supplies the fallback
  # denominator; FILL(m2, 0) here would make e2 zero instead of missing.
  metric_query {
    id         = "e2"
    expression = "m2/1048576"
    label      = "Expression2"
    #return_data = "true"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Average"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = "60"
      stat        = "Average"
      unit        = "Count"

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions = [var.sns_arn]
  ok_actions    = [var.sns_arn]
  depends_on    = [aws_efs_file_system.efs]
  tags          = local.common_tags
}
```
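One more thing that may explain the Insufficient Data state, offered as a hedged suggestion: every version above sets `unit = "Count"` on both metrics. When a unit is specified, CloudWatch only returns datapoints that were published with that unit and does not convert between units. EFS reports `MeteredIOBytes` in Bytes and `PermittedThroughput` in Bytes/Second, so a `Count` filter may leave both series empty. The sketch below drops the unit filters and otherwise keeps the console's own expressions from the question's EDIT 1 (`Sum` divided by `PERIOD(m1)`, 60-second period). The variables, `aws_efs_file_system.efs` and `local.common_tags` are taken from the question and assumed to exist.

```hcl
# Sketch: the console's expressions from EDIT 1, with the unit filters removed.
resource "aws_cloudwatch_metric_alarm" "percent_throughput_utilization" {
  alarm_name          = "EFS_${var.efs_name}_percent_throughput_utilization_too_high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  threshold           = 80
  alarm_description   = "Throughput Utilization has exceeded 80%"
  treat_missing_data  = "notBreaching"

  # t1 is the only query that returns data, so it is what the alarm evaluates.
  metric_query {
    id          = "t1"
    expression  = "(e1 * 100) / e2"
    label       = "Throughput utilization (%)"
    return_data = true
  }

  # e1: metered throughput in MiB/s (total bytes in the period / 2^20 / seconds).
  metric_query {
    id         = "e1"
    expression = "(m1 / 1048576) / PERIOD(m1)"
    label      = "Expression1"
  }

  # e2: permitted throughput in MiB/s.
  metric_query {
    id         = "e2"
    expression = "m2 / 1048576"
    label      = "Expression2"
  }

  metric_query {
    id = "m1"

    metric {
      metric_name = "MeteredIOBytes"
      namespace   = "AWS/EFS"
      period      = 60
      stat        = "Sum"
      # No unit: EFS publishes this metric in Bytes, so "Count" would filter it out.

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  metric_query {
    id = "m2"

    metric {
      metric_name = "PermittedThroughput"
      namespace   = "AWS/EFS"
      period      = 60
      stat        = "Sum"
      # No unit: EFS publishes this metric in Bytes/Second.

      dimensions = {
        FileSystemId = aws_efs_file_system.efs.id
      }
    }
  }

  alarm_actions = [var.sns_arn]
  ok_actions    = [var.sns_arn]
  tags          = local.common_tags
}
```

If the graph in the console still looks empty after applying this, comparing it against the metric's own unit in the CloudWatch metrics browser is a quick way to confirm whether the unit filter was the cause.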