I believe I have a logic issue in the auto-scaling module that scales my ECS tasks.
As you can see, I use dynamic scaling based on resource consumption, scheduled scaling based on cron expressions, and SQS scaling based on queue length.
My main issue is with dynamic scaling in combination with scheduled scaling.
For example, the customer requires that every day at 8:00 AM we pre-launch 50 instances of a specific service. Scheduled scaling works fine and we get 50 instances of said task.
The problem is that dynamic scaling then sees that these tasks are under-utilized and begins shutting them down; about half an hour later we get overwhelmed by traffic and the services begin to crash.
I am attaching my module configuration. Most values are set outside the module, but it should give you an idea of how it works.
I'm just trying to understand whether I missed something or whether the whole logic is incorrect.
Thanks in advance to anyone who's able to assist.
# Register each ECS service as a scalable target with its min/max task count.
resource "aws_appautoscaling_target" "auto_scaling_target" {
  for_each           = var.compute_services_auto_scaling_configuration
  max_capacity       = each.value.instance_max_capacity
  min_capacity       = each.value.instance_min_capacity
  resource_id        = "service/${var.cluster_name}/${each.key}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}
# Dynamic scaling: track average memory utilization of the service.
resource "aws_appautoscaling_policy" "auto_scaling_memory_policy" {
  for_each           = var.compute_services_auto_scaling_configuration
  name               = "${each.key}_auto_scaling_memory_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }
    target_value = each.value.memory_scaling_target_value
  }
}
# Dynamic scaling: track average CPU utilization of the service.
resource "aws_appautoscaling_policy" "auto_scaling_cpu_policy" {
  for_each           = var.compute_services_auto_scaling_configuration
  name               = "${each.key}_auto_scaling_cpu_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = each.value.cpu_scaling_target_value
  }
}
# SQS scaling: track the number of visible messages in the service's queue.
resource "aws_appautoscaling_policy" "auto_scaling_sqs_policy" {
  for_each           = var.sqs_based_auto_scaling_configuration
  name               = "${each.key}_auto_scaling_sqs_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    customized_metric_specification {
      metric_name = "ApproximateNumberOfMessagesVisible"
      namespace   = "AWS/SQS"
      statistic   = "Average"

      dimensions {
        name  = "QueueName"
        value = each.value.queue_name
      }
    }
    target_value       = each.value.sqs_scaling_target_value
    scale_in_cooldown  = 300
    scale_out_cooldown = 300
  }
}
# Flatten the per-service lists of scheduled actions into one list of objects.
locals {
  flattened_scheduled_scaling_actions = flatten([
    for service, actions in var.compute_services_scheduled_scaling : [
      for idx, action in actions : {
        action_name   = "${service}_scheduled_scaling_${idx + 1}"
        service_name  = service
        schedule      = action.schedule_expression
        desired_count = action.scalable_action.desired_count
      }
    ]
  ])
}
# Scheduled scaling: pin min and max capacity to the desired count at the scheduled time.
resource "aws_appautoscaling_scheduled_action" "scheduled_scaling" {
  for_each           = { for action in local.flattened_scheduled_scaling_actions : action.action_name => action }
  service_namespace  = "ecs"
  resource_id        = "service/${var.cluster_name}/${each.value.service_name}"
  scalable_dimension = "ecs:service:DesiredCount"
  name               = each.value.action_name
  schedule           = each.value.schedule

  scalable_target_action {
    min_capacity = each.value.desired_count
    max_capacity = each.value.desired_count
  }
}
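To make the shape of the scheduled-scaling input concrete: the locals block above iterates over a map from service name to a list of actions, each carrying a `schedule_expression` and a `scalable_action.desired_count`. A hypothetical declaration could look like the sketch below. The variable name `compute_services_scheduled_scaling`, the service name `example-service`, and all values are assumptions for illustration only, not the real configuration.

```hcl
# Hypothetical input shape inferred from the locals block; names and values are illustrative.
variable "compute_services_scheduled_scaling" {
  type = map(list(object({
    schedule_expression = string
    scalable_action = object({
      desired_count = number
    })
  })))

  default = {
    "example-service" = [
      {
        # Every day at 08:00; evaluated in UTC unless a timezone is set on the scheduled action.
        schedule_expression = "cron(0 8 * * ? *)"
        scalable_action = {
          desired_count = 50
        }
      }
    ]
  }
}
```

With this input, the locals block would produce a single action named `example-service_scheduled_scaling_1`.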
asked Jan 30 at 10:20 by Assaf
1 Answer
One option is to configure cooldown periods for the dynamic scaling policies:
Scale-out cooldown: 60s (react quickly to traffic)
Scale-in cooldown: 3600s (60 min)
To set this with the AWS CLI:
aws application-autoscaling put-scaling-policy \
  --policy-name "ScaleInCooldownPolicy" \
  --service-namespace ecs \
  --resource-id service/YOUR-CLUSTER-NAME/YOUR-SERVICE-NAME \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 50.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleInCooldown": 3600,
    "ScaleOutCooldown": 60
  }'
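Since the module in the question is written in Terraform, the same cooldowns could probably be set on the existing target tracking policies instead of through the CLI. Below is a minimal sketch for the CPU policy, reusing the resource names from the question. The variable name `compute_services_auto_scaling_configuration` is an assumption, and the 3600 and 60 second values are simply the ones suggested above, not tuned numbers.

```hcl
resource "aws_appautoscaling_policy" "auto_scaling_cpu_policy" {
  for_each           = var.compute_services_auto_scaling_configuration # assumed variable name
  name               = "${each.key}_auto_scaling_cpu_policy"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.auto_scaling_target[each.key].resource_id
  scalable_dimension = aws_appautoscaling_target.auto_scaling_target[each.key].scalable_dimension
  service_namespace  = aws_appautoscaling_target.auto_scaling_target[each.key].service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = each.value.cpu_scaling_target_value
    scale_in_cooldown  = 3600 # seconds to wait after a scale-in before another scale-in
    scale_out_cooldown = 60   # seconds to wait after a scale-out before another scale-out
  }
}
```

The memory policy would take the same two arguments. Note that `put-scaling-policy` with a new policy name adds a policy rather than changing an existing one, so in a Terraform-managed setup editing the existing resources should avoid ending up with two CPU policies on the same service.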