node.js - 502 errors in ECS ALB

I have nodejs aplication with nestjs. There a different endpoints, some of them respond quickly, and others could run for tens of seconds.

Architecture in aws: waf -> cloudfront -> alb -> ecs

From time to time logs in cloudfront is returning 502 errors. Logs of application in ecs don't have any information about such errors. ALB access logs also show no errors, but chart ELB 502s indicates them. I have also enabled vpc flow logs, but there is no useful information.

I found headers of the response look like this:

content-length: 524 content-type: text/html date: Tue, 11 Feb 2025 10:26:11 GMT server: awselb/2.0 set-cookie: AWSALB=lCYCp5ugtxpxDjRdGAr5UnvJWPVrLoFPvc43vx/GPelpQQWOkH5DPjZQs/waOiwfHarsAF7PQJOM/lHiZjLVnHBOOyT/QGAsk/+Xu5XhXs7sj4dbbdAyrwAS0u38; Expires=Tue, 18 Feb 2025 10:26:11 GMT; Path=/ set-cookie: AWSALBCORS=lCYCp5ugtxpxDjRdGAr5UnvJWPVrLoFPvc43vx/GPelpQQWOkH5DPjZQs/waOiwfHarsAF7PQJOM/lHiZjLVnHBOOyT/QGAsk/+Xu5XhXs7sj4dbbdAyrwAS0u38; Expires=Tue, 18 Feb 2025 10:26:11 GMT; Path=/; SameSite=None via: 1.1 38f46facdae93530546676e451869f4c.cloudfront (CloudFront) x-amz-cf-id: oN-OzKJyuQo_DZrQJmHjUuza52zSWDxBCiBFFpcZY0PVMwl2-JyDNg== x-amz-cf-pop: MUC50-P5 x-cache: Error from cloudfront

ALB attributes:

TLS version and cipher headers: Off
WAF fail open: Off
HTTP/2: On
Connection idle timeout: 60 seconds
HTTP client keepalive duration: 3600 seconds
Desync mitigation mode: Defensive
Drop invalid header fields: Off
X-Forwarded-For header: Append
Client port preservation: Off
Preserve host header: Off

Target group health-check:

Protocol: HTTP
Path: /api/health-check
Port: Traffic port
Healthy threshold: 2 consecutive health check successes
Unhealthy threshold: 3 consecutive health check failures
Timeout: 6 seconds
Interval: 15 seconds
Success codes: 200-299,404

Target group attributes:

Deregistration delay (draining interval): 45 seconds
Load balancing algorithm: Round robin
Slow start duration: 0 seconds
Stickiness: Off
Cross-zone load balancing: Inherit settings from load balancer attributes
DNS – Healthy state requirements
Minimum healthy target count: 1
Minimum healthy target percentage: off
Routing - Healthy state requirements
Minimum healthy target count: 1
Minimum healthy target percentage: off

Assumption 1: Problem with stickiness of ALB.

Action: Turned off stickiness

Result: Didn't help

Assumption 2. Spikes with CPU or Memory

Action: Didn't find any issues with memory. But found CPU spikes up to 100% of MAX CPU utilisation. Increased quantity of running tasks. MAX CPU utilisation lowered to 20-30%

Result: Didn't help

Assumption 3. Issue with health checks

Action: Checked that for all period I didn't see that target groups has been ever in unhealthy status. Increase timeout and interval and deregistration delay.

Result: Didn't help

Assumption 4. Issues with ALB configuration

Action: Enabled access logs. But didn't find any info

Result: Didn't help

Assumption 5. In stack overflow found that root cause could be --max-http-header-size in nodejs

Action: Increased value to 16kb

Result: Didn't help

Will be so happy to hear any suggestion to debug or resolve this issue. Thanks!

I have nodejs aplication with nestjs. There a different endpoints, some of them respond quickly, and others could run for tens of seconds.

Architecture in aws: waf -> cloudfront -> alb -> ecs

I found headers of the response look like this:

content-length: 524 content-type: text/html date: Tue, 11 Feb 2025 10:26:11 GMT server: awselb/2.0 set-cookie: AWSALB=lCYCp5ugtxpxDjRdGAr5UnvJWPVrLoFPvc43vx/GPelpQQWOkH5DPjZQs/waOiwfHarsAF7PQJOM/lHiZjLVnHBOOyT/QGAsk/+Xu5XhXs7sj4dbbdAyrwAS0u38; Expires=Tue, 18 Feb 2025 10:26:11 GMT; Path=/ set-cookie: AWSALBCORS=lCYCp5ugtxpxDjRdGAr5UnvJWPVrLoFPvc43vx/GPelpQQWOkH5DPjZQs/waOiwfHarsAF7PQJOM/lHiZjLVnHBOOyT/QGAsk/+Xu5XhXs7sj4dbbdAyrwAS0u38; Expires=Tue, 18 Feb 2025 10:26:11 GMT; Path=/; SameSite=None via: 1.1 38f46facdae93530546676e451869f4c.cloudfront (CloudFront) x-amz-cf-id: oN-OzKJyuQo_DZrQJmHjUuza52zSWDxBCiBFFpcZY0PVMwl2-JyDNg== x-amz-cf-pop: MUC50-P5 x-cache: Error from cloudfront

ALB attributes:

TLS version and cipher headers: Off
WAF fail open: Off
HTTP/2: On
Connection idle timeout: 60 seconds
HTTP client keepalive duration: 3600 seconds
Desync mitigation mode: Defensive
Drop invalid header fields: Off
X-Forwarded-For header: Append
Client port preservation: Off
Preserve host header: Off

Target group health-check:

Protocol: HTTP
Path: /api/health-check
Port: Traffic port
Healthy threshold: 2 consecutive health check successes
Unhealthy threshold: 3 consecutive health check failures
Timeout: 6 seconds
Interval: 15 seconds
Success codes: 200-299,404

Target group attributes:

Deregistration delay (draining interval): 45 seconds
Load balancing algorithm: Round robin
Slow start duration: 0 seconds
Stickiness: Off
Cross-zone load balancing: Inherit settings from load balancer attributes
DNS – Healthy state requirements
Minimum healthy target count: 1
Minimum healthy target percentage: off
Routing - Healthy state requirements
Minimum healthy target count: 1
Minimum healthy target percentage: off

Assumption 1: Problem with stickiness of ALB.

Action: Turned off stickiness

Result: Didn't help

Assumption 2. Spikes with CPU or Memory

Action: Didn't find any issues with memory. But found CPU spikes up to 100% of MAX CPU utilisation. Increased quantity of running tasks. MAX CPU utilisation lowered to 20-30%

Result: Didn't help

Assumption 3. Issue with health checks

Action: Checked that for all period I didn't see that target groups has been ever in unhealthy status. Increase timeout and interval and deregistration delay.

Result: Didn't help

Assumption 4. Issues with ALB configuration

Action: Enabled access logs. But didn't find any info

Result: Didn't help

Assumption 5. In stack overflow found that root cause could be --max-http-header-size in nodejs

Action: Increased value to 16kb

Result: Didn't help

Will be so happy to hear any suggestion to debug or resolve this issue. Thanks!

Share Improve this question asked Feb 16 at 1:12 gigifork 212 bronze badges New contributor gigifork is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

During the local load test, the load docker container returned no response at some peaks. Memory is ok and looks like it happened because of 100% CPU utilization for some period

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

node.js - 502 errors in ECS ALB - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)