node.js - 502 errors in ECS ALB - Stack Overflow

I have a Node.js application built with NestJS. It has different endpoints: some respond quickly, while others can run for tens of seconds.

Architecture in AWS: WAF -> CloudFront -> ALB -> ECS

From time to time, the CloudFront logs show 502 errors. The application logs in ECS contain no information about these errors. The ALB access logs also show no errors, although the ELB 502s chart does indicate them. I have also enabled VPC flow logs, but they contain no useful information.

The headers of the failing response look like this:

    content-length: 524
    content-type: text/html
    date: Tue, 11 Feb 2025 10:26:11 GMT
    server: awselb/2.0
    set-cookie: AWSALB=lCYCp5ugtxpxDjRdGAr5UnvJWPVrLoFPvc43vx/GPelpQQWOkH5DPjZQs/waOiwfHarsAF7PQJOM/lHiZjLVnHBOOyT/QGAsk/+Xu5XhXs7sj4dbbdAyrwAS0u38; Expires=Tue, 18 Feb 2025 10:26:11 GMT; Path=/
    set-cookie: AWSALBCORS=lCYCp5ugtxpxDjRdGAr5UnvJWPVrLoFPvc43vx/GPelpQQWOkH5DPjZQs/waOiwfHarsAF7PQJOM/lHiZjLVnHBOOyT/QGAsk/+Xu5XhXs7sj4dbbdAyrwAS0u38; Expires=Tue, 18 Feb 2025 10:26:11 GMT; Path=/; SameSite=None
    via: 1.1 38f46facdae93530546676e451869f4c.cloudfront (CloudFront)
    x-amz-cf-id: oN-OzKJyuQo_DZrQJmHjUuza52zSWDxBCiBFFpcZY0PVMwl2-JyDNg==
    x-amz-cf-pop: MUC50-P5
    x-cache: Error from cloudfront

ALB attributes:

  • TLS version and cipher headers: Off
  • WAF fail open: Off
  • HTTP/2: On
  • Connection idle timeout: 60 seconds
  • HTTP client keepalive duration: 3600 seconds
  • Desync mitigation mode: Defensive
  • Drop invalid header fields: Off
  • X-Forwarded-For header: Append
  • Client port preservation: Off
  • Preserve host header: Off
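
One thing that may be worth ruling out with the 60-second connection idle timeout listed above: if the Node.js server closes idle keep-alive connections sooner than the ALB does, the ALB can try to reuse a connection that the target has already closed and answer with a 502 of its own. Nothing in the question confirms that this is the cause here. The sketch below only shows how the timeouts could be raised above the ALB value. The helper name `applyKeepAliveTimeouts` and the 5-second margin are assumptions made for illustration.

```typescript
import * as http from "http";

// Assumption: matches the "Connection idle timeout: 60 seconds" ALB attribute.
const ALB_IDLE_TIMEOUT_SECONDS = 60;

/**
 * Hypothetical helper: keep idle keep-alive connections open on the Node.js
 * side for longer than the load balancer keeps them, so that the load
 * balancer is the side that closes idle connections.
 */
function applyKeepAliveTimeouts(
  server: http.Server,
  albIdleTimeoutSeconds: number = ALB_IDLE_TIMEOUT_SECONDS,
  marginSeconds: number = 5, // arbitrary safety margin, an assumption
): void {
  const keepAliveMs = (albIdleTimeoutSeconds + marginSeconds) * 1000;
  server.keepAliveTimeout = keepAliveMs;
  // Kept slightly above keepAliveTimeout; some Node.js versions are reported
  // to close connections early otherwise.
  server.headersTimeout = keepAliveMs + 1000;
}

// Usage with a plain http server.
const server = http.createServer((_req, res) => {
  res.end("ok");
});
applyKeepAliveTimeouts(server);
```

In a NestJS application the underlying server should be reachable through `app.getHttpServer()`, so the same helper could be called on it before `app.listen()`.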

Target group health-check:

  • Protocol: HTTP
  • Path: /api/health-check
  • Port: Traffic port
  • Healthy threshold: 2 consecutive health check successes
  • Unhealthy threshold: 3 consecutive health check failures
  • Timeout: 6 seconds
  • Interval: 15 seconds
  • Success codes: 200-299,404

Target group attributes:

  • Deregistration delay (draining interval): 45 seconds
  • Load balancing algorithm: Round robin
  • Slow start duration: 0 seconds
  • Stickiness: Off
  • Cross-zone load balancing: Inherit settings from load balancer attributes
  • DNS – Healthy state requirements
      • Minimum healthy target count: 1
      • Minimum healthy target percentage: Off
  • Routing – Healthy state requirements
      • Minimum healthy target count: 1
      • Minimum healthy target percentage: Off

Assumption 1. Problem with ALB stickiness

Action: Turned off stickiness

Result: Didn't help


Assumption 2. Spikes with CPU or Memory

Action: Found no issues with memory, but found CPU spikes of up to 100% of maximum CPU utilisation. Increased the number of running tasks, which lowered maximum CPU utilisation to 20-30%.

Result: Didn't help
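
Because a Node.js process runs request handlers on a single event loop, CPU spikes like these may show up as event loop lag, during which the process cannot answer anything, health checks included. A minimal sketch for logging that lag from inside the application, using `monitorEventLoopDelay` from `perf_hooks`, could look like this. The function name, the 200 ms threshold and the 5-second reporting interval are assumptions chosen for illustration, not values from the question.

```typescript
import { monitorEventLoopDelay } from "perf_hooks";

/**
 * Hypothetical helper: periodically log event loop delay when its 99th
 * percentile exceeds a threshold. Returns a function that stops the logger.
 */
function startEventLoopLagLogger(
  thresholdMs: number = 200, // assumption: what counts as "too slow"
  intervalMs: number = 5000, // assumption: how often to report
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    // The histogram reports values in nanoseconds.
    const p99Ms = histogram.percentile(99) / 1e6;
    const maxMs = histogram.max / 1e6;
    if (p99Ms > thresholdMs) {
      console.warn(
        `event loop lag: p99=${p99Ms.toFixed(1)}ms max=${maxMs.toFixed(1)}ms`,
      );
    }
    histogram.reset();
  }, intervalMs);
  // Do not keep the process alive just for this timer.
  timer.unref();

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Comparing the timestamps of these warnings with the timestamps of the 502 responses would show whether the two coincide.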


Assumption 3. Issue with health checks

Action: Checked that the target groups were never in an unhealthy state over the whole period. Increased the health-check timeout and interval and the deregistration delay.

Result: Didn't help


Assumption 4. Issues with ALB configuration

Action: Enabled access logs, but found no relevant information

Result: Didn't help
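
When going through the ALB access logs, it may help to filter on the load balancer status code rather than on the target status code. In the documented access log format the fields are space-separated, with `elb_status_code` and `target_status_code` as the ninth and tenth fields, and a `-` in the target field means that no response was received from the target. The sketch below is only an illustration of that filtering; the type and function names are hypothetical.

```typescript
// Hypothetical shape holding only the fields of interest from one log line.
interface AlbLogEntry {
  time: string;
  client: string;
  target: string;
  elbStatusCode: string;
  targetStatusCode: string;
}

/** Parse the leading space-separated fields of one ALB access log line. */
function parseAlbLogLine(line: string): AlbLogEntry | null {
  // The first twelve fields contain no spaces, so a plain split is enough
  // for them; the quoted fields that follow are ignored here.
  const fields = line.trim().split(" ");
  if (fields.length < 12) {
    return null;
  }
  return {
    time: fields[1],
    client: fields[3],
    target: fields[4],
    elbStatusCode: fields[8],
    targetStatusCode: fields[9],
  };
}

/** Keep only the entries where the load balancer itself answered with 502. */
function findElb502s(lines: string[]): AlbLogEntry[] {
  return lines
    .map(parseAlbLogLine)
    .filter(
      (entry): entry is AlbLogEntry =>
        entry !== null && entry.elbStatusCode === "502",
    );
}
```

Entries returned with a `targetStatusCode` of `-` would point at the connection between the ALB and the task rather than at an error raised by the application.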


Assumption 5. Found on Stack Overflow that the root cause could be --max-http-header-size in Node.js

Action: Increased the value to 16 KB

Result: Didn't help
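
To confirm that the `--max-http-header-size` flag actually reached the running process, the effective limit can be read from `http.maxHeaderSize`. This is a small sketch with a hypothetical helper name. Note that recent Node.js releases already default to 16 KiB, as far as I know, in which case setting 16 KB would change nothing.

```typescript
import * as http from "http";

/** Hypothetical helper: report the effective HTTP header size limit in KiB. */
function effectiveMaxHeaderSizeKiB(): number {
  // http.maxHeaderSize is a read-only value in bytes.
  return http.maxHeaderSize / 1024;
}

console.log(`max HTTP header size: ${effectiveMaxHeaderSizeKiB()} KiB`);
```

Logging this once at startup inside the ECS task would show whether the flag was applied to the container's Node.js process.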


I would be very happy to hear any suggestions for debugging or resolving this issue. Thanks!

Asked Feb 16 at 1:12 by gigifork (new contributor)

1 Answer

During a local load test, the Docker container under load returned no response at some peaks. Memory was fine, so it looks like this happened because CPU utilization was at 100% for some period.
