Redis instance on docker randomly disconnects from ioredis once every few months, how to debug?

I am using the Node library ioredis, and I configure redis in docker-compose like this:

  redis:
    image: "redis:latest"
    volumes:
      - redis_data:/data

This is the simplest possible configuration, so I would expect nothing to be wrong here.

My connection is just as simple

import Redis from "ioredis";

export const redis = new Redis(process.env.REDIS_URL ?? '');

When I run

docker-compose up

I can see these logs

redis_1  | 1:C 09 Jan 2023 06:00:49.251 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1  | 1:C 09 Jan 2023 06:00:49.252 # Redis version=7.0.10, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1  | 1:C 09 Jan 2023 06:00:49.252 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1  | 1:M 09 Jan 2023 06:00:49.254 * monotonic clock: POSIX clock_gettime
redis_1  | 1:M 09 Jan 2023 06:00:49.258 * Running mode=standalone, port=6379.
redis_1  | 1:M 09 Jan 2023 06:00:49.258 # Server initialized
redis_1  | 1:M 09 Jan 2023 06:00:49.259 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see . To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1  | 1:M 09 Jan 2023 06:00:49.260 * Loading RDB produced by version 7.0.10
redis_1  | 1:M 09 Jan 2023 06:00:49.261 * RDB age 120617 seconds
redis_1  | 1:M 09 Jan 2023 06:00:49.261 * RDB memory usage when created 274.70 Mb
redis_1  | 1:M 09 Jan 2023 06:00:51.257 * Done loading RDB, keys loaded: 1201, keys expired: 0.
redis_1  | 1:M 09 Jan 2023 06:00:51.258 * DB loaded from disk: 1.998 seconds
redis_1  | 1:M 09 Jan 2023 06:00:51.259 * Ready to accept connections

Then, for many days, I see the following repeated

redis_1  | 1:M 09 May 2023 15:49:24.506 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 15:49:24.517 * Background saving started by pid 207
redis_1  | 207:C 09 May 2023 15:49:29.023 * DB saved on disk
redis_1  | 207:C 09 May 2023 15:49:29.025 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
redis_1  | 1:M 09 May 2023 15:49:29.094 * Background saving terminated with success
redis_1  | 1:M 09 May 2023 16:49:30.043 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 16:49:30.061 * Background saving started by pid 208
redis_1  | 208:C 09 May 2023 16:49:31.606 * DB saved on disk
redis_1  | 208:C 09 May 2023 16:49:31.608 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1  | 1:M 09 May 2023 16:49:31.666 * Background saving terminated with success

The app runs fine, and then suddenly I see these logs in the app

app_1    | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1    |     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1    | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1    |     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1    | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1    |     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1    | finished in 1875996ms
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | /opt/app/node_modules/ioredis/built/redis/event_handler.js:182
app_1    |                     self.flushQueue(new errors_1.MaxRetriesPerRequestError(maxRetriesPerRequest));
app_1    |                                     ^
app_1    | 
app_1    | MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
app_1    |     at Socket.<anonymous> (/opt/app/node_modules/ioredis/built/redis/event_handler.js:182:37)
app_1    |     at Object.onceWrapper (node:events:628:26)
app_1    |     at Socket.emit (node:events:513:28)
app_1    |     at TCP.<anonymous> (node:net:322:12)

But 2 hours later, redis produces new logs showing that redis is working

redis_1  | 1:M 09 May 2023 18:38:33.833 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 18:38:33.842 * Background saving started by pid 209
redis_1  | 209:C 09 May 2023 18:38:35.505 * DB saved on disk
redis_1  | 209:C 09 May 2023 18:38:35.506 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1  | 1:M 09 May 2023 18:38:35.553 * Background saving terminated with success
redis_1  | 1:M 09 May 2023 19:38:36.096 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 19:38:36.108 * Background saving started by pid 210
redis_1  | 210:C 09 May 2023 19:38:37.452 * DB saved on disk
redis_1  | 210:C 09 May 2023 19:38:37.454 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1  | 1:M 09 May 2023 19:38:37.512 * Background saving terminated with success
redis_1  | 1:M 10 May 2023 09:19:02.490 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 10 May 2023 09:19:02.538 * Background saving started by pid 211
redis_1  | 211:C 10 May 2023 09:19:06.152 * DB saved on disk

My current strategy is:

Ping the server every few minutes to check whether I can connect to redis and, if not, log in to the server and run

docker-compose down
docker-compose up

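This always works, but I would like to solve the problem in a more elegant way and understand the cause of this error.

I was able to reproduce this behavior on several independent services that I maintain, but it is hard to predict when the error will occur.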

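Answer:

The logs show a timeout, but the real cause was that the server had limited memory and it was not being monitored. When I logged in after the incident, memory was at a normal level, but running out of memory was the direct cause of these problems.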
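Separately from the root cause, the "Unhandled error event" lines and the final MaxRetriesPerRequestError crash in the question's log suggest the client had no error listener and used the default retry settings. A minimal sketch of two helpers that could make the next incident easier to trace follows. The names cappedBackoff and formatRedisError, and the 200 ms / 5000 ms values, are assumptions for illustration; only the option names retryStrategy and maxRetriesPerRequest and the "error" event come from ioredis.

```typescript
// Hypothetical helpers; names and default values are illustrative assumptions.

/** Delay in ms before reconnect attempt number `times` (1-based), capped at `capMs`. */
export function cappedBackoff(times: number, stepMs = 200, capMs = 5000): number {
  return Math.min(times * stepMs, capMs);
}

/** One log line per connection error, timestamped so it can be matched against redis and host logs. */
export function formatRedisError(
  err: Error & { code?: string },
  now: Date = new Date(),
): string {
  return `${now.toISOString()} redis error code=${err.code ?? "unknown"} message=${err.message}`;
}

// Possible wiring (assumes ioredis is installed; not executed here):
//   import Redis from "ioredis";
//   export const redis = new Redis(process.env.REDIS_URL ?? "", {
//     retryStrategy: cappedBackoff,   // ioredis passes the attempt count
//     maxRetriesPerRequest: null,     // queue commands instead of failing after 20 retries
//   });
//   redis.on("error", (err) => console.error(formatRedisError(err)));
```

Setting maxRetriesPerRequest to null means pending commands wait until the connection returns, which avoids the crash but can hide an outage from callers, so whether it fits depends on the application.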

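Long-term solution:

Limit the memory available to the redis container and choose the right eviction policy
Set up memory monitoring to prevent these situations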
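As a rough illustration of the monitoring point, the sketch below parses the text reply of the Redis INFO memory command and flags when usage approaches the configured limit. The function names and the 0.8 threshold are assumptions, not part of the original answer. It only covers Redis's own memory accounting; since the cause described above was memory on the server as a whole, host-level monitoring would still be needed alongside it.

```typescript
/** Subset of `INFO memory` fields used for the check. */
export interface MemoryStats {
  usedBytes: number;
  maxBytes: number; // 0 means no maxmemory limit is configured
  policy: string;
}

/** Parse the `key:value` lines of a Redis `INFO memory` reply. */
export function parseInfoMemory(info: string): MemoryStats {
  const fields = new Map<string, string>();
  for (const line of info.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue; // skip blanks and section headers
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx), line.slice(idx + 1));
  }
  return {
    usedBytes: Number(fields.get("used_memory") ?? 0),
    maxBytes: Number(fields.get("maxmemory") ?? 0),
    policy: fields.get("maxmemory_policy") ?? "unknown",
  };
}

/** True when usage exceeds `ratio` of the configured limit; false when no limit is set. */
export function isNearLimit(stats: MemoryStats, ratio = 0.8): boolean {
  return stats.maxBytes > 0 && stats.usedBytes / stats.maxBytes > ratio;
}
```

With ioredis, the input could come from `await redis.info("memory")`. The limit and eviction policy themselves are set on the server through the maxmemory and maxmemory-policy directives, for example by starting the container with `redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru`, where both values are placeholders to be chosen for the actual workload.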
