postgresql - WARNING: write on backend 1 failed with error :"Connection reset by peer"

I have an on-premises Docker Swarm cluster of 5 nodes (3 managers, 2 workers) running Debian 12. The application containers (frontend and backend) run on the manager nodes. There is also a Pgpool cluster (1 pgpool container and 2 PostgreSQL containers): the pgpool container runs on a manager node, and the worker nodes host the PostgreSQL containers. The application works fine, but the backend occasionally throws this error:

[08:36:53 ERR] An error occurred using the connection to database 'sampler_db' on server 'tcp://pgpool:5432'.
[08:36:53 ERR] An exception occurred while iterating over the results of a query for context type 'Softgent.Sampler.Infrastructure.Data.AppDbContext'.
System.InvalidOperationException: An exception has been raised that is likely due to a transient failure.
 ---> Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
 ---> System.TimeoutException: Timeout during reading attempt
  ...
System.InvalidOperationException: An exception has been raised that is likely due to a transient failure.
 ---> Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
 ---> System.TimeoutException: Timeout during reading attempt
 ...
   --- End of inner exception stack trace ---
   at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
[08:36:53 INF] Executed endpoint 'HTTP: GET /api/orders'

At the same time, pgpool reports a problem:

2025-02-18 08:55:47.576: psql pid 196: WARNING:  write on backend 1 failed with error :"Connection reset by peer"
2025-02-18 08:55:47.576: psql pid 196: DETAIL:  while trying to write data from offset: 0 wlen: 5
2025-02-18 08:55:47.578: psql pid 190: WARNING:  write on backend 1 failed with error :"Connection reset by peer"
2025-02-18 08:55:47.578: psql pid 190: DETAIL:  while trying to write data from offset: 0 wlen: 5
2025-02-18 08:55:47.581: psql pid 196: LOG:  received degenerate backend request for node_id: 1 from pid [196]
2025-02-18 08:55:47.582: psql pid 190: LOG:  received degenerate backend request for node_id: 1 from pid [190]
2025-02-18 08:55:47.582: psql pid 190: LOG:  signal_user1_to_parent_with_reason(0)
2025-02-18 08:55:47.582: psql pid 196: LOG:  signal_user1_to_parent_with_reason(0)
2025-02-18 08:55:47.582: main pid 1: LOG:  Pgpool-II parent process received SIGUSR1
2025-02-18 08:55:47.582: psql pid 196: LOG:  unable to flush data to backend
2025-02-18 08:55:47.582: psql pid 196: DETAIL:  do not failover because I am the main process
2025-02-18 08:55:47.582: main pid 1: LOG:  Pgpool-II parent process has received failover request
2025-02-18 08:55:47.583: main pid 1: LOG:  === Starting degeneration. shutdown host postgres-1(5432) ===
2025-02-18 08:55:47.583: psql pid 190: LOG:  unable to flush data to backend
2025-02-18 08:55:47.583: psql pid 190: DETAIL:  do not failover because I am the main process
2025-02-18 08:55:47.585: main pid 1: LOG:  Do not restart children because we are switching over node id 1 host: postgres-1 port: 5432 and we are in streaming replication mode
2025-02-18 08:55:47.585: main pid 1: LOG:  child pid 167 needs to restart because pool 0 uses backend 1
2025-02-18 08:55:47.585: main pid 1: LOG:  child pid 180 needs to restart because pool 0 uses backend 1
2025-02-18 08:55:47.586: main pid 1: LOG:  child pid 190 needs to restart because pool 0 uses backend 1
2025-02-18 08:55:47.586: main pid 1: LOG:  child pid 196 needs to restart because pool 0 uses backend 1
2025-02-18 08:55:47.586: main pid 1: LOG:  execute command: echo ">>> Failover - that will initialize new primary node search!"
>>> Failover - that will initialize new primary node search!
2025-02-18 08:55:47.590: main pid 1: LOG:  find_primary_node_repeatedly: waiting for finding a primary node
2025-02-18 08:55:47.601: main pid 1: LOG:  find_primary_node: standby node is 0
2025-02-18 08:55:47.602: main pid 1: LOG:  Pgpool-II parent process received SIGUSR1
2025-02-18 08:55:47.602: main pid 1: LOG:  reaper handler
2025-02-18 08:55:47.602: main pid 1: LOG:  reaper handler
2025-02-18 08:55:48.614: main pid 1: LOG:  find_primary_node: standby node is 0
... (the same logs: reaper handler and find_primary_node)
2025-02-18 08:56:55.337: main pid 1: LOG:  exit handler called (signal: 15)
2025-02-18 08:56:55.337: main pid 1: LOG:  shutting down by signal 15
2025-02-18 08:56:55.337: main pid 1: LOG:  terminating all child processes

After this, the pgpool container restarts.

The PostgreSQL containers did not restart, and their logs look normal (nothing suspicious).
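
The two excerpts above carry different timestamps (08:36 in the backend log, 08:55 in the pgpool log), so it may be worth checking how closely the events really line up. As a rough diagnostic sketch, not part of the setup itself, the following Python helper tallies the pgpool "write on backend N failed" warnings per minute and per backend. The function name `tally_write_failures` is made up for illustration, and the regular expression assumes the pgpool log line format shown above.

```python
import re
from collections import Counter

# Matches pgpool lines such as:
# 2025-02-18 08:55:47.576: psql pid 196: WARNING:  write on backend 1 failed with error :"Connection reset by peer"
# Assumption: the log prefix is "<date> <time>.<ms>: <app> pid <n>: " as in the excerpt.
WRITE_FAIL = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+: .*?'
    r'WARNING:\s+write on backend (?P<backend>\d+) failed with error :"(?P<err>[^"]*)"'
)


def tally_write_failures(lines):
    """Count write-failure warnings per (minute, backend id, error text)."""
    counts = Counter()
    for line in lines:
        match = WRITE_FAIL.search(line)
        if match:
            minute = match.group("ts")[:16]  # keep "YYYY-MM-DD HH:MM"
            key = (minute, int(match.group("backend")), match.group("err"))
            counts[key] += 1
    return counts
```

It could be fed a saved copy of the pgpool container log, for example `tally_write_failures(open("pgpool.log"))` (the file name is hypothetical), and the resulting minutes compared against the timestamps of the Npgsql timeouts in the backend log.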

Docker PostgreSQL stack:

services:
  postgres-0:
    image: docker.io/bitnami/postgresql-repmgr:16-debian-12
    environment:
      - 'POSTGRESQL_POSTGRES_PASSWORD=${POSTGRES_PASSWORD}'
      - 'POSTGRESQL_USERNAME=${PGPOOL_USERNAME}'
      - 'POSTGRESQL_PASSWORD=${PGPOOL_PASSWORD}'
      - 'POSTGRESQL_DATABASE=${POSTGRES_DATABASE}'
      - 'REPMGR_USERNAME=${REPMGR_USERNAME}'
      - 'REPMGR_PASSWORD=${REPMGR_PASSWORD}'
      - 'REPMGR_PRIMARY_HOST=${PRIMARY_NAME}'
      - 'REPMGR_PARTNER_NODES=${SECONDARY_NAME},${PRIMARY_NAME}'
      - 'REPMGR_NODE_NAME=${PRIMARY_NAME}'
      - 'REPMGR_NODE_NETWORK_NAME=${PRIMARY_NAME}'
      - 'REPMGR_NODE_ID=1'
      - 'POSTGRESQL_LOG_CONNECTIONS=on'
      - 'POSTGRESQL_LOG_DISCONNECTIONS=on'
      - 'BITNAMI_DEBUG=true'

    volumes:
      - /data/postgres:/bitnami/postgresql
    ports:
      - "5432"
    networks:
      traefik_public:
        aliases:
          - postgres-server-0
    deploy:
      placement:
        constraints:
          - node.labels.type == db-master
 
  postgres-1:
    image: docker.io/bitnami/postgresql-repmgr:16-debian-12
    environment:
      - 'POSTGRESQL_POSTGRES_PASSWORD=${POSTGRES_PASSWORD}'
      - 'POSTGRESQL_USERNAME=${PGPOOL_USERNAME}'
      - 'POSTGRESQL_PASSWORD=${PGPOOL_PASSWORD}'
      - 'POSTGRESQL_DATABASE=${POSTGRES_DATABASE}'
      - 'REPMGR_USERNAME=${REPMGR_USERNAME}'
      - 'REPMGR_PASSWORD=${REPMGR_PASSWORD}'
      - 'REPMGR_PRIMARY_HOST=${PRIMARY_NAME}'
      - 'REPMGR_PARTNER_NODES=${PRIMARY_NAME},${SECONDARY_NAME}'
      - 'REPMGR_NODE_NAME=${SECONDARY_NAME}'
      - 'REPMGR_NODE_NETWORK_NAME=${SECONDARY_NAME}'
      - 'REPMGR_NODE_ID=2'
      - 'POSTGRESQL_LOG_DISCONNECTIONS=on'
      - 'POSTGRESQL_LOG_CONNECTIONS=on'
      - 'BITNAMI_DEBUG=true'
    volumes:
      - /data/postgres:/bitnami/postgresql
    ports:
      - "5432"
    networks:
      traefik_public:
        aliases:
          - postgres-server-1
    deploy:
      placement:
        constraints:
          - node.labels.type == db-slave

  pgpool:
    image: docker.io/bitnami/pgpool:4.5.4
    labels:
      logging.promtail: "true"
    ports:
      - "5432:5432"
    configs:
      - source: pgpool-config
        target: /var/pgpool-custom.conf
    environment:
      - 'PGPOOL_USER_CONF_FILE=/var/pgpool-custom.conf'
      - 'PGPOOL_BACKEND_NODES=0:${PRIMARY_NAME}:5432,1:${SECONDARY_NAME}:5432'
      - 'PGPOOL_BACKEND_APPLICATION_NAMES=postgres-0,postgres-1'
      - 'PGPOOL_SR_CHECK_USER=${REPMGR_USERNAME}'
      - 'PGPOOL_SR_CHECK_PASSWORD=${REPMGR_PASSWORD}'
      - 'PGPOOL_ENABLE_LDAP=no'
      - 'PGPOOL_POSTGRES_USERNAME=${POSTGRES_USERNAME}'
      - 'PGPOOL_POSTGRES_PASSWORD=${POSTGRES_PASSWORD}'
      - 'PGPOOL_ADMIN_USERNAME=${PG_POOL_ADMIN_USERNAME}'
      - 'PGPOOL_ADMIN_PASSWORD=${PG_POOL_ADMIN_PASSWORD}'
      - 'PGPOOL_ENABLE_LOAD_BALANCING=yes'
      - 'PGPOOL_POSTGRES_CUSTOM_USERS=${PGPOOL_USERNAME}'
      - 'PGPOOL_POSTGRES_CUSTOM_PASSWORDS=${PGPOOL_PASSWORD}'
      - 'PGPOOL_AUTO_FAILBACK=yes'
    healthcheck:
      test: ["CMD", "/opt/bitnami/scripts/pgpool/healthcheck.sh"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      placement:
        constraints:
          - node.role == manager
      replicas: 1
      labels:
        - "logging.promtail=true"
    networks:
      traefik_public:
        aliases:
          - pgpool-server

networks:
  traefik_public:
    external: true

configs:
  pgpool-config:
    file: ./configs/pgpool/pgpool-custom.conf

Contents of ./configs/pgpool/pgpool-custom.conf:

failover_on_backend_error='on'

Can someone help me debug this issue?
