
Knex with PostgreSQL: extreme SELECT query performance degradation on multiple parallel requests


In brief

I'm developing a game (of my dreams) and my backend stack is Node.js and PostgreSQL (9.6) with Knex. I hold all player data there and need to request it frequently. One of the requests needs to make 10 simple selects to pull that data, and this is where the problem starts: these queries are quite fast (~1ms) if the server serves only 1 request at a time. But if the server serves many requests in parallel (100-400), query execution time degrades dramatically (it can be up to several seconds per query).

Details

To be objective, I will describe the server's request goal, the select queries, and the results I received.

About system

I'm running the Node code on a DigitalOcean 4 CPU / 8 GB droplet, and Postgres on a second droplet with the same configuration (2 different droplets, same specs).

About request

It needs to perform some gameplay actions, for which it selects data for 2 players from the DB.

DDL

Players' data is represented by 5 tables:

CREATE TABLE public.player_profile(
    id integer NOT NULL DEFAULT nextval('player_profile_id_seq'::regclass),
    public_data integer NOT NULL,
    private_data integer NOT NULL,
    current_active_deck_num smallint NOT NULL DEFAULT '0'::smallint,
    created_at bigint NOT NULL DEFAULT '0'::bigint,
    CONSTRAINT player_profile_pkey PRIMARY KEY (id),
    CONSTRAINT player_profile_private_data_foreign FOREIGN KEY (private_data)
        REFERENCES public.profile_private_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION,
    CONSTRAINT player_profile_public_data_foreign FOREIGN KEY (public_data)
        REFERENCES public.profile_public_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_data(
    id integer NOT NULL DEFAULT nextval('player_character_data_id_seq'::regclass),
    owner_player integer NOT NULL,
    character_id integer NOT NULL,
    experience_counter integer NOT NULL,
    level_counter integer NOT NULL,
    character_name character varying(255) COLLATE pg_catalog."default" NOT NULL,
    created_at bigint NOT NULL DEFAULT '0'::bigint,
    CONSTRAINT player_character_data_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_data_owner_player_foreign FOREIGN KEY (owner_player)
        REFERENCES public.player_profile (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_cards(
    id integer NOT NULL DEFAULT nextval('player_cards_id_seq'::regclass),
    card_id integer NOT NULL,
    owner_player integer NOT NULL,
    card_level integer NOT NULL,
    first_deck boolean NOT NULL,
    consumables integer NOT NULL,
    second_deck boolean NOT NULL DEFAULT false,
    third_deck boolean NOT NULL DEFAULT false,
    quality character varying(10) COLLATE pg_catalog."default" NOT NULL DEFAULT 'none'::character varying,
    CONSTRAINT player_cards_pkey PRIMARY KEY (id),
    CONSTRAINT player_cards_owner_player_foreign FOREIGN KEY (owner_player)
        REFERENCES public.player_profile (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_equipment(
    id integer NOT NULL DEFAULT nextval('player_character_equipment_id_seq'::regclass),
    owner_character integer NOT NULL,
    item_id integer NOT NULL,
    item_level integer NOT NULL,
    item_type character varying(20) COLLATE pg_catalog."default" NOT NULL,
    is_equipped boolean NOT NULL,
    slot_num integer,
    CONSTRAINT player_character_equipment_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_equipment_owner_character_foreign FOREIGN KEY (owner_character)
        REFERENCES public.player_character_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

CREATE TABLE public.player_character_runes(
    id integer NOT NULL DEFAULT nextval('player_character_runes_id_seq'::regclass),
    owner_character integer NOT NULL,
    item_id integer NOT NULL,
    slot_num integer,
    decay_start_timestamp bigint,
    CONSTRAINT player_character_runes_pkey PRIMARY KEY (id),
    CONSTRAINT player_character_runes_owner_character_foreign FOREIGN KEY (owner_character)
        REFERENCES public.player_character_data (id) MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

With indexes

knex.raw('create index "player_cards_owner_player_first_deck_index" on "player_cards"("owner_player") WHERE first_deck = TRUE');
knex.raw('create index "player_cards_owner_player_second_deck_index" on "player_cards"("owner_player") WHERE second_deck = TRUE');
knex.raw('create index "player_cards_owner_player_third_deck_index" on "player_cards"("owner_player") WHERE third_deck = TRUE');
knex.raw('create index "player_character_equipment_owner_character_is_equipped_index" on "player_character_equipment" ("owner_character") WHERE is_equipped = TRUE');
knex.raw('create index "player_character_runes_owner_character_slot_num_not_null_index" on "player_character_runes" ("owner_character") WHERE slot_num IS NOT NULL');

The code

First query

async.parallel([
    cb => tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', playerId)
        .limit(1)
        .asCallback(cb),
    cb => tx('player_character_data')
        .select('character_id', 'id')
        .where('owner_player', enemyId)
        .limit(1)
        .asCallback(cb)
], callbackFn);

Second query

async.parallel([
    cb => tx('player_profile')
        .select('current_active_deck_num')
        .where('id', playerId)
        .asCallback(cb),
    cb => tx('player_profile')
        .select('current_active_deck_num')
        .where('id', enemyId)
        .asCallback(cb)
], callbackFn);

Third query

playerQ = { first_deck: true }
enemyQ = { first_deck: true }
MAX_CARDS_IN_DECK = 5
async.parallel([
    cb => tx('player_cards')
        .select('card_id', 'card_level')
        .where('owner_player', playerId)
        .andWhere(playerQ)
        .limit(MAX_CARDS_IN_DECK)
        .asCallback(cb),
    cb => tx('player_cards')
        .select('card_id', 'card_level')
        .where('owner_player', enemyId)
        .andWhere(enemyQ)
        .limit(MAX_CARDS_IN_DECK)
        .asCallback(cb)
], callbackFn);
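As an aside (not from the original post), Knex can print the SQL it generates for this query, which should line up with the partial index player_cards_owner_player_first_deck_index defined above:

// Sketch: inspect the generated SQL (the owner id 1 is a placeholder).
const sql = knex('player_cards')
    .select('card_id', 'card_level')
    .where('owner_player', 1)
    .andWhere({ first_deck: true })
    .limit(5)
    .toString();
console.log(sql);
// => roughly: select "card_id", "card_level" from "player_cards"
//    where "owner_player" = 1 and "first_deck" = true limit 5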

Fourth query

MAX_EQUIPPED_ITEMS = 3
async.parallel([
    cb => tx('player_character_equipment')
        .select('item_id', 'item_level')
        .where('owner_character', playerCharacterUniqueId)
        .andWhere('is_equipped', true)
        .limit(MAX_EQUIPPED_ITEMS)
        .asCallback(cb),
    cb => tx('player_character_equipment')
        .select('item_id', 'item_level')
        .where('owner_character', enemyCharacterUniqueId)
        .andWhere('is_equipped', true)
        .limit(MAX_EQUIPPED_ITEMS)
        .asCallback(cb)
], callbackFn);

Fifth query

runeSlotsMax = 3
async.parallel([
    cb => tx('player_character_runes')
        .select('item_id', 'decay_start_timestamp')
        .where('owner_character', playerCharacterUniqueId)
        .whereNotNull('slot_num')
        .limit(runeSlotsMax)
        .asCallback(cb),
    cb => tx('player_character_runes')
        .select('item_id', 'decay_start_timestamp')
        .where('owner_character', enemyCharacterUniqueId)
        .whereNotNull('slot_num')
        .limit(runeSlotsMax)
        .asCallback(cb)
], callbackFn);

EXPLAIN(ANALYZE)

Only index scans, and <1ms planning and execution times. I can publish the plans if needed (omitted here to save space).
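For reference, a plan like the one described can be captured through the same connection. This is a sketch assuming a knex instance and an existing player id (42 is a placeholder); EXPLAIN (ANALYZE, BUFFERS) is standard PostgreSQL syntax:

// With the pg client, knex.raw resolves to the driver result; each plan
// line comes back as a row in the "QUERY PLAN" column.
knex.raw(`
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT character_id, id
    FROM player_character_data
    WHERE owner_player = 42
    LIMIT 1
`).then(result => {
    result.rows.forEach(r => console.log(r['QUERY PLAN']));
});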

Time itself

(total is the number of requests; min/max/avg/median are response times in milliseconds)

  • 4 concurrent requests: { "total": 300, "avg": 1.81, "median": 2, "min": 1, "max": 6 }
  • 400 concurrent requests:
    • { "total": 300, "avg": 209.57666666666665, "median": 176, "min": 9, "max": 1683 } - first select
    • { "total": 300, "avg": 2105.9, "median": 2005, "min": 1563, "max": 4074 } - last select

I tried logging slow queries that take more than 100ms - nothing appeared. I also tried increasing the connection pool size up to the number of parallel requests - nothing changed either.
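For context, both of those knobs live in configuration rather than in the queries themselves. A hedged sketch (connection details are placeholders): pool sizing is passed when the knex instance is created, and slow-query logging is PostgreSQL's log_min_duration_statement setting:

// knex pool sizing (placeholder connection details):
const knex = require('knex')({
    client: 'pg',
    connection: { host: 'db-host', user: 'app', password: 'secret', database: 'game' },
    pool: { min: 2, max: 400 } // raised to match the number of parallel requests
});

// and on the PostgreSQL side, in postgresql.conf:
//   log_min_duration_statement = 100   # log statements slower than 100 ms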


Asked Feb 9, 2017 by Andrew Medvedev; edited Apr 15, 2017 by Cœur
  • The reported times are measured @ the client(s) ? – joop Commented Feb 9, 2017 at 13:07
  • No. It is measured inside the server code as the interval between a (a point right before the knex query) and b (a point inside callbackFn) – Andrew Medvedev Commented Feb 9, 2017 at 13:16
  • "I tried to put slow queries that are performed more than 100ms into logs - nothing." Then: where is the rest of the time going? Context/task-switching? – joop Commented Feb 9, 2017 at 13:50
  • "... between a (point right before knex query) and b (point inside callbackFn)". That is what I consider the client (from my perspective the DBMS is the server). So either you have (network) latency between knex->dbms->knex, or your machine is overcommitted and task-swapping. Try monitoring via vmstat and/or top. (Have you considered separate hardware for knex and the DBMS?) – joop Commented Feb 9, 2017 at 14:23
  • Then: where is the rest of the time going? That's the question :) – Andrew Medvedev Commented Feb 9, 2017 at 14:41
(7 more comments omitted)
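To make the measurement described in these comments concrete, here is a minimal sketch of timing a single knex call from just before the query (point a) to inside its callback (point b); tx and playerId are assumed from the surrounding code. Note that under load this interval includes time spent waiting in Node's event loop and connection pool, not just database time:

const a = Date.now(); // point a: right before the knex query
tx('player_profile')
    .select('current_active_deck_num')
    .where('id', playerId)
    .asCallback((err, rows) => {
        const b = Date.now(); // point b: inside the callback
        console.log(`round trip: ${b - a}ms`); // pool wait + network + query + loop lag
    });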

2 Answers


A solution was found quickly, but I forgot to respond here (I was busy, sorry).

There was no magic behind the slow queries - it was simply the nature of Node's event loop:

  • All similar requests were made in parallel;
  • I had a code block with a very slow execution time (~150-200ms);
  • With ~800 parallel requests, a 150ms code block turns into ~10000ms of event loop lag;
  • All you'll see is the appearance of slow requests, but it's really the lag of the callback function, not the database;

Conclusion: use pgBadger to detect slow queries and the isBusy module to detect event loop lags (a minimal probe is sketched below).
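A bare-bones event-loop-lag probe can also be written without any module, as a sketch of what such tools measure:

// Sketch: a timer that should fire every 100 ms; any extra delay is time
// the event loop spent blocked by synchronous work.
const INTERVAL_MS = 100;
let last = Date.now();
setInterval(() => {
    const now = Date.now();
    const lag = now - last - INTERVAL_MS;
    if (lag > 50) console.warn(`event loop lag: ${lag}ms`);
    last = now;
}, INTERVAL_MS);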

I can see three potential problems here:

  1. 400 concurrent requests is actually quite a lot, and your machine spec is nothing to get excited about. Maybe this has more to do with my MSSQL background, but I would imagine this is a case where you may need to beef up the hardware.
  2. The communication between the two servers should be quite quick, but may account for some of the latency you're seeing. One powerful server may be a better solution.
  3. I assume you have reasonable data volumes (400 concurrent connections should have a lot to store). Perhaps posting some of the actual generated SQL would be useful. A lot depends on the SQL Knex comes up with, and there may be available optimizations that can be used (one example is sketched after this answer). Indexing comes to mind, but one would need to see the SQL to say for sure.

Your test doesn't seem to include the network latency from a client, so this may be an additional problem you have not yet accounted for.
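As one example of such an optimization (a sketch, not from the original answer): each step in the question issues two parallel single-player selects, which doubles the number of queries competing for pool connections under load. The pair can be collapsed into one round trip with whereIn, assuming one character row per player as the original .limit(1) implies:

// Hypothetical rewrite of the first step: both players in one query.
tx('player_character_data')
    .select('character_id', 'id', 'owner_player')
    .whereIn('owner_player', [playerId, enemyId])
    .asCallback((err, rows) => {
        if (err) return callbackFn(err);
        const byOwner = {};
        rows.forEach(r => { byOwner[r.owner_player] = r; });
        callbackFn(null, [byOwner[playerId], byOwner[enemyId]]);
    });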
