postgresql - Index is not used when querying JSON array

Having a table with a metadata column of the JSON type I am failing to utilize the full power of the index.

Creating a new table

CREATE TABLE IF NOT EXISTS phrases (
   phraseId BIGINT,
   projectId BIGINT,
   metadata JSON
);

Filling the table with random data

DO $$
DECLARE
   names TEXT[] := ARRAY['j.doe', 'j.roe', 'd.hill'];
   components TEXT[] := ARRAY['Configuration', 'User Profile', 'Input', 'Output', 'Layout'];
   i INT;
   name TEXT;
   component_count INT;
   selected_components TEXT[];
   metadata JSON;
BEGIN
   FOR i IN 0..200000 LOOP
      name := names[FLOOR(RANDOM() * 3) + 1];
      component_count := FLOOR(RANDOM() * 3) + 1;
      selected_components := ARRAY(SELECT DISTINCT components[FLOOR(RANDOM() * 5) + 1] FROM generate_series(1, component_count));
      metadata := jsonb_build_object('author', name, 'component_list', selected_components);
        
      INSERT INTO phrases (phraseid, projectid, metadata) VALUES (i, 10101, metadata);
   END LOOP;
END $$;

The metadata entry looks like this:

{
   "author" : "j.doe",
   "component_list" : ["Configuration", "User Profile"]
}

Creating an index

CREATE INDEX
   index_project_metadata
ON phrases (
   projectId, 
   (metadata->>'author'),
   (metadata::jsonb->'component_list')
);

Composing the query (searching for rows for the given project, user, and one or more components)

EXPLAIN (ANALYZE, VERBOSE)
SELECT 
   count(*)
FROM
   phrases 
WHERE
   projectId = 10101 AND
   metadata->>'author' = 'j.doe' AND
   metadata::jsonb->'component_list' ?| ARRAY['Configuration']

Aggregate  (cost=1552.51..1552.52 rows=1 width=8) (actual time=79.621..79.621 rows=1 loops=1)
  Output: count(*)
  ->  Index Scan using index_project_metadata on public.phrases  (cost=0.42..1552.48 rows=10 width=0) (actual time=0.030..78.888 rows=23370 loops=1)
        Output: phraseid, projectid, metadata
        Index Cond: ((phrases.projectid = 10101) AND ((phrases.metadata ->> 'author'::text) = 'j.doe'::text))
        Filter: (((phrases.metadata)::jsonb -> 'component_list'::text) ?| '{Configuration}'::text[])
        Rows Removed by Filter: 43390
Planning Time: 0.091 ms
Execution Time: 79.643 ms

However, only projectId and author filtering are speeded up by the index. The component is still filtered by comparing values in each row. The real table is more complex and it takes much longer.

I can't change the table layout, but I have full control on indexing and querying.

Do I need to construct the index differently?

Having a table with a metadata column of the JSON type I am failing to utilize the full power of the index.

Creating a new table

CREATE TABLE IF NOT EXISTS phrases (
   phraseId BIGINT,
   projectId BIGINT,
   metadata JSON
);

Filling the table with random data

DO $$
DECLARE
   names TEXT[] := ARRAY['j.doe', 'j.roe', 'd.hill'];
   components TEXT[] := ARRAY['Configuration', 'User Profile', 'Input', 'Output', 'Layout'];
   i INT;
   name TEXT;
   component_count INT;
   selected_components TEXT[];
   metadata JSON;
BEGIN
   FOR i IN 0..200000 LOOP
      name := names[FLOOR(RANDOM() * 3) + 1];
      component_count := FLOOR(RANDOM() * 3) + 1;
      selected_components := ARRAY(SELECT DISTINCT components[FLOOR(RANDOM() * 5) + 1] FROM generate_series(1, component_count));
      metadata := jsonb_build_object('author', name, 'component_list', selected_components);
        
      INSERT INTO phrases (phraseid, projectid, metadata) VALUES (i, 10101, metadata);
   END LOOP;
END $$;

The metadata entry looks like this:

{
   "author" : "j.doe",
   "component_list" : ["Configuration", "User Profile"]
}

Creating an index

CREATE INDEX
   index_project_metadata
ON phrases (
   projectId, 
   (metadata->>'author'),
   (metadata::jsonb->'component_list')
);

Composing the query (searching for rows for the given project, user, and one or more components)

EXPLAIN (ANALYZE, VERBOSE)
SELECT 
   count(*)
FROM
   phrases 
WHERE
   projectId = 10101 AND
   metadata->>'author' = 'j.doe' AND
   metadata::jsonb->'component_list' ?| ARRAY['Configuration']

Aggregate  (cost=1552.51..1552.52 rows=1 width=8) (actual time=79.621..79.621 rows=1 loops=1)
  Output: count(*)
  ->  Index Scan using index_project_metadata on public.phrases  (cost=0.42..1552.48 rows=10 width=0) (actual time=0.030..78.888 rows=23370 loops=1)
        Output: phraseid, projectid, metadata
        Index Cond: ((phrases.projectid = 10101) AND ((phrases.metadata ->> 'author'::text) = 'j.doe'::text))
        Filter: (((phrases.metadata)::jsonb -> 'component_list'::text) ?| '{Configuration}'::text[])
        Rows Removed by Filter: 43390
Planning Time: 0.091 ms
Execution Time: 79.643 ms

However, only projectId and author filtering are speeded up by the index. The component is still filtered by comparing values in each row. The real table is more complex and it takes much longer.

I can't change the table layout, but I have full control on indexing and querying.

Do I need to construct the index differently?

Share Improve this question edited Mar 11 at 16:06 asked Mar 11 at 12:35 Jan Tosovsky 5445 silver badges20 bronze badges

1 Could you please add the query plan? – Frank Heikens Commented Mar 11 at 14:42

Add a comment |

1 Answer 1

Sorted by: Reset to default 0

When using JSON you'd better use the JSONB data type. This gives you more/better indexing options, but you have to change the SELECT statement accordingly.

ALTER TABLE phrases ALTER COLUMN metadata SET DATA TYPE jsonb;
CREATE INDEX idx_phrases_metadata ON phrases USING GIN (metadata jsonb_path_ops);

SELECT
   count(*)
FROM
   phrases
WHERE
    projectId = 10101
AND metadata @> '{"author":"j.doe"}'
AND metadata @> '{"component_list":["Configuration"]}';

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

postgresql - Index is not used when querying JSON array - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)