最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

postgresql - Index is not used when querying JSON array - Stack Overflow

programmeradmin2浏览0评论

Having a table with a metadata column of the JSON type I am failing to utilize the full power of the index.

  1. Creating a new table
CREATE TABLE IF NOT EXISTS phrases (
   phraseId BIGINT,
   projectId BIGINT,
   metadata JSON
);
  1. Filling the table with random data
DO $$
DECLARE
   names TEXT[] := ARRAY['j.doe', 'j.roe', 'd.hill'];
   components TEXT[] := ARRAY['Configuration', 'User Profile', 'Input', 'Output', 'Layout'];
   i INT;
   name TEXT;
   component_count INT;
   selected_components TEXT[];
   metadata JSON;
BEGIN
   FOR i IN 0..200000 LOOP
      name := names[FLOOR(RANDOM() * 3) + 1];
      component_count := FLOOR(RANDOM() * 3) + 1;
      selected_components := ARRAY(SELECT DISTINCT components[FLOOR(RANDOM() * 5) + 1] FROM generate_series(1, component_count));
      metadata := jsonb_build_object('author', name, 'component_list', selected_components);
        
      INSERT INTO phrases (phraseid, projectid, metadata) VALUES (i, 10101, metadata);
   END LOOP;
END $$;

The metadata entry looks like this:

{
   "author" : "j.doe",
   "component_list" : ["Configuration", "User Profile"]
}
  1. Creating an index
CREATE INDEX
   index_project_metadata
ON phrases (
   projectId, 
   (metadata->>'author'),
   (metadata::jsonb->'component_list')
);
  1. Composing the query (searching for rows for the given project, user, and one or more components)
EXPLAIN (ANALYZE, VERBOSE)
SELECT 
   count(*)
FROM
   phrases 
WHERE
   projectId = 10101 AND
   metadata->>'author' = 'j.doe' AND
   metadata::jsonb->'component_list' ?| ARRAY['Configuration']
Aggregate  (cost=1552.51..1552.52 rows=1 width=8) (actual time=79.621..79.621 rows=1 loops=1)
  Output: count(*)
  ->  Index Scan using index_project_metadata on public.phrases  (cost=0.42..1552.48 rows=10 width=0) (actual time=0.030..78.888 rows=23370 loops=1)
        Output: phraseid, projectid, metadata
        Index Cond: ((phrases.projectid = 10101) AND ((phrases.metadata ->> 'author'::text) = 'j.doe'::text))
        Filter: (((phrases.metadata)::jsonb -> 'component_list'::text) ?| '{Configuration}'::text[])
        Rows Removed by Filter: 43390
Planning Time: 0.091 ms
Execution Time: 79.643 ms

However, only projectId and author filtering are speeded up by the index. The component is still filtered by comparing values in each row. The real table is more complex and it takes much longer.

I can't change the table layout, but I have full control on indexing and querying.

Do I need to construct the index differently?

Having a table with a metadata column of the JSON type I am failing to utilize the full power of the index.

  1. Creating a new table
CREATE TABLE IF NOT EXISTS phrases (
   phraseId BIGINT,
   projectId BIGINT,
   metadata JSON
);
  1. Filling the table with random data
DO $$
DECLARE
   names TEXT[] := ARRAY['j.doe', 'j.roe', 'd.hill'];
   components TEXT[] := ARRAY['Configuration', 'User Profile', 'Input', 'Output', 'Layout'];
   i INT;
   name TEXT;
   component_count INT;
   selected_components TEXT[];
   metadata JSON;
BEGIN
   FOR i IN 0..200000 LOOP
      name := names[FLOOR(RANDOM() * 3) + 1];
      component_count := FLOOR(RANDOM() * 3) + 1;
      selected_components := ARRAY(SELECT DISTINCT components[FLOOR(RANDOM() * 5) + 1] FROM generate_series(1, component_count));
      metadata := jsonb_build_object('author', name, 'component_list', selected_components);
        
      INSERT INTO phrases (phraseid, projectid, metadata) VALUES (i, 10101, metadata);
   END LOOP;
END $$;

The metadata entry looks like this:

{
   "author" : "j.doe",
   "component_list" : ["Configuration", "User Profile"]
}
  1. Creating an index
CREATE INDEX
   index_project_metadata
ON phrases (
   projectId, 
   (metadata->>'author'),
   (metadata::jsonb->'component_list')
);
  1. Composing the query (searching for rows for the given project, user, and one or more components)
EXPLAIN (ANALYZE, VERBOSE)
SELECT 
   count(*)
FROM
   phrases 
WHERE
   projectId = 10101 AND
   metadata->>'author' = 'j.doe' AND
   metadata::jsonb->'component_list' ?| ARRAY['Configuration']
Aggregate  (cost=1552.51..1552.52 rows=1 width=8) (actual time=79.621..79.621 rows=1 loops=1)
  Output: count(*)
  ->  Index Scan using index_project_metadata on public.phrases  (cost=0.42..1552.48 rows=10 width=0) (actual time=0.030..78.888 rows=23370 loops=1)
        Output: phraseid, projectid, metadata
        Index Cond: ((phrases.projectid = 10101) AND ((phrases.metadata ->> 'author'::text) = 'j.doe'::text))
        Filter: (((phrases.metadata)::jsonb -> 'component_list'::text) ?| '{Configuration}'::text[])
        Rows Removed by Filter: 43390
Planning Time: 0.091 ms
Execution Time: 79.643 ms

However, only projectId and author filtering are speeded up by the index. The component is still filtered by comparing values in each row. The real table is more complex and it takes much longer.

I can't change the table layout, but I have full control on indexing and querying.

Do I need to construct the index differently?

Share Improve this question edited Mar 11 at 16:06 Jan Tosovsky asked Mar 11 at 12:35 Jan TosovskyJan Tosovsky 5445 silver badges20 bronze badges 1
  • 1 Could you please add the query plan? – Frank Heikens Commented Mar 11 at 14:42
Add a comment  | 

1 Answer 1

Reset to default 0

When using JSON you'd better use the JSONB data type. This gives you more/better indexing options, but you have to change the SELECT statement accordingly.

ALTER TABLE phrases ALTER COLUMN metadata SET DATA TYPE jsonb;
CREATE INDEX idx_phrases_metadata ON phrases USING GIN (metadata jsonb_path_ops);

SELECT
   count(*)
FROM
   phrases
WHERE
    projectId = 10101
AND metadata @> '{"author":"j.doe"}'
AND metadata @> '{"component_list":["Configuration"]}';

发布评论

评论列表(0)

  1. 暂无评论