最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Is it possible to get predicate pushdown for where ... in ... queries in duckdb? - Stack Overflow

programmeradmin1浏览0评论

I have a question regarding predicate pushdown in duckdb. I have some data where I want to do a select-query based on a list of strings. When I was working on this I realized that it seems that for WHERE x IN (a,b,...) queries the predicate is not being pushed down into the table scan. This impacts performance quite a lot in my case since the tables I am trying to select from contains in the order of 100s of millions of rows.

Below is a toy example in python to illustrate the problem. Note that when using a WHERE x='y' clause the filtering is pushed down into the table scan, but when using WHERE x in ('y'), this is not the case.

Is there some way that I can rewrite this to ensure my filters get applied in the scan operation?

import duckdb

with duckdb.connect("example.db") as connection:
    connection.execute(
        "CREATE TABLE IF NOT EXISTS data AS SELECT * FROM 'hf://datasets/datasets-examples/doc-formats-csv-1/data.csv';"
    )

    print("----------------")
    print("DATA")
    print("----------------")
    print(connection.execute("SELECT * FROM data").fetchdf())
    print()

    print("----------------")
    print("SELECT USING IN")
    print("----------------")
    result = connection.execute("""
        EXPLAIN ANALYZE
            SELECT * FROM DATA
            WHERE kind IN ('dog', 'cat')
    """).fetchone()

    print(result[1])
    print()

    print("----------------")
    print("SELECT USING =")
    print("----------------")
    result = connection.execute("""
        EXPLAIN ANALYZE
            SELECT * FROM DATA
            WHERE kind = 'dog'
    """).fetchone()

    print(result[1])
    print()

Results:

----------------
DATA
----------------
      kind  sound
0      dog   woof
1      cat   meow
2  pokemon   pika
3    human  hello

----------------
SELECT USING IN
----------------
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
         EXPLAIN ANALYZE             SELECT * FROM DATA             WHERE kind IN ('dog', 'cat')     
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         HTTPFS HTTP Stats         ││
││                                   ││
││            in: 0 bytes            ││
││            out: 0 bytes           ││
││              #HEAD: 0             ││
││              #GET: 0              ││
││              #PUT: 0              ││
││              #POST: 0             ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 0.0002s             ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│ ((kind = 'dog') OR (kind =│
│           'cat'))         │
│                           │
│           2 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│            data           │
│                           │
│        Projections:       │
│            kind           │
│           sound           │
│                           │
│           4 Rows          │
│          (0.00s)          │
└───────────────────────────┘


----------------
SELECT USING =
----------------
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││    Query Profiling Information    ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
         EXPLAIN ANALYZE             SELECT * FROM DATA             WHERE kind = 'dog'     
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││         HTTPFS HTTP Stats         ││
││                                   ││
││            in: 0 bytes            ││
││            out: 0 bytes           ││
││              #HEAD: 0             ││
││              #GET: 0              ││
││              #PUT: 0              ││
││              #POST: 0             ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││              Total Time: 0.0002s             ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 Rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         TABLE_SCAN        │
│    ────────────────────   │
│            data           │
│                           │
│        Projections:       │
│            kind           │
│           sound           │
│                           │
│          Filters:         │
│ kind='dog' AND kind IS NOT│
│            NULL           │
│                           │
│           1 Rows          │
│          (0.00s)          │
└───────────────────────────┘
发布评论

评论列表(0)

  1. 暂无评论