python - Keep rows where a field of a list[struct] column contains a message - Stack Overflow

Say I have the following data:

import duckdb
rel = duckdb.sql("""
    FROM VALUES
        ([{'a': 'foo', 'b': 'bta'}]),
        ([]),
        ([{'a': 'jun', 'b': 'jul'}, {'a':'nov', 'b': 'obt'}])
        df(my_col)
    SELECT *
""")

which looks like this:

┌──────────────────────────────────────────────┐
│                    my_col                    │
│        struct(a varchar, b varchar)[]        │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}]                       │
│ []                                           │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘

I would like to keep all rows where, for at least one element of 'my_col', field 'b' contains the substring 'bt'.

So, expected output:

┌──────────────────────────────────────────────┐
│                    my_col                    │
│        struct(a varchar, b varchar)[]        │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}]                       │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘

How can I write a SQL query to do that?

Asked Mar 3 at 14:03 by ignoring_gravity (edited Mar 3 at 14:15)

1 Answer

One option is to build a list of booleans, one per struct, and reduce it with list_bool_or() (or list_sum()):

  • https://duckdb.org/docs/stable/sql/functions/list.html#list_-rewrite-functions

import duckdb

duckdb.sql("""
FROM VALUES
    ([{'a': 'foo', 'b': 'bta'}]),
    ([]),
    ([{'a': 'jun', 'b': 'jul'}, {'a':'nov', 'b': 'obt'}])
    df(my_col)
SELECT *
WHERE list_bool_or(['bt' in s.b for s in my_col])
""")
┌──────────────────────────────────────────────┐
│                    my_col                    │
│        struct(a varchar, b varchar)[]        │
├──────────────────────────────────────────────┤
│ [{'a': foo, 'b': bta}]                       │
│ [{'a': jun, 'b': jul}, {'a': nov, 'b': obt}] │
└──────────────────────────────────────────────┘

The list comprehension is equivalent to list_apply(my_col, s -> 'bt' in s.b); list_apply is an alias of list_transform.
