最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

python - Convert the values in a SQL array by SELECTing from another table? - Stack Overflow

programmeradmin1浏览0评论

I have a table "Data" containing arrays of FOOs, and a separate table "Lookup" where I can find the BAR for each FOO. I want to write a SQL query which returns the Data table, but with arrays of the BARs that correspond to the FOOs in the original.

I appreicate it may be an awkward table structure from a SQL perspective, and I won't go into why it is this way. I just want to know if what I want is technically possible.

As an example, the "Data" table might contain arrays of cities, and I want arrays of countries.

Data

Name Cities
'Mark' ['Manchester', 'Paris']
'Aisha' ['London', 'Munich']

I have a table "Data" containing arrays of FOOs, and a separate table "Lookup" where I can find the BAR for each FOO. I want to write a SQL query which returns the Data table, but with arrays of the BARs that correspond to the FOOs in the original.

I appreicate it may be an awkward table structure from a SQL perspective, and I won't go into why it is this way. I just want to know if what I want is technically possible.

As an example, the "Data" table might contain arrays of cities, and I want arrays of countries.

Data

Name Cities
'Mark' ['Manchester', 'Paris']
'Aisha' ['London', 'Munich']

Lookup

Country Capital_city Other_cities
'Britain' 'London' ['Manchester', 'Bristol']
'France' 'Paris' ['Bordeaux', 'Marseilles']
'Germany' 'Berlin' ['Munich', 'Cologne']

Desired output

Name Cities
'Mark' ['Britain', 'France']
'Aisha' ['Britain', 'Germany']

What I have tried so far

I have a working query which can find the country for a given city @City:

SELECT Country
FROM Lookup
WHERE (
    Capital_City = @City
    OR @City IN (SELECT Cities FROM Lookup)
)

What I can't figure out is: how to do this for every single item in every array in the Cities column of my Data table?

Note: I'm using the SQL API from the python Polars module. It seems to generally support core SQL features.

Share Improve this question edited Mar 13 at 11:05 jqurious 22.1k5 gold badges20 silver badges39 bronze badges asked Mar 13 at 0:07 DarthVladerDarthVlader 3753 silver badges14 bronze badges 1
  • What is your database manager (RDBMS)? – tinazmu Commented Mar 13 at 3:04
Add a comment  | 

1 Answer 1

Reset to default 1

As far as I know, what you want to do is technically possible in SQL, but handling arrays in SQL can be tricky though. Since you’re using Polars’ SQL API, you can make use of Polars’ explode() function in combination with SQL joins to transform your array-based data structure.

Polars SQL supports array handling so that you can use UNNEST or explode in Python to flatten arrays before joining. So, for a SQL query, you can try this:

WITH Exploded AS (
    SELECT Name, UNNEST(Cities) AS City
    FROM Data
),
Matched AS (
    SELECT e.Name, l.Country
    FROM Exploded e
    JOIN Lookup l 
    ON e.City = l.Capital_City OR e.City = ANY(l.Other_cities)
)
SELECT Name, ARRAY_AGG(DISTINCT Country) AS Countries
FROM Matched
GROUP BY Name;

If you intend to use Polars in Python, you can use Polars’ DataFrame API instead, like this:

import polars as pl

# sample DF :)
data = pl.DataFrame({
    "Name": ["Mark", "Aisha"],
    "Cities": [["Manchester", "Paris"], ["London", "Munich"]]
})

lookup = pl.DataFrame({
    "Country": ["Britain", "France", "Germany"],
    "Capital_City": ["London", "Paris", "Berlin"],
    "Other_cities": [["Manchester", "Bristol"], ["Bordeaux", "Marseilles"], ["Munich", "Cologne"]]
})

data_exploded = data.explode("Cities")

lookup_exploded = lookup.explode("Other_cities")

# join on Capital_City and Other_cities
matched = (
    data_exploded.join(lookup, left_on="Cities", right_on="Capital_City", how="left")
    .select(["Name", "Country"])
    .fill_null("")
    .vstack(
        data_exploded.join(lookup_exploded, left_on="Cities", right_on="Other_cities", how="left")
        .select(["Name", "Country"])
        .fill_null("")
    )
    .filter(pl.col("Country") != "")
)

result = matched.group_by("Name").agg(pl.col("Country").unique())

print(result)

You should get something like:


shape: (2, 2)
┌───────┬─────────────────────────┐
│ Name  │ Country                 │
├───────┼─────────────────────────┤
│ Mark  │ ['Britain', 'France']   │
│ Aisha │ ['Britain', 'Germany']  │
└───────┴─────────────────────────┘

So use SQL if your engine supports UNNEST() for array operations and Use Polars’ .explode()method if SQL has limitations and of course ensure DISTINCT values when aggregating back into arrays.

发布评论

评论列表(0)

  1. 暂无评论