python - np.clip functionality using pyarrow compute in duckdb udf - Stack Overflow

I am currently working on a DuckDB UDF using PyArrow compute. It works great so far. Now I need to clip a value between 0 and 1. A minimal example:

import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
import duckdb

def funcArrow(x: float) -> float:
    return np.clip(x, 0, 1) # this should be something like pc.clip(x,0,1)


con = duckdb.connect("test.db")
con.create_function("funcArrow", funcArrow, type="arrow")
print(con.sql("SELECT funcArrow(x) AS f from myTable;"))

I tried using pc.min, pc.max, or if/else branches, but I always run into errors. Any suggestions?

Thank you

asked Jan 30 at 14:13 by user2148566
  • What errors are you getting? – 0x26res, commented Jan 31 at 22:54

1 Answer

"I tried using pc.min, pc.max but always run into errors."

Much like min and max in SQL, pc.min and pc.max aggregate vertically over a whole column; they don't do a horizontal, element-wise comparison.

In the absence of a PyArrow Compute clip function, I found min_element_wise and max_element_wise, which at first glance seem analogous to SQL's least and greatest, respectively.

You can use these together to clip the minimum value to 0 and the maximum value to 1.

You may decide whether you prefer a UDF or native SQL (which I assume will be more performant). The example below includes both options.

import duckdb
import pyarrow as pa
import pyarrow.compute as pc


def funcArrow(x: float) -> float:
    return pc.min_element_wise(pc.max_element_wise(x, 0), 1)


con = duckdb.connect()
con.create_function("funcArrow", funcArrow, type="arrow")
con.sql("""
    SELECT
        x,
        funcArrow(x) as udf,
        least(greatest(x, 0), 1) as sql,
    FROM UNNEST([-1, 0, 0.5, 1, 2]) as tbl(x)
""")
┌───────────────┬────────┬───────────────┐
│       x       │  udf   │      sql      │
│ decimal(11,1) │ double │ decimal(11,1) │
├───────────────┼────────┼───────────────┤
│          -1.0 │    0.0 │           0.0 │
│           0.0 │    0.0 │           0.0 │
│           0.5 │    0.5 │           0.5 │
│           1.0 │    1.0 │           1.0 │
│           2.0 │    1.0 │           1.0 │
└───────────────┴────────┴───────────────┘