python - confused by silent truncation in polars type casting - Stack Overflow

I encountered some confusing behavior with polars type-casting (silently truncating floats to ints without raising an error, even when explicitly specifying strict=True), so I headed over to the documentation page on casting and now I'm even more confused.

The text at the top of the page says:

The function cast includes a parameter strict that determines how Polars behaves when it encounters a value that cannot be converted from the source data type to the target data type. The default behaviour is strict=True, which means that Polars will thrown an error to notify the user of the failed conversion while also providing details on the values that couldn't be cast.

However, the code example immediately below (section title "Basic example") shows a df with a floats column taking values including 5.8 being truncated to int 5 during casting with the code pl.col("floats").cast(pl.Int32).alias("floats_as_integers"), i.e. without strict=False.

What am I misunderstanding here? The text seems to indicate that this truncation, with strict=True as default, should "throw an error," but the code example in the documentation (and my own polars code) throws no error and silently truncates values.

Asked yesterday by Max Power
  • The context of the discussion about strict isn't truncation; it's whether the number will fit into that data type. The example it talks about is casting a 32-bit signed integer into an 8-bit signed integer, i.e. the issue is overflow or underflow – juanpa.arrivillaga Commented yesterday
  • thanks juanpa. I agree overflow is part of the discussion at that docs page, but I think the issue I highlighted with the floats_as_integers column is in fact about truncation? – Max Power Commented yesterday
  • interesting answer to this question on the polars github, casts with truncation are explicitly/intentionally allowed with strict=True, only things like overflows or negatives to unsigned types are not: github/pola-rs/polars/issues/21326#issuecomment-2666478570 – Max Power Commented yesterday
  • I'm saying that nothing in the discussion about strict implies that it has anything to do with truncation. The text does not indicate that truncation, with strict=True, should throw an error. – juanpa.arrivillaga Commented yesterday
  • I guess this is more obvious if you are familiar with the conventions from other parts of the python data ecosystem, i.e. numpy/pandas, where yes, "casting" from float to int implies truncation, that is the expected, defined behavior. – juanpa.arrivillaga Commented yesterday

2 Answers

It is accepted in Python (and more generally) that casting a float to an int will truncate the float and not raise an exception.

E.g. in Python:

>>> int(5.8)
5

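Similarly, in Polars, a float such as 5.8 counts as a value that "can be converted from the source data type to the target data type": casting it to an int truncates the fractional part, so strict=True has nothing to report.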
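As a rough illustration of the plain-Python convention described above, the sketch below shows that int() truncates toward zero (it neither rounds nor floors). The checked_int helper is hypothetical, not part of Python or Polars; it is only one possible way to reject values that would lose a fractional part.

```python
import math


def checked_int(value: float) -> int:
    """Hypothetical helper: convert to int only if no fractional part is lost."""
    # float.is_integer() is False for values like 5.8, and also for inf and nan.
    if not value.is_integer():
        raise ValueError(f"{value!r} cannot be converted to int without loss")
    return int(value)


# int() drops the fractional part, moving toward zero for both signs.
print(int(5.8))           # 5
print(int(-6.3))          # -6
print(math.floor(-6.3))   # -7, shown for contrast with truncation

print(checked_int(4.0))   # 4
try:
    checked_int(5.8)
except ValueError as exc:
    print(f"rejected: {exc}")
```

A guard like this is opt-in: it is something you would add yourself if silent truncation is not acceptable for your data.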

For anyone else looking, this answer provides further detail / examples.

To see strict=True actually raise an error, try downcasting an Int64 column that contains a value too large for the smaller type to represent.

Starting with:

import polars as pl

df = pl.DataFrame(
    {
        "integers": [1, 2, 2147483647 + 1],
        "big_integers": [10000002, 2, 30000003],
        "floats": [4.0, 5.8, -6.3],
    }
)

print(df)

Giving:

shape: (3, 3)
┌────────────┬──────────────┬────────┐
│ integers   ┆ big_integers ┆ floats │
│ ---        ┆ ---          ┆ ---    │
│ i64        ┆ i64          ┆ f64    │
╞════════════╪══════════════╪════════╡
│ 1          ┆ 10000002     ┆ 4.0    │
│ 2          ┆ 2            ┆ 5.8    │
│ 2147483648 ┆ 30000003     ┆ -6.3   │
└────────────┴──────────────┴────────┘

With a cast() where strict=True:

result = df.select(
    pl.col("integers").cast(pl.Int32, strict=True).alias("integers2")
)
print(result)

Resulting in:

polars.exceptions.InvalidOperationError: conversion from `i64` to `i32` failed in column 'integers' for 1 out of 3 values: [2147483648]

vs one where strict=False:

result = df.select(
    pl.col("integers").cast(pl.Int32, strict=False).alias("integers2")
)
print(result)

Resulting in:

shape: (3, 1)
┌───────────┐
│ integers2 │
│ ---       │
│ i32       │
╞═══════════╡
│ 1         │
│ 2         │
│ null      │
└───────────┘