I encountered some confusing behavior with polars type-casting (silently truncating floats to ints without raising an error, even when explicitly specifying strict=True), so I headed over to the documentation page on casting and now I'm even more confused.
The text at the top of the page says:
The function cast includes a parameter strict that determines how Polars behaves when it encounters a value that cannot be converted from the source data type to the target data type. The default behaviour is strict=True, which means that Polars will throw an error to notify the user of the failed conversion while also providing details on the values that couldn't be cast.
However, the code example immediately below (section title "Basic example") shows a df with a floats column taking values including 5.8 being truncated to int 5 during casting with the code pl.col("floats").cast(pl.Int32).alias("floats_as_integers"), i.e. without strict=False.
What am I misunderstanding here? The text seems to indicate that this truncation, with strict=True as default, should "throw an error," but the code example in the documentation (and my own polars code) throws no error and silently truncates values.
2 Answers
It is accepted in Python (and more generally) that casting a float to an int will truncate the float and not raise an exception.
E.g. in Python:
>>> int(5.8)
5
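Similarly, in Polars, a float such as 5.8 counts as a value that "can be converted from the source data type to the target data type" when the target is an int, so the cast truncates it instead of raising an error, even with strict=True.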
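To make "truncate" concrete, here is a small plain-Python sketch using the same values as the floats column in the docs example. It only illustrates that int() drops the fractional part (moving toward zero), which differs from flooring or rounding for a value like -6.3. It says nothing about how Polars implements the cast internally.

```python
import math

# Same values as the "floats" column in the docs example.
values = [4.0, 5.8, -6.3]

# int() discards the fractional part, i.e. it moves toward zero.
truncated = [int(v) for v in values]

# For comparison: floor moves toward negative infinity, round to the nearest int.
floored = [math.floor(v) for v in values]
rounded = [round(v) for v in values]

print(truncated)  # [4, 5, -6]
print(floored)    # [4, 5, -7]
print(rounded)    # [4, 6, -6]
```

None of these calls raises an exception. Losing the fractional part is treated as a normal outcome of the conversion, not as a failed one.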
For anyone else looking, this answer provides further detail / examples.
To see strict=True actually raise an error, you need a value that cannot be represented in the target type at all. For example, try downcasting an Int64 column that contains a value larger than the smaller integer type can hold.
Starting with:
import polars as pl

df = pl.DataFrame(
    {
        "integers": [1, 2, 2147483647 + 1],
        "big_integers": [10000002, 2, 30000003],
        "floats": [4.0, 5.8, -6.3],
    }
)
print(df)
Giving:
shape: (3, 3)
┌────────────┬──────────────┬────────┐
│ integers ┆ big_integers ┆ floats │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ f64 │
╞════════════╪══════════════╪════════╡
│ 1 ┆ 10000002 ┆ 4.0 │
│ 2 ┆ 2 ┆ 5.8 │
│ 2147483648 ┆ 30000003 ┆ -6.3 │
└────────────┴──────────────┴────────┘
With a cast() where strict=True:
result = df.select(
    pl.col("integers").cast(pl.Int32, strict=True).alias("integers2")
)
print(result)
Resulting in:
polars.exceptions.InvalidOperationError: conversion from `i64` to `i32` failed in column 'integers' for 1 out of 3 values: [2147483648]
vs one where strict=False:
result = df.select(
    pl.col("integers").cast(pl.Int32, strict=False).alias("integers2")
)
print(result)
Resulting in:
shape: (3, 1)
┌───────────┐
│ integers2 │
│ --- │
│ i32 │
╞═══════════╡
│ 1 │
│ 2 │
│ null │
└───────────┘
[...] floats_as_integers column is in fact about truncation? – Max Power, commented yesterday
[...] strict=True, only things like overflows or negatives to unsigned types are not: github/pola-rs/polars/issues/21326#issuecomment-2666478570 – Max Power, commented yesterday
[...] strict implies that it has anything to do with truncation. The text does not indicate that truncation, with strict=True, should throw an error. – juanpa.arrivillaga, commented yesterday