Using the documentation here (which also points to here) I would expect the following use of the Polars strptime function to produce a pl.Date value:
import polars as pl
date_format = "%Y-Q%q-%d"
df = pl.DataFrame({
"quarter_str": ["2024-Q1-01", "2023-Q3-01", "2025-Q2-01"]
})
## another approach that does not work
#date_format = "%Y-Q%q"
#df = pl.DataFrame({
# "quarter_str": ["2024-Q1", "2023-Q3", "2025-Q2"]
#})
result = df.with_columns(
pl.col("quarter_str").str.strptime(pl.Date, format=date_format, strict=False).alias("parsed_date")
)
print(result)
I'm not sure if this is my issue, a Polars issue, or an issue with the Rust library. But it seems like the parsing of the quarter is not performing as expected. Note that neither of the above approaches work (see commented out value). At first I thought it would not assume the first day of the quarter, but passing a day %d value didn't help either.
Is there a python polars approach to convert a string only containing a year/quarter into a pl.Date value?
The expected output for the string '2023-Q3', for example, would be the date July 1st, 2023. Since that date is the first day of the 3rd quarter. The expected output for the string '2023-Q3-27' would be July 27th, 2023 (the 27th day of the quarter). The expected output for the string '2023-Q3-45' would be the 45th day of the 3rd quarter of 2023 - sometime in mid-August, for example.
Using the documentation here (which also points to here) I would expect the following use of the Polars strptime function to produce a pl.Date value:
import polars as pl
date_format = "%Y-Q%q-%d"
df = pl.DataFrame({
"quarter_str": ["2024-Q1-01", "2023-Q3-01", "2025-Q2-01"]
})
## another approach that does not work
#date_format = "%Y-Q%q"
#df = pl.DataFrame({
# "quarter_str": ["2024-Q1", "2023-Q3", "2025-Q2"]
#})
result = df.with_columns(
pl.col("quarter_str").str.strptime(pl.Date, format=date_format, strict=False).alias("parsed_date")
)
print(result)
I'm not sure if this is my issue, a Polars issue, or an issue with the Rust library. But it seems like the parsing of the quarter is not performing as expected. Note that neither of the above approaches work (see commented out value). At first I thought it would not assume the first day of the quarter, but passing a day %d value didn't help either.
Is there a python polars approach to convert a string only containing a year/quarter into a pl.Date value?
The expected output for the string '2023-Q3', for example, would be the date July 1st, 2023. Since that date is the first day of the 3rd quarter. The expected output for the string '2023-Q3-27' would be July 27th, 2023 (the 27th day of the quarter). The expected output for the string '2023-Q3-45' would be the 45th day of the 3rd quarter of 2023 - sometime in mid-August, for example.
Share Improve this question edited Mar 25 at 16:19 Rodalm 5,5838 silver badges21 bronze badges asked Mar 25 at 15:10 sicsmprsicsmpr 535 bronze badges 3 |1 Answer
Reset to default 2As far as I know, there is no direct way to parse those formats using pl.Expr.str.strptime
. An alternative approach using column expressions is:
Extract the year, quarter and optionally the day since the start of the quarter from the
quarter_str
Determine the starting month of the quarter
Construct the quarter start date
If
quarter_str
contains days at the end of the string, add them to the quarter start date
import polars as pl
df = pl.DataFrame(
{
"quarter_str": ["2023-Q3-01", "2023-Q3-27", "2025-Q4", "2025-Q3-45"],
}
)
quarter = pl.col("quarter_str").str.extract("Q(\d)").cast(pl.Int8)
year = pl.col("quarter_str").str.extract("(\d{4})").cast(pl.Int16)
month = (quarter - 1) * 3 + 1
quarter_start = pl.date(year, month, 1)
has_day_of_quarter = pl.col("quarter_str").str.contains("\d{4}-Q\d-\d+")
days_since_quarter_start = (
pl.when(has_day_of_quarter)
.then(pl.col("quarter_str").str.extract("-(\d+)$").cast(pl.Int16) - 1)
.otherwise(0)
)
result = df.with_columns(
(quarter_start + pl.duration(days=days_since_quarter_start)).alias("parsed_date")
)
Output :
>>> result
shape: (4, 2)
┌─────────────┬─────────────┐
│ quarter_str ┆ parsed_date │
│ --- ┆ --- │
│ str ┆ date │
╞═════════════╪═════════════╡
│ 2023-Q3-01 ┆ 2023-07-01 │
│ 2023-Q3-27 ┆ 2023-07-27 │
│ 2025-Q4 ┆ 2025-10-01 │
│ 2025-Q3-45 ┆ 2025-08-14 │
└─────────────┴─────────────┘
%d
to represent the day of the quarter instead of the day of the month? – Rodalm Commented Mar 25 at 15:26