python - polars date quarter parsing using strptime returns null - Stack Overflow

Using the documentation here (which also points to here) I would expect the following use of the Polars strptime function to produce a pl.Date value:

import polars as pl

date_format = "%Y-Q%q-%d"
df = pl.DataFrame({
    "quarter_str": ["2024-Q1-01", "2023-Q3-01", "2025-Q2-01"]
})

## another approach that does not work
#date_format = "%Y-Q%q"
#df = pl.DataFrame({
#    "quarter_str": ["2024-Q1", "2023-Q3", "2025-Q2"]
#})

result = df.with_columns(
    pl.col("quarter_str").str.strptime(pl.Date, format=date_format, strict=False).alias("parsed_date")
)

print(result)

I'm not sure whether this is my mistake, a Polars issue, or an issue with the underlying Rust library, but the quarter does not seem to be parsed as expected. Note that neither of the above approaches works (see the commented-out values). At first I thought the parser simply would not assume the first day of the quarter, but passing a day value with %d didn't help either.

Is there a Python Polars approach to convert a string containing only a year and quarter into a pl.Date value?

The expected output for the string '2023-Q3', for example, would be July 1st, 2023, since that date is the first day of the 3rd quarter. The expected output for the string '2023-Q3-27' would be July 27th, 2023 (the 27th day of the quarter). The expected output for the string '2023-Q3-45' would be the 45th day of the 3rd quarter of 2023, which falls in mid-August.
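To make that mapping concrete, here is a minimal plain-Python sketch of the intended behaviour, using only the standard library. The helper name `quarter_str_to_date` is hypothetical (it is not part of Polars or chrono), and it assumes the trailing number is a 1-based day of the quarter:

```python
import re
from datetime import date, timedelta

# Hypothetical reference helper, not a Polars API.
# Accepts "YYYY-Qn" or "YYYY-Qn-D", where D is the 1-based day of the quarter.
_QUARTER_PATTERN = re.compile(r"(\d{4})-Q([1-4])(?:-(\d+))?")


def quarter_str_to_date(value: str) -> date:
    match = _QUARTER_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a year/quarter string: {value!r}")
    year = int(match.group(1))
    quarter = int(match.group(2))
    # A missing day suffix is treated as the first day of the quarter.
    day_of_quarter = int(match.group(3) or 1)
    # Q1 -> January, Q2 -> April, Q3 -> July, Q4 -> October
    quarter_start = date(year, 3 * (quarter - 1) + 1, 1)
    return quarter_start + timedelta(days=day_of_quarter - 1)


print(quarter_str_to_date("2023-Q3"))     # 2023-07-01
print(quarter_str_to_date("2023-Q3-27"))  # 2023-07-27
print(quarter_str_to_date("2023-Q3-45"))  # 2023-08-14
```

This is only a reference for the expected values; it works row by row and is not a substitute for a vectorised Polars expression.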

Asked Mar 25 at 15:10 by sicsmpr; edited Mar 25 at 16:19 by Rodalm.
  • 1 What is the expected output? How do you determine the month of the date, knowing only the day of the month, the quarter and the year? That's probably the reason why the parsing fails. Or did you expect the %d to represent the day of the quarter instead of the day of the month? – Rodalm Commented Mar 25 at 15:26
  • The expected output for the string '2023-Q3', for example, would be July 1st, 2023, since that date is the first day of the 3rd quarter. The expected output for the string '2023-Q3-27' would be July 27th, 2023 (the 27th day of the quarter). The expected output for the string '2023-Q3-45' would be the 45th day of the 3rd quarter of 2023, which falls in mid-August. But I realize making assumptions can be dangerous. Just passing year and month, for example, does not assume the 1st day of the month, so why should passing year and quarter assume the 1st day of the quarter? – sicsmpr Commented Mar 25 at 15:42
  • I was hoping for an 'easy button' but there does not seem to be one. – sicsmpr Commented Mar 25 at 15:43
1 Answer

As far as I know, there is no direct way to parse those formats using pl.Expr.str.strptime. An alternative approach using column expressions is:

  1. Extract the year, quarter and optionally the day since the start of the quarter from the quarter_str

  2. Determine the starting month of the quarter

  3. Construct the quarter start date

  4. If quarter_str contains days at the end of the string, add them to the quarter start date

import polars as pl

df = pl.DataFrame(
    {
        "quarter_str": ["2023-Q3-01", "2023-Q3-27", "2025-Q4", "2025-Q3-45"],
    }
)

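# Raw strings keep the regex escapes (\d) from being read as Python string escapes.
quarter = pl.col("quarter_str").str.extract(r"Q(\d)").cast(pl.Int8)
year = pl.col("quarter_str").str.extract(r"(\d{4})").cast(pl.Int16)
# First month of the quarter: Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
month = (quarter - 1) * 3 + 1
quarter_start = pl.date(year, month, 1)

# The trailing number, when present, is the 1-based day of the quarter.
has_day_of_quarter = pl.col("quarter_str").str.contains(r"\d{4}-Q\d-\d+")
days_since_quarter_start = (
    pl.when(has_day_of_quarter)
    .then(pl.col("quarter_str").str.extract(r"-(\d+)$").cast(pl.Int16) - 1)
    .otherwise(0)
)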

result = df.with_columns(
    (quarter_start + pl.duration(days=days_since_quarter_start)).alias("parsed_date")
)

Output:

>>> result

shape: (4, 2)
┌─────────────┬─────────────┐
│ quarter_str ┆ parsed_date │
│ ---         ┆ ---         │
│ str         ┆ date        │
╞═════════════╪═════════════╡
│ 2023-Q3-01  ┆ 2023-07-01  │
│ 2023-Q3-27  ┆ 2023-07-27  │
│ 2025-Q4     ┆ 2025-10-01  │
│ 2025-Q3-45  ┆ 2025-08-14  │
└─────────────┴─────────────┘