python - How to include first matching pattern as a column - Stack Overflow

I have a dataframe df.

>>> import polars as pl
>>> 
>>> 
>>> df = pl.DataFrame({"col": ["row1", "row2", "row3"]})
>>> df
shape: (3, 1)
┌──────┐
│ col  │
│ ---  │
│ str  │
╞══════╡
│ row1 │
│ row2 │
│ row3 │
└──────┘

Now I want to create a new column, new, that holds the first substring of col matching a given pattern.

For example, for the pattern 1|2 it should produce the following output:

┌──────┬───────┐
│ col  ┆ new   │
│ ---  ┆ ---   │
│ str  ┆ str   │
╞══════╪═══════╡
│ row1 ┆ 1     │
│ row2 ┆ 2     │
│ row3 ┆ null  │
└──────┴───────┘

I tried str.contains from the expression API, but it returns boolean values instead of the matched text:

>>> df.with_columns(new=pl.col('col').str.contains("1|2"))
shape: (3, 2)
┌──────┬───────┐
│ col  ┆ new   │
│ ---  ┆ ---   │
│ str  ┆ bool  │
╞══════╪═══════╡
│ row1 ┆ true  │
│ row2 ┆ true  │
│ row3 ┆ false │
└──────┴───────┘

Asked Mar 11 at 12:18 and edited Mar 11 at 12:22 by user459872

2 Answers

You should modify your pattern to have a capturing group (...) and use Expr.str.extract:

df.with_columns(new=pl.col('col').str.extract("(1|2)"))

Output:

┌──────┬──────┐
│ col  ┆ new  │
│ ---  ┆ ---  │
│ str  ┆ str  │
╞══════╪══════╡
│ row1 ┆ 1    │
│ row2 ┆ 2    │
│ row3 ┆ null │
└──────┴──────┘

If you're matching literal strings rather than a regex, there's also .str.extract_many():

df.with_columns(
    new = pl.col("col").str.extract_many(["1", "2"])
)
shape: (3, 2)
┌──────┬───────────┐
│ col  ┆ new       │
│ ---  ┆ ---       │
│ str  ┆ list[str] │
╞══════╪═══════════╡
│ row1 ┆ ["1"]     │
│ row2 ┆ ["2"]     │
│ row3 ┆ []        │
└──────┴───────────┘

It's a little different: it returns a list of all matched strings per row (an empty list rather than null when nothing matches) and supports overlapping=True.
