I have a dataframe df.
>>> import polars as pl
>>>
>>>
>>> df = pl.DataFrame({"col": ["row1", "row2", "row3"]})
>>> df
shape: (3, 1)
┌──────┐
│ col │
│ --- │
│ str │
╞══════╡
│ row1 │
│ row2 │
│ row3 │
└──────┘
Now I want to create a new column, new. It should contain the first match of a pattern in col.
For example, for the pattern 1|2 it should produce the following output.
┌──────┬───────┐
│ col ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪═══════╡
│ row1 ┆ 1 │
│ row2 ┆ 2 │
│ row3 ┆ null │
└──────┴───────┘
I tried using str.contains with the expression API, but it returns boolean values.
>>> df.with_columns(new=pl.col('col').str.contains("1|2"))
shape: (3, 2)
┌──────┬───────┐
│ col ┆ new │
│ --- ┆ --- │
│ str ┆ bool │
╞══════╪═══════╡
│ row1 ┆ true │
│ row2 ┆ true │
│ row3 ┆ false │
└──────┴───────┘
asked Mar 11 at 12:18 by user459872 (edited Mar 11 at 12:22)
2 Answers
You should modify your pattern to have a capturing group (...) and use Expr.str.extract:
df.with_columns(new=pl.col('col').str.extract("(1|2)"))
Output:
┌──────┬──────┐
│ col ┆ new │
│ --- ┆ --- │
│ str ┆ str │
╞══════╪══════╡
│ row1 ┆ 1 │
│ row2 ┆ 2 │
│ row3 ┆ null │
└──────┴──────┘
If you're not actually using regex, there's also .str.extract_many():
df.with_columns(
new = pl.col("col").str.extract_many(["1", "2"])
)
shape: (3, 2)
┌──────┬───────────┐
│ col ┆ new │
│ --- ┆ --- │
│ str ┆ list[str] │
╞══════╪═══════════╡
│ row1 ┆ ["1"] │
│ row2 ┆ ["2"] │
│ row3 ┆ [] │
└──────┴───────────┘
It's a little different: it returns a list of strings (an empty list when nothing matches) and supports overlapping=True.