I have two Python environments: one online through a class, and the other on my own computer.
The following code works in the online environment but gives an error in my local environment. Does anyone know what this error means, and do you have any suggestions for fixing my environment or my code? The online environment has a habit of losing my work, so I'd like to get this working on my own machine.
This is the code:
import numpy

# df is an existing pandas DataFrame with a text column called 'column_name'
custom_categories = ['cat_a', 'cat_b', 'cat_c', 'Other']
custom_categories_filter = [
    (df['column_name'].str.contains('(A)', regex=False)),
    (df['column_name'].str.contains('(B)', regex=False)),
    (df['column_name'].str.contains('(C)', regex=False)),
    (df['column_name'].str.contains('(A)', regex=False) == False)
    & (df['column_name'].str.contains('(B)', regex=False) == False)
    & (df['column_name'].str.contains('(C)', regex=False) == False)
]
df["custom_category"] = numpy.select(custom_categories_filter, custom_categories)
It's intended to look through a column of a pandas DataFrame, search for certain terms in brackets, and put a value based on the matching term into a new column.
This is the error:
TypeError: Choicelist and default value do not have a common dtype:
The DType <class 'numpy.dtypes._PyLongDType'> could not be promoted by <class 'numpy.dtypes.StrDType'>.
This means that no common DType exists for the given inputs.
For example they cannot be stored in a single array unless the dtype is `object`.
The full list of DTypes is: (<class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes.StrDType'>, <class 'numpy.dtypes._PyLongDType'>)
The online environment runs Python 3.11.11 and my local one runs Python 3.10.8. Could it be the Python version I'm using?
Asked Mar 14 at 14:46 by Sophia.

Answer
The signature of `np.select` is as follows:
numpy.select(condlist, choicelist, default=0)
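That is, `default` is the integer `0` unless you pass something else. Your `choicelist` (`custom_categories`) contains strings, and in NumPy >= 2.0 the library enforces stricter dtype promotion rules following NEP 50, so a string choicelist combined with an integer default has no common dtype. That is exactly what the error message says. The difference between your two environments is therefore most likely the NumPy version (the online one presumably still has NumPy 1.x), not the Python version.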
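A quick way to check that the NumPy version, not the Python version, is responsible is to run the same tiny `np.select` call in both environments and print `np.__version__`. In this sketch the arrays and variable names are made up for illustration. Every element matches a condition, so the integer default is never actually used, yet on NumPy >= 2.0 the call should still raise: `np.select` works out the output dtype from all the choices plus the default before selecting anything. That is also why the fourth "none of the above" condition in the question doesn't avoid the error.

```python
import numpy as np

# Illustrative inputs: every position matches exactly one condition,
# so the default value would never be selected.
conds = [np.array([True, False]), np.array([False, True])]
choices = ['cat_a', 'cat_b']

numpy_major = int(np.__version__.split('.')[0])

try:
    # No default passed, so np.select falls back to the integer 0.
    result = np.select(conds, choices)
    outcome = 'ok: ' + repr(result.tolist())
except TypeError as exc:
    # Expected on NumPy >= 2.0: str choices and an int default share no dtype.
    outcome = 'TypeError: ' + str(exc).splitlines()[0]

print(f'NumPy {np.__version__} -> {outcome}')
```

If the online environment prints a 1.x version and the local one prints 2.x, that confirms the cause.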
To fix this, you can pass `default=''`. However, it's better to pass `'Other'` as the default and drop your fourth "none of the above" condition, since catching every unmatched row is exactly what `default` is for:
import pandas as pd
import numpy as np
df = pd.DataFrame(data=['A(A)', 'A(B)', 'A(C)', 'A(D)'],
                  columns=['column_name'])
custom_categories = ['cat_a', 'cat_b', 'cat_c']
custom_categories_filter = [
df['column_name'].str.contains('(A)', regex=False),
df['column_name'].str.contains('(B)', regex=False),
df['column_name'].str.contains('(C)', regex=False)
]
df['custom_category'] = np.select(condlist=custom_categories_filter,
                                  choicelist=custom_categories,
                                  default='Other')
Output:
column_name custom_category
0 A(A) cat_a
1 A(B) cat_b
2 A(C) cat_c
3 A(D) Other
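If you'd rather keep the original four conditions from the question, the minimal change mentioned above, `default=''`, should also work on both NumPy 1.x and 2.x, because an empty string shares the string dtype with the choices. A sketch, assuming the same sample data as above; the intermediate names `has_a`, `has_b` and `has_c` are introduced here only for readability, and `~` replaces the `== False` comparisons:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': ['A(A)', 'A(B)', 'A(C)', 'A(D)']})

# Boolean masks for each bracketed term (literal match, not regex).
col = df['column_name']
has_a = col.str.contains('(A)', regex=False)
has_b = col.str.contains('(B)', regex=False)
has_c = col.str.contains('(C)', regex=False)

custom_categories = ['cat_a', 'cat_b', 'cat_c', 'Other']
custom_categories_filter = [has_a, has_b, has_c, ~has_a & ~has_b & ~has_c]

# A string default keeps every choice (and the default) in a string dtype.
df['custom_category'] = np.select(custom_categories_filter,
                                  custom_categories,
                                  default='')
print(df)
```

Passing `default='Other'` and dropping the fourth condition remains the simpler option; this variant only shows that the error comes from the default's dtype, not from the conditions.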
Example of the previous behaviour (NumPy <= 1.26), with no default passed:
df['custom_category'] = np.select(condlist=custom_categories_filter,
                                  choicelist=custom_categories)
Output:
column_name custom_category
0 A(A) cat_a
1 A(B) cat_b
2 A(C) cat_c
3 A(D) 0 # '0' cast as string!
In NumPy >= 2.0, such coercion of `default=0` to a string is no longer allowed.
For specifics on the change, compare the 1.26 and 2.0 implementations of `np.select`.