
python - pandas 2.2.3: `astype(str)` returns `np.str_` - Stack Overflow


The latest pandas version casts values into NumPy types. To cast a Series of integers to strings, I thought astype(str) would be enough:

import pandas as pd
import numpy as np


list_of_str = list(pd.Series([1234, 123, 345]).to_frame()[0].unique().astype(str))
list_of_str

returns [np.str_('1234'), np.str_('123'), np.str_('345')].

Likewise,

list_of_str = list(pd.Series([1234, 123, 345]).to_frame()[0].unique().astype(np.str_))
list_of_str

returns [np.str_('1234'), np.str_('123'), np.str_('345')].

Is there an efficient way to cast to the Python str type that does not require a list comprehension, like:

list_of_str = [str(s) for s in list_of_str]
list_of_str

which finally returns ['1234', '123', '345']?
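A minimal sketch of the behavior described above, assuming NumPy 2.x, where string scalars print as np.str_('1234') (the scalar repr introduced by NEP 51; NumPy 1.x printed the same elements as '1234'). It drops the to_frame()[0] round trip and checks that numpy.str_ is a subclass of the built-in str, so the elements already work as ordinary strings:

```python
import numpy as np
import pandas as pd

# Reproduce the first snippet, without the to_frame()[0] round trip.
list_of_str = list(pd.Series([1234, 123, 345]).unique().astype(str))

# numpy.str_ is a subclass of the built-in str ...
print(issubclass(np.str_, str))                      # True
print(all(isinstance(s, str) for s in list_of_str))  # True

# ... so the elements compare equal to, and work like, plain strings.
print(list_of_str == ["1234", "123", "345"])  # True
print(",".join(list_of_str))                  # 1234,123,345
```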


asked 2 days ago by SeF
  • Why do you convert to_frame and then back to a Series with .to_frame()[0]? – mozway, 2 days ago
  • No particular reason, likely an excess of zeal. The same result is obtained without .to_frame()[0]. – SeF, yesterday

2 Answers


If you want Python objects, there is no "efficient" (vectorized) way to perform the conversion: strings in NumPy are not necessarily handled as efficiently as numeric data.

The list comprehension is a perfectly valid option. Alternatively:

list(map(str, pd.Series([1234, 123, 345]).unique()))

Or with drop_duplicates in place of unique:

pd.Series([1234, 123, 345]).drop_duplicates().astype(str).tolist()

Output:

['1234', '123', '345']
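A quick check that both alternatives return built-in str objects rather than numpy.str_; the variable names are illustrative only:

```python
import pandas as pd

s = pd.Series([1234, 123, 345])

# Alternative 1: map the built-in str over the unique values.
via_map = list(map(str, s.unique()))

# Alternative 2: stay in pandas; Series.tolist() returns Python objects.
via_pandas = s.drop_duplicates().astype(str).tolist()

print(via_map)     # ['1234', '123', '345']
print(via_pandas)  # ['1234', '123', '345']

# Both lists hold built-in str objects, not numpy.str_.
print({type(x) for x in via_map + via_pandas})  # {<class 'str'>}
```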

Comparison of speeds: working with strings in NumPy is not faster than Python loops:

# initializing an array with 1M items
a = np.arange(1_000_000)

%timeit a.astype(str).tolist()
277 ms ± 95.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit [str(x) for x in a]
194 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit list(map(str, a))
159 ms ± 2.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
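The %timeit lines above are IPython magics. A sketch of the same comparison as a plain script with the standard-library timeit module; treating "best of 5 repeats, one call each" as a stand-in for %timeit's defaults is an assumption, and no timings are quoted here because they vary by machine:

```python
import timeit

import numpy as np

# Same 1M-element integer array as in the IPython session above.
a = np.arange(1_000_000)

candidates = {
    "a.astype(str).tolist()": lambda: a.astype(str).tolist(),
    "[str(x) for x in a]": lambda: [str(x) for x in a],
    "list(map(str, a))": lambda: list(map(str, a)),
}

for label, fn in candidates.items():
    # Best of 5 repeats with one call each; absolute numbers are machine-dependent.
    best = min(timeit.repeat(fn, number=1, repeat=5))
    print(f"{label:<25} {best * 1000:8.1f} ms")
```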

This happens because

series.unique()

returns a numpy.ndarray, and calling .astype(str) on a NumPy array produces a NumPy text (Unicode) dtype rather than Python str objects:

>>> pd.Series([1234, 123, 345]).unique()
array([1234,  123,  345])
>>> np.array([1234, 123, 345]).astype(str)
array(['1234', '123', '345'], dtype='<U21')
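Calling the array's tolist() method instead of wrapping it in list() converts each element to the nearest built-in Python type, which gives plain str objects without a comprehension. A minimal sketch contrasting the two:

```python
import numpy as np
import pandas as pd

arr = pd.Series([1234, 123, 345]).unique().astype(str)

# list(arr) iterates the array and yields numpy.str_ scalars.
scalars = list(arr)
# arr.tolist() converts each element to the nearest built-in Python type.
plain = arr.tolist()

print(all(type(x) is np.str_ for x in scalars))  # True
print(all(type(x) is str for x in plain))        # True
print(plain)                                     # ['1234', '123', '345']
```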

According to the NumPy docs, the Python type str is converted to the scalar type np.str_, so the result is an array with dtype('U<length>'), and calling list() on such an array yields np.str_ scalars.
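A short sketch of that dtype mapping; the character count printed at the end depends on the input integer type (the <U21 shown above corresponds to int64 input):

```python
import numpy as np

# Passing the Python type str as a dtype maps to NumPy's Unicode kind "U",
# whose scalar type is numpy.str_.
dt = np.dtype(str)
print(dt.kind, dt.type is np.str_)  # U True

# When casting, the concrete length is chosen from the input dtype.
arr = np.array([1234, 123, 345]).astype(str)
print(arr.dtype.kind, arr.dtype.type is np.str_)  # U True

# "U" items use 4 bytes per character, so this is the max characters per element.
print(arr.dtype.itemsize // 4)
```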
