
python 3.x - DataFrame index: date formatting to datetime - Stack Overflow


The DataFrame is generated using pd.read_excel. Its index values are dates, which I want to convert to datetime, but I am not sure whether the approach I found is the "best" one. My issue is that the date strings use the English abbreviated month name, so in my case 2024-02-10 is written as "Feb 10, 2024". Therefore, I cannot use datetime.strptime(date_string, "%b %d, %Y"), because the abbreviated month name depends on the locale and I don't want to use locale.setlocale().
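For reference, one way to stay within the standard library without touching the locale is to map the English month abbreviations to numbers yourself instead of relying on %b. This is a minimal sketch, assuming every value follows the "Mon D, YYYY" pattern; `parse_english_date` and `_MONTHS` are hypothetical names, not part of any library:

```python
from datetime import datetime

# English month abbreviations are hard-coded, so the result does not depend
# on the process locale (assumption: input is always "Mon D, YYYY").
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_english_date(text: str) -> datetime:
    """Parse strings such as 'Feb 10, 2024' without consulting the locale."""
    # "Feb 10, 2024" -> ["Feb", "10", "2024"]
    month_part, day_part, year_part = text.replace(",", " ").split()
    # .title() tolerates "FEB" or "feb"; an unknown month raises KeyError
    month = _MONTHS[month_part.title()]
    return datetime(int(year_part), month, int(day_part))


print(parse_english_date("Feb 10, 2024"))  # 2024-02-10 00:00:00
```

It could then be applied to the DataFrame with data.index = data.index.map(parse_english_date). The trade-off is that it only accepts this one layout, whereas arrow and dateparser handle many more.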

The two options I found are:

  • Using arrow library:
import arrow
date_arrow = arrow.get("Feb 10, 2024", "MMM D, YYYY", locale="en_US").datetime
  • Using dateparser library:
import dateparser
date_parsed = dateparser.parse("Feb 26, 1971")

To get the DataFrame I want, I can do either:

import pandas as pd
import arrow
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)
data.index = data.index.map(lambda x: arrow.get(x, "MMM D, YYYY", locale="en_US").datetime)

or

import pandas as pd
import dateparser
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)
data.index = data.index.map(dateparser.parse)
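Whichever parser is used, Index.map returns a new Index and leaves data untouched, so the result has to be assigned back to data.index. A minimal sketch, with a small hypothetical DataFrame standing in for the Excel file and pd.Timestamp swapped in as the per-element parser (it is not one of the two options above):

```python
import pandas as pd

# Hypothetical stand-in for the read_excel result: dates stored as strings
data = pd.DataFrame({"value": [1.0, 2.0]}, index=["Feb 10, 2024", "Feb 26, 1971"])

parsed = data.index.map(pd.Timestamp)  # builds a new Index; data is unchanged
print(type(data.index).__name__)       # still the original string Index

data.index = parsed                    # assigning back replaces the index
print(type(data.index).__name__)       # DatetimeIndex
```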

Both work, but dateparser seems a bit slower. My questions are the following:

  • Is it correct that I cannot do this with the Python standard library without relying on the locale?

  • More generally, is there a better/cleaner alternative to achieve this?

  • Bonus question: I have been trying to do this conversion directly within pd.read_excel, but nothing gets converted and I don't understand why. There is no error; it just seems not to do anything:

import pandas as pd
import arrow
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0, converters={0: lambda x: arrow.get(x, "MMM D, YYYY", locale="en_US").datetime})

asked Feb 6 at 20:35 by Aristide
1 Answer

Have you tried pandas.to_datetime()? I suggest you just convert your string index/column to datetime using:

import pandas as pd

test_string = "Feb 10, 2024"

test_result = pd.to_datetime(test_string)

print(test_result)

Output:

2024-02-10 00:00:00
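Applied to the question's case, the same function can convert the whole index in one vectorized call instead of a Python-level map. This is a minimal sketch with a made-up DataFrame in place of the Excel file; passing format explicitly is optional and skips format inference. As far as I know, %b here is still matched against the month names of the current LC_TIME locale (English under Python's default C locale), so this does not fully remove the locale concern raised in the question.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from Excel
data = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=["Feb 10, 2024", "Feb 26, 1971", "Dec 1, 1999"],
)

# One call over the whole index; %d also matches a day without a leading zero
data.index = pd.to_datetime(data.index, format="%b %d, %Y")
print(data.index)
```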

If you need to, you can change it to another string format using strftime.
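For example, both a single Timestamp and a whole DatetimeIndex have a strftime method. A short sketch; the output formats are arbitrary choices for illustration:

```python
import pandas as pd

# Single value: Timestamp.strftime behaves like datetime.strftime
print(pd.to_datetime("Feb 10, 2024").strftime("%d/%m/%Y"))  # 10/02/2024

# Whole index: DatetimeIndex.strftime returns an Index of strings
dates = pd.to_datetime(pd.Index(["Feb 10, 2024", "Feb 26, 1971"]), format="%b %d, %Y")
iso_strings = dates.strftime("%Y-%m-%d")
print(list(iso_strings))  # ['2024-02-10', '1971-02-26']
```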
