python 3.x - DataFrame index: date formatting to datetime

The DataFrame is generated using pd.read_excel. The index values are dates stored as strings, which I want to convert to datetime, but I am not sure whether the approach I found is the "best" one. My issue is that the date strings use the abbreviated English month name, so in my case "2024-02-10" appears as "Feb 10, 2024". Therefore, I cannot use datetime.strptime(date_string, "%b %d, %Y"), because the abbreviated month name depends on the locale and I don't want to use locale.setlocale().

The two options I found are:

Using the arrow library:

import arrow
date_arrow = arrow.get("Feb 10, 2024", "MMM D, YYYY", locale="en_US").datetime

Using the dateparser library:

import dateparser
date_parsed = dateparser.parse("Feb 26, 1971")

To get the DataFrame I want, I can do either of the following:

import pandas as pd
import arrow
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)
# Index.map returns a new Index, so assign it back to keep the conversion
data.index = data.index.map(lambda x: arrow.get(x, "MMM D, YYYY", locale="en_US").datetime)

import pandas as pd
import dateparser
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)
# Index.map returns a new Index, so assign it back to keep the conversion
data.index = data.index.map(dateparser.parse)

Both work, but dateparser seems a bit slower. My questions are the following:

Is it correct that I cannot do this with the Python standard library without relying on the locale?
More generally, is there a better/cleaner alternative to achieve this?
Bonus question: I have been trying to do this conversion directly within pd.read_excel, but nothing is converted and I do not understand why. There is no error; it just seems to do nothing:

import pandas as pd
import arrow
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0, converters={0: lambda x: arrow.get(x, "MMM D, YYYY", locale="en_US").datetime})

asked Feb 6 at 20:35 by Aristide

1 Answer

Have you tried pandas.to_datetime()? I suggest you just convert your string index/column to datetime using:

import pandas as pd

test_string = "Feb 10, 2024"

test_result = pd.to_datetime(test_string)

print(test_result)

Output:

2024-02-10 00:00:00
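
The same function also accepts a whole index, which avoids a Python-level lambda per row. A minimal sketch, assuming the index holds strings like "Feb 10, 2024"; the small DataFrame and its "value" column are made up here and only stand in for the result of pd.read_excel:

```python
import pandas as pd

# Toy frame standing in for the pd.read_excel result (values are made up).
data = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=["Feb 10, 2024", "Feb 26, 1971"],
)

# Convert the whole index in one call and assign the result back,
# because to_datetime returns a new DatetimeIndex.
data.index = pd.to_datetime(data.index, format="%b %d, %Y")

print(data.index)
```

Passing format explicitly keeps the parsing strict and is usually faster than letting pandas infer the layout. Whether %b is locale-sensitive inside pandas is worth checking for your version; in a process that never changes LC_TIME the English abbreviations should be matched.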

If you need to, you can change it to another string format using strftime.
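
On the standard-library question: as far as I know, Python leaves LC_TIME in the "C" locale at startup, so datetime.strptime with %b matches English abbreviations unless something in the process calls locale.setlocale. If that is not a strong enough guarantee, a hard-coded month table sidesteps the locale entirely. A minimal sketch, assuming every value looks like "Feb 10, 2024"; parse_english_date is a hypothetical helper name, not part of any library:

```python
from datetime import datetime

# Hard-coded English abbreviations, so the result never depends on the locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_english_date(text: str) -> datetime:
    """Parse strings such as 'Feb 10, 2024' (hypothetical helper)."""
    # Drop the comma, then split into month name, day and year.
    month_name, day, year = text.replace(",", " ").split()
    return datetime(int(year), MONTHS[month_name], int(day))


print(parse_english_date("Feb 10, 2024"))  # 2024-02-10 00:00:00
```

It can be applied to the index in the same way as the arrow and dateparser versions in the question, for example data.index = data.index.map(parse_english_date).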


python 3.x - DataFrame index: date formating to datetime - Stack Overflow
