The DataFrame is generated using pd.read_excel. The index values are dates, which I want to convert to datetime, but I am not sure whether the approach I found is the "best" one. My issue is that the date strings use the abbreviated English month name: in my case, "2024-02-10" is written as "Feb 10, 2024". Therefore, I cannot use datetime.strptime(date_string, "%b %d, %Y") because the abbreviated month name depends on the locale, and I don't want to use locale.setlocale().
The two options I found are:
- Using arrow library:
import arrow
date_arrow = arrow.get("Feb 10, 2024", "MMM D, YYYY", locale="en_US").datetime
- Using dateparser library:
import dateparser
date_parsed = dateparser.parse("Feb 26, 1971")
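Regarding the standard library: one way to avoid the locale entirely is to map the English month abbreviations to numbers yourself. This is a minimal sketch, assuming every value follows the "Mon D, YYYY" pattern exactly; parse_en_date and _MONTHS are hypothetical names, not part of any library.

```python
from datetime import datetime

# Hard-coded English abbreviations (hypothetical table); unlike calendar.month_abbr,
# this does not change with the process locale.
_MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def parse_en_date(text: str) -> datetime:
    """Parse strings like 'Feb 10, 2024' without touching the locale (hypothetical helper)."""
    # "Feb 10, 2024" -> ["Feb", "10", "2024"]
    month_name, day, year = text.replace(",", " ").split()
    return datetime(int(year), _MONTHS[month_name.lower()], int(day))


print(parse_en_date("Feb 10, 2024"))  # 2024-02-10 00:00:00
```

The same function could be passed to data.index.map; it raises KeyError or ValueError on anything that does not match the assumed pattern.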
To get the DataFrame I want, I can do either:
import pandas as pd
import arrow
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)
# Index.map returns a new Index, so assign it back to data.index
data.index = data.index.map(lambda x: arrow.get(x, "MMM D, YYYY", locale="en_US").datetime)
or
import pandas as pd
import dateparser
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0)
# Index.map returns a new Index, so assign it back to data.index
data.index = data.index.map(dateparser.parse)
Both work, but dateparser seems a bit slower. My questions are the following:
Is it correct that I cannot do this with the Python standard library without relying on locale? More generally, is there a better/cleaner alternative to achieve this?
Bonus question: I have been trying to do this conversion directly within pd.read_excel, but nothing is converted and I do not understand why. There is no error; it just does not seem to do anything:
import pandas as pd
import arrow
path = r"some_path"
data = pd.read_excel(path, header=[0, 1, 2, 3], index_col=0, converters={0: lambda x: arrow.get(x, "MMM D, YYYY", locale="en_US").datetime})
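It is not clear from the snippet why the converter has no visible effect. A workaround that does not depend on converters is to read the file unchanged and convert the index once afterwards. Below is a minimal sketch: a small hypothetical DataFrame stands in for the result of pd.read_excel, and it assumes the process still uses the default (C) LC_TIME locale, so that %b matches English abbreviations.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_excel(...): string dates in the index
data = pd.DataFrame({"value": [1.0, 2.0]}, index=["Feb 10, 2024", "Feb 26, 1971"])

# Convert the whole index in one vectorised call; the explicit format avoids format guessing
data.index = pd.to_datetime(data.index, format="%b %d, %Y")
print(data.index)
```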
asked Feb 6 at 20:35
Aristide
1 Answer
Have you tried pandas.to_datetime()? I suggest you just convert your string index/column to datetime using:
import pandas as pd
test_string = "Feb 10, 2024"
test_result = pd.to_datetime(test_string)
print(test_result)
Output:
2024-02-10 00:00:00
If you need to, you can change it to another string format using strftime.
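Both steps can be applied to a whole index at once. This is a minimal sketch, assuming the default (C) LC_TIME locale; the sample dates are made up for illustration. Passing format explicitly skips pandas' format inference, and DatetimeIndex.strftime turns the parsed values back into strings in whatever layout you need.

```python
import pandas as pd

# Made-up sample index in the question's "Mon D, YYYY" layout
idx = pd.Index(["Feb 10, 2024", "Mar 15, 2024"])

# Parse every element with the same explicit format
dt_idx = pd.to_datetime(idx, format="%b %d, %Y")

# Back to strings, here in ISO layout
iso = dt_idx.strftime("%Y-%m-%d")
print(list(iso))  # ['2024-02-10', '2024-03-15']
```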