I'm opening a csv file using pandas.
import pandas as pd
df = pd.read_csv('/file/planned.csv')
The file contains about 2,000 records collected from all over the world. When I try to open it with pandas, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 34: invalid continuation byte
After searching the web, I tried the following encoding options, hoping that one of them would open the file. However, I still get an error message for each encoding I tried.
utf-8
df_planned = pd.read_csv('/content/sample_data/planned.csv', encoding='utf-8')
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xec in position 34: invalid continuation byte
utf-16
df_planned = pd.read_csv('/content/sample_data/planned.csv', encoding='utf-16')
> UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 234-235: illegal encoding
euc-kr
df_planned = pd.read_csv('/content/sample_data/planned.csv', encoding='euc-kr')
UnicodeDecodeError: 'euc_kr' codec can't decode byte 0x84 in position 37: illegal multibyte sequence
I'm still not able to load the file into a DataFrame with pandas.
cp949
df_planned = pd.read_csv('/content/sample_data/planned.csv', encoding='cp949')
UnicodeDecodeError: 'cp949' codec can't decode byte 0xe8 in position 43: illegal multibyte sequence
Could anyone help? Thank you so much.
asked 13 hours ago by headfat

1 Answer
You will first have to find the encoding of the CSV file, for example with chardet:
import chardet
import pandas as pd
with open('your_file.csv', 'rb') as f:
    enc = chardet.detect(f.read())  # or readline if the file is large
encoding = enc['encoding']
Once you know the encoding, you can read the file the same way as before, passing the detected encoding:
df_planned = pd.read_csv('/content/sample_data/planned.csv', encoding=encoding)
(replace encoding with the encoding that was found)
open("file", "rb") will open the file in binary mode. Then, if you have a line in a variable named line, you can try to decode it with line.decode(encoding) for various encodings, and see which encodings give you exceptions or not. – joanis, commented 1 hour ago
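The approach from the comment above could be sketched roughly like this. It is only an illustration, not a definitive diagnosis tool: the function name failing_lines_by_encoding, the candidate list, and the sample bytes are all made up for the example, and it assumes the file uses "\n" line endings in a byte-oriented encoding (so it would not be meaningful for utf-16). Since the records come from all over the world, different lines may have been saved in different encodings, which would explain why no single encoding decodes the whole file.

```python
from typing import Dict, Iterable, List


def failing_lines_by_encoding(raw: bytes, encodings: Iterable[str]) -> Dict[str, List[int]]:
    """Map each candidate encoding to the 1-based line numbers it cannot decode.

    Hypothetical helper for illustration; splitting on b"\\n" assumes a
    byte-oriented encoding, so it is not suitable for utf-16.
    """
    lines = raw.split(b"\n")
    report: Dict[str, List[int]] = {}
    for enc in encodings:
        bad: List[int] = []
        for lineno, line in enumerate(lines, start=1):
            try:
                line.decode(enc)
            except UnicodeDecodeError:
                bad.append(lineno)
        report[enc] = bad
    return report


# Made-up sample standing in for the real file: line 2 was saved as cp949,
# the other lines as utf-8. For a real file, read the bytes with
# open(path, "rb").read() instead.
sample = (
    "name,city\n".encode("utf-8")
    + "김,서울\n".encode("cp949")
    + "José,Málaga\n".encode("utf-8")
)

report = failing_lines_by_encoding(sample, ["utf-8", "cp949", "latin-1"])
for enc, bad in report.items():
    print(enc, "fails on lines:", bad)
```

One caveat: an encoding that raises no exception is not necessarily the right one. latin-1 accepts every byte value, so it never fails, but it may produce garbled text. It is worth looking at the decoded lines before settling on an encoding.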