python - Can't get grouped data into numpy array

I have a CSV file like this:

Ngày(Date),Số(Number)
07/03/2025,8
07/03/2025,9
...
06/03/2025,6
06/03/2025,10
06/03/2025,18
06/03/2025,14
...

(Each day has 27 numbers)

I want to predict the list of 27 numbers for the next day using an LSTM. I keep getting an error at this step:

data_matrix = np.array(grouped_data.loc[:, "Số"].tolist())

with

KeyError: 'Số'

(which means 'Number')

Here is my code:

import numpy as np
import pandas as pd

df = pd.read_csv("C:/Users/Admin/lonum_fixed.csv", encoding="utf-8", sep=",")
df.columns = df.columns.str.strip()

grouped_data = df.groupby("Ngày")[["Số"]].apply(lambda x: list(map(int, x["Số"].values))).reset_index()
grouped_data["Số"] = grouped_data["Số"].apply(lambda x: eval(x) if isinstance(x, str) else x)

data_matrix = np.array(grouped_data.loc[:, "Số"].tolist())

asked Mar 9 at 15:52 by gialociubc, edited Mar 9 at 16:00 by desertnaut

Maybe check print(grouped_data) and print(grouped_data.columns). – furas, Mar 9 at 15:59
Also, check the normalization of Số. It can be represented by two Unicode code points or four: 'S\u1ed1' or 'So\u0302\u0301'. Use the ascii() function to see which one you have. – Mark Tolonen, Mar 9 at 16:02
The line with df.groupby("Ngày")[["Số"]]... gives me a DataFrame whose column is named 0 instead of "Số", so grouped_data doesn't have "Số". The error is raised in grouped_data["Số"].apply(...), not in grouped_data.loc[:, "Số"]. – furas, Mar 9 at 16:07
If "Số" is already a list, modify the groupby: grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index() – steve-ed, Mar 9 at 16:07
First: after reading the file I get column "Số" with integer values (you can check with print(df.dtypes)), so it doesn't need list(map(int, x["Số"].values)). – furas, Mar 9 at 16:15
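The normalization check suggested in the comments can be sketched as follows. This is only an illustration: the helper name `normalize_name` is hypothetical, and whether the real CSV header is in decomposed form is an assumption that has to be checked against the actual file.

```python
import unicodedata

# Two ways to spell the same visible text "Số"
composed = "S\u1ed1"            # 'ố' as a single precomposed code point
decomposed = "So\u0302\u0301"   # 'o' + combining circumflex + combining acute

# ascii() escapes non-ASCII characters, which makes the difference visible
print(ascii(composed))
print(ascii(decomposed))

# The strings look identical when rendered but do not compare equal
print(composed == decomposed)


def normalize_name(name: str) -> str:
    """Hypothetical helper: strip whitespace and normalize a label to NFC."""
    return unicodedata.normalize("NFC", name.strip())


# After normalization both spellings map to the same key
print(normalize_name(composed) == normalize_name(decomposed))
```

With pandas, the same helper could be applied to the header with `df.columns = [normalize_name(c) for c in df.columns]`, provided the string used for the lookup in the code is in NFC form as well.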

1 Answer

First: read_csv() already parses the values as integers, so there is no need for map(int, ...). And apply(list) already produces lists, so there is no need for eval() either.

The problem is that groupby().apply() followed by reset_index() created a DataFrame whose values column is named 0 instead of "Số", so the error is actually raised by grouped_data["Số"].apply(...), not by grouped_data.loc[:, "Số"].
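To see the missing column name directly, here is a minimal sketch that runs the original grouping on a few made-up rows (the `csv_text` sample is an assumption standing in for the real file) and prints the resulting column labels:

```python
import io

import pandas as pd

# Made-up sample in the same layout as the question's CSV
csv_text = """Ngày,Số
07/03/2025,8
07/03/2025,9
06/03/2025,6
06/03/2025,10
"""

df = pd.read_csv(io.StringIO(csv_text))

# Same grouping as in the question: [["Số"]] selects a DataFrame per group,
# and the lambda returns a plain list, so the combined result has no name
grouped = (
    df.groupby("Ngày")[["Số"]]
    .apply(lambda x: list(x["Số"]))
    .reset_index()
)

# Inspect the labels: "Số" is not among them
print(grouped.columns.tolist())
print("Số" in grouped.columns)
```

Printing `grouped.columns` right after the groupby step is usually the quickest way to find out which statement a KeyError really comes from.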

You can reduce the code to:

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")

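This converts each group to a list and sets the column name back to "Số". Note that I use ["Số"] instead of [["Số"]]: single brackets select a Series, and reset_index(name=...) is only available on a Series.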

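Because pandas stores column data in NumPy arrays, you can get the array directly:

data_matrix = grouped_data["Số"].values

Note that this is a 1-D array of dtype object whose elements are Python lists, as the output below shows. If every day has the same number of values, np.array(grouped_data["Số"].tolist()) gives a regular 2-D integer array instead.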
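The difference between the two conversions can be sketched as follows. The toy frame below is an assumption: it uses 3 numbers per day instead of the 27 in the real file, so that the groups have equal length.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): 3 numbers per day instead of 27
df = pd.DataFrame({
    "Ngày": ["07/03/2025"] * 3 + ["06/03/2025"] * 3,
    "Số": [8, 9, 4, 6, 10, 18],
})

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")

# .values keeps the lists as opaque Python objects: shape (n_days,)
object_array = grouped_data["Số"].values

# .tolist() first gives a list of lists, which NumPy can turn into a
# regular 2-D array as long as every inner list has the same length
data_matrix = np.array(grouped_data["Số"].tolist())

print(object_array.shape, object_array.dtype)
print(data_matrix.shape, data_matrix.dtype)
```

One thing to keep in mind for a sequence model: groupby sorts the keys, and here the dates are still strings in dd/mm/yyyy form, so the row order is alphabetical rather than chronological once the data spans several months. Converting the column with pd.to_datetime(df["Ngày"], format="%d/%m/%Y") before grouping is one way to avoid that.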

Full code used for tests:

I used io.StringIO only to create a file-like object in memory, so that anyone can copy and run the code, but you can pass a filename instead.

import io

import numpy as np
import pandas as pd

text = '''Ngày,Số
07/03/2025,8
07/03/2025,9
06/03/2025,6
06/03/2025,10
06/03/2025,18
06/03/2025,14
'''

df = pd.read_csv(io.StringIO(text), encoding="utf-8", sep=",")
#df = pd.read_csv("C:/Users/Admin/lonum_fixed.csv", encoding="utf-8", sep=",")
df.columns = df.columns.str.strip()
print('----')
print(df)
print('----')
print(df.dtypes)

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")
print('---')
print(grouped_data)
print('----')
print('type:', type(grouped_data))

print('---')
print('type:', type(grouped_data["Số"].values))
print('----')
print('values  :', grouped_data["Số"].values)
print('np.array:', np.array(grouped_data["Số"]))

data_matrix = grouped_data["Số"].values
#data_matrix = np.array(grouped_data["Số"])

print('----')
print('data_matrix:', data_matrix)

Result:

----
         Ngày  Số
0  07/03/2025   8
1  07/03/2025   9
2  06/03/2025   6
3  06/03/2025  10
4  06/03/2025  18
5  06/03/2025  14
----
Ngày    object
Số       int64
dtype: object
---
         Ngày               Số
0  06/03/2025  [6, 10, 18, 14]
1  07/03/2025           [8, 9]
----
type: <class 'pandas.core.frame.DataFrame'>
---
type: <class 'numpy.ndarray'>
----
values  : [list([6, 10, 18, 14]) list([8, 9])]
np.array: [list([6, 10, 18, 14]) list([8, 9])]
----
data_matrix: [list([6, 10, 18, 14]) list([8, 9])]
