python - Can't get grouped data into numpy array - Stack Overflow

I have a CSV file like this:

Ngày(Date),Số(Number)
07/03/2025,8
07/03/2025,9
...
06/03/2025,6
06/03/2025,10
06/03/2025,18
06/03/2025,14
...

(Each day has 27 numbers)

I want to predict the list of 27 numbers for the next day using an LSTM, but I keep getting an error at this step:

data_matrix = np.array(grouped_data.loc[:, "Số"].tolist())

with

KeyError: 'Số'

(which means 'Number')

Here is my code:

import numpy as np
import pandas as pd

df = pd.read_csv("C:/Users/Admin/lonum_fixed.csv", encoding="utf-8", sep=",")
df.columns = df.columns.str.strip()

grouped_data = df.groupby("Ngày")[["Số"]].apply(lambda x: list(map(int, x["Số"].values))).reset_index()
grouped_data["Số"] = grouped_data["Số"].apply(lambda x: eval(x) if isinstance(x, str) else x)

data_matrix = np.array(grouped_data.loc[:, "Số"].tolist())

Asked Mar 9 at 15:52 by gialociubc; edited Mar 9 at 16:00 by desertnaut.
  • Maybe check print(grouped_data) and print(grouped_data.columns). – furas, commented Mar 9 at 15:59
  • Also, check the normalization of Số. It can be represented by two Unicode characters or four: 'S\u1ed1' or 'So\u0302\u0301'. Use the ascii() function to see which one you have. – Mark Tolonen, commented Mar 9 at 16:02
  • The line with df.groupby("Ngày")[["Số"]]... gives me a DataFrame whose column is named 0 instead of "Số", so grouped_data has no "Số" column. The error is raised in grouped_data["Số"].apply(...), not in grouped_data.loc[:, "Số"]. – furas, commented Mar 9 at 16:07
  • If you want "Số" as a list per date, modify the groupby: grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index() – steve-ed, commented Mar 9 at 16:07
  • First: after reading the file I get column "Số" with integer values (you can check with print(df.dtypes)), so it doesn't need list(map(int, x["Số"].values)). – furas, commented Mar 9 at 16:15
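To follow up on Mark Tolonen's comment, here is a minimal sketch showing the two encodings of "Số" and one way to normalize the column names to NFC right after reading the file. The small DataFrame is a hypothetical stand-in for a CSV whose header was saved in decomposed form; it is not the asker's actual data.

```python
import unicodedata

import pandas as pd

# Same visible text, two different code-point sequences
nfc = "S\u1ed1"          # precomposed: S + U+1ED1 (2 characters)
nfd = "So\u0302\u0301"   # decomposed: S + o + combining circumflex + combining acute (4 characters)

print(nfc == nfd)                                # False: plain comparison sees different code points
print(ascii(nfc), ascii(nfd))                    # shows which form a string uses
print(unicodedata.normalize("NFC", nfd) == nfc)  # True

# Hypothetical stand-in for a CSV header stored in decomposed (NFD) form
df = pd.DataFrame({"Ngày": ["07/03/2025"], nfd: [8]})
print(nfc in df.columns)   # False: the precomposed key does not match

# Strip whitespace and normalize every header to NFC after reading
df.columns = df.columns.str.strip().str.normalize("NFC")
print(nfc in df.columns)   # True
```

Normalizing once after read_csv means every later lookup such as df["Số"] matches, whichever form the file used.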

1 Answer

First: read_csv already converts the Số values to integers (int64), so there is no need to use map(int, ...). And apply(list) already creates lists, so there is no need to use eval().


The problem is that groupby().apply() returns a Series without a name, so reset_index() creates a DataFrame whose value column is named 0 instead of "Số". The error is therefore raised in grouped_data["Số"].apply(...), not in grouped_data.loc[:, "Số"].
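This can be reproduced with a few of the sample rows from the question. The sketch runs the question's groupby line unchanged and inspects what it returns; the inline CSV text is only there so the snippet is self-contained.

```python
import io

import pandas as pd

text = "Ngày,Số\n07/03/2025,8\n07/03/2025,9\n06/03/2025,6\n06/03/2025,10\n"
df = pd.read_csv(io.StringIO(text))

# The question's groupby: the lambda returns a plain list for each date,
# so pandas wraps the results in a Series that has no name
result = df.groupby("Ngày")[["Số"]].apply(lambda x: list(map(int, x["Số"].values)))
print(type(result).__name__, result.name)

# reset_index() on an unnamed Series calls the value column 0
grouped_data = result.reset_index()
print(list(grouped_data.columns))

# So the question's next line fails before data_matrix is ever built
try:
    grouped_data["Số"]
except KeyError as err:
    print("KeyError:", err)
```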

You can reduce the code to

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")

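This collects each day's values into a list and sets the column name back to "Số". Note that I use ["Số"] (a single column, so a SeriesGroupBy) instead of [["Số"]] (a one-column DataFrame).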

Because pandas keeps the column data as a NumPy array, you can get it directly with

data_matrix = grouped_data["Số"].values
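One caveat: for a column of lists, .values (and np.array(...) on the column) gives a 1-D array whose elements are Python lists, not a 2-D matrix; the results further down show exactly that. If every day really has the same count (27 in the question), the lists can be stacked into a (days, numbers_per_day) matrix. The sketch below does that with np.stack and, assuming the dates are day-first (dd/mm/yyyy), parses them with pd.to_datetime so the days sort chronologically instead of as text. The sample uses 3 numbers per day only to stay short.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: 3 numbers per day instead of 27
text = """Ngày,Số
07/03/2025,8
07/03/2025,9
07/03/2025,1
06/03/2025,6
06/03/2025,10
06/03/2025,18
"""
df = pd.read_csv(io.StringIO(text))

# Assumption: day-first dates; parsing them makes groupby sort by real date
df["Ngày"] = pd.to_datetime(df["Ngày"], format="%d/%m/%Y")

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")

# .values is a 1-D object array whose items are Python lists
print(grouped_data["Số"].values.dtype)   # object

# Every day must have the same count before the lists can form a matrix
lengths = grouped_data["Số"].map(len)
if lengths.nunique() != 1:
    raise ValueError(f"days have different counts: {lengths.tolist()}")

# np.stack turns the equal-length lists into a 2-D integer array
data_matrix = np.stack(grouped_data["Số"].tolist())
print(data_matrix.shape)   # (2, 3)
print(data_matrix)
```

The length check matters because stacking lists of different lengths fails with a ValueError, and a silent object array would only fail later, inside the model.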

Full code used for tests:

I used io.StringIO only to create a file-like object in memory, so anyone can copy and run the code; you can use the filename instead.

import numpy as np
import pandas as pd


text = '''Ngày,Số
07/03/2025,8
07/03/2025,9
06/03/2025,6
06/03/2025,10
06/03/2025,18
06/03/2025,14
'''

import io

df = pd.read_csv(io.StringIO(text), encoding="utf-8", sep=",")
#df = pd.read_csv("C:/Users/Admin/lonum_fixed.csv", encoding="utf-8", sep=",")
df.columns = df.columns.str.strip()
print('----')
print(df)
print('----')
print(df.dtypes)

grouped_data = df.groupby("Ngày")["Số"].apply(list).reset_index(name="Số")
print('---')
print(grouped_data)
print('----')
print('type:', type(grouped_data))

print('---')
print('type:', type(grouped_data["Số"].values))
print('----')
print('values  :', grouped_data["Số"].values)
print('np.array:', np.array(grouped_data["Số"]))

data_matrix = grouped_data["Số"].values
#data_matrix = np.array(grouped_data["Số"])

print('----')
print('data_matrix:', data_matrix)

Result:

----
         Ngày  Số
0  07/03/2025   8
1  07/03/2025   9
2  06/03/2025   6
3  06/03/2025  10
4  06/03/2025  18
5  06/03/2025  14
----
Ngày    object
Số       int64
dtype: object
---
         Ngày               Số
0  06/03/2025  [6, 10, 18, 14]
1  07/03/2025           [8, 9]
----
type: <class 'pandas.core.frame.DataFrame'>
---
type: <class 'numpy.ndarray'>
----
values  : [list([6, 10, 18, 14]) list([8, 9])]
np.array: [list([6, 10, 18, 14]) list([8, 9])]
----
data_matrix: [list([6, 10, 18, 14]) list([8, 9])]
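For the goal stated in the question (predicting the next day's 27 numbers with an LSTM), a (days, 27) matrix still has to be cut into input windows and next-day targets. The sketch below shows one common way to do this with plain NumPy, assuming the rows are sorted oldest first and using a hypothetical window of 3 days. It produces the (samples, timesteps, features) layout that, for example, Keras LSTM layers expect by default; the random integers are stand-in data, and no model is built or trained here.

```python
import numpy as np


def make_windows(data_matrix, window):
    """Split a (days, numbers_per_day) matrix into LSTM inputs and next-day targets.

    X[i] holds `window` consecutive days; y[i] is the day right after them.
    Rows are assumed to be in chronological order, oldest first.
    """
    days = data_matrix.shape[0]
    if days <= window:
        raise ValueError("need more days than the window length")
    X = np.stack([data_matrix[i:i + window] for i in range(days - window)])
    y = data_matrix[window:]
    return X, y


# Stand-in data: 10 days of 27 numbers each (random, for illustration only)
rng = np.random.default_rng(0)
data_matrix = rng.integers(0, 100, size=(10, 27))

X, y = make_windows(data_matrix, window=3)
print(X.shape)  # (7, 3, 27): samples, timesteps, features
print(y.shape)  # (7, 27): one 27-number target per sample
```

Because the question's file lists the newest day first, the grouped matrix would need to be in ascending date order (as parsing the dates and grouping gives) before windowing.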