I have a dictionary whose keys are stock tickers (strings) and whose values are lists of dicts, one dict of open/high/low/close prices per row. It looks like this:
data
Out[88]:
{'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
{'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09},
{'open': 118.19, 'high': 134.81, 'low': 104.37, 'close': 149.72},
{'open': 225.35, 'high': 126.81, 'low': 104.77, 'close': 209.46},
{'open': 247.2, 'high': 243.25, 'low': 220.44, 'close': 186.01}],
'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59},
{'open': 142.98, 'high': 168.42, 'low': 188.33, 'close': 232.52},
{'open': 184.14, 'high': 163.42, 'low': 194.81, 'close': 153.03},
{'open': 199.54, 'high': 130.26, 'low': 101.05, 'close': 102.1},
{'open': 243.91, 'high': 119.21, 'low': 190.2, 'close': 223.31}],
'AAPL': [{'open': 202.06, 'high': 162.54, 'low': 212.3, 'close': 226.78},
{'open': 191.17, 'high': 153.49, 'low': 135.13, 'close': 151.83},
{'open': 187.15, 'high': 149.75, 'low': 123.28, 'close': 247.32},
{'open': 194.29, 'high': 175.34, 'low': 244.14, 'close': 207.45},
{'open': 228.9, 'high': 133.26, 'low': 100.59, 'close': 129.35}]}
The data is generated with:
import random

ticks = ['NVDA', 'MSFT', 'AAPL']
data = {}
for s in ticks:
    data[s] = []
    # five random OHLC rows per ticker
    for _ in range(5):
        entry = {
            'open': round(random.uniform(100, 250), 2),
            'high': round(random.uniform(100, 250), 2),
            'low': round(random.uniform(100, 250), 2),
            'close': round(random.uniform(100, 250), 2)
        }
        data[s].append(entry)
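Because the generator draws from random, every run produces different numbers, which is why the desired output below and some of the answers show values that differ from the data above. A minimal sketch, assuming you want reproducible sample data: the same generator wrapped in a function that uses a seeded random.Random instance (make_data, its rows and seed parameters, and the seed value are hypothetical, not from the question).

```python
import random


def make_data(ticks, rows=5, seed=0):
    """Hypothetical helper: same dict-of-lists shape as the question,
    but seeded so every run yields identical numbers."""
    rng = random.Random(seed)  # private generator; the global random state is untouched
    data = {}
    for s in ticks:
        data[s] = [
            {
                'open': round(rng.uniform(100, 250), 2),
                'high': round(rng.uniform(100, 250), 2),
                'low': round(rng.uniform(100, 250), 2),
                'close': round(rng.uniform(100, 250), 2),
            }
            for _ in range(rows)
        ]
    return data


data = make_data(['NVDA', 'MSFT', 'AAPL'])
```

Calling make_data again with the same seed returns an identical dict, so outputs from different approaches can be compared row by row.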
I'd like to convert this to a DataFrame that looks like this (the numbers below come from a different random draw than the data shown above):
df
Out[98]:
tick open high low close
0 NVDA 215.44 124.29 121.61 244.35
1 NVDA 214.89 184.49 157.39 239.31
2 NVDA 221.42 204.17 148.83 215.00
3 NVDA 182.49 104.29 175.36 226.59
4 NVDA 127.31 182.31 228.92 173.52
5 MSFT 217.79 147.98 120.40 239.97
6 MSFT 108.66 222.83 177.20 172.62
7 MSFT 138.16 116.36 241.62 231.15
8 MSFT 160.53 234.88 154.93 127.49
9 MSFT 168.22 127.77 224.75 207.59
10 AAPL 119.95 106.36 150.28 195.93
11 AAPL 117.71 142.54 210.08 116.37
12 AAPL 147.07 204.46 223.98 104.91
13 AAPL 135.71 211.83 210.11 102.34
14 AAPL 216.45 136.08 130.27 236.48
asked Feb 17 at 19:35 by Chris
5 Answers
This is a simple transformation if you first get the data into a format the pd.DataFrame constructor understands (a list of dicts):
import pandas as pd

df = pd.DataFrame(
    [
        {"tick": k, **v}  # ticker first, then that row's open/high/low/close
        for k, vs in data.items()
        for v in vs
    ]
)
This builds a new dict for every row, so it needs extra memory proportional to the number of rows.
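With a list of dicts, the constructor takes its column order from the order of the keys, so unpacking the ticker key before the row's fields is what makes it the first column. A sketch on a trimmed sample (two rows per ticker, values copied from the question):

```python
import pandas as pd

# trimmed sample: two rows per ticker, values taken from the question
data = {
    'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
             {'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09}],
    'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59},
             {'open': 142.98, 'high': 168.42, 'low': 188.33, 'close': 232.52}],
}

# one new dict per row; 'tick' is the first key, so it becomes the first column
df = pd.DataFrame([{'tick': k, **row} for k, rows in data.items() for row in rows])
print(df)
```

Because each row is copied into a new dict, the dicts inside data are not modified.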
You can read this with json_normalize if you make the input a list of records:
df = pd.json_normalize([{'tick': k, 'data': v} for k, v in data.items()],
                       record_path='data', meta='tick')
This should be relatively lightweight, since the wrapper records reference the original lists in data rather than copying them.
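json_normalize appends the meta column after the record fields, so tick ends up as the last column (as in the output that follows). If you want it first, as in the question, one option is to pop the column and re-insert it at position 0. A sketch on a trimmed sample:

```python
import pandas as pd

data = {
    'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
             {'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09}],
    'AAPL': [{'open': 202.06, 'high': 162.54, 'low': 212.3, 'close': 226.78}],
}

# wrap each ticker's list in a record so json_normalize can expand it
df = pd.json_normalize([{'tick': k, 'data': v} for k, v in data.items()],
                       record_path='data', meta='tick')

# move the meta column from the end to the front
df.insert(0, 'tick', df.pop('tick'))
print(df)
```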
Output:
open high low close tick
0 202.53 159.85 192.78 159.08 NVDA
1 161.14 165.17 189.66 155.31 NVDA
2 216.04 194.22 127.27 114.98 NVDA
3 204.64 137.89 103.44 111.93 NVDA
4 245.47 131.42 138.11 177.44 NVDA
5 197.37 140.20 190.76 180.82 MSFT
6 213.40 237.43 118.40 238.46 MSFT
7 127.91 192.21 186.09 221.07 MSFT
8 216.28 249.58 162.59 111.86 MSFT
9 100.44 149.07 223.15 185.34 MSFT
10 138.62 215.26 107.22 110.75 AAPL
11 188.77 104.89 193.78 183.34 AAPL
12 151.65 128.45 239.33 249.28 AAPL
13 151.82 142.17 241.76 134.61 AAPL
14 239.02 180.75 158.85 184.81 AAPL
Another option: chain the lists together, read them into a DataFrame, then insert the tickers afterwards:
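from itertools import chain
import numpy as np
import pandas as pd
# read every row of every ticker, in order, into one frame
df = pd.DataFrame(chain.from_iterable(data.values()))
# repeat each ticker once per row it owns, then insert it as the first column
df.insert(0, 'tick', np.repeat(list(data), [len(l) for l in data.values()]))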
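The np.repeat call is what keeps the tickers aligned with the rows: each key of data is repeated as many times as its list has entries, in the same order chain.from_iterable walks the lists. A sketch of the same idea with uneven list lengths, building the labels with a plain comprehension instead of NumPy (an alternative, not the answer's code):

```python
from itertools import chain

import pandas as pd

# uneven lengths on purpose: 2 rows for NVDA, 1 for MSFT
data = {
    'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
             {'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09}],
    'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59}],
}

df = pd.DataFrame(chain.from_iterable(data.values()))

# one ticker label per row, in the same order the lists were chained
labels = [k for k, rows in data.items() for _ in rows]
df.insert(0, 'tick', labels)
print(df)
```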
Output:
tick open high low close
0 NVDA 202.53 159.85 192.78 159.08
1 NVDA 161.14 165.17 189.66 155.31
2 NVDA 216.04 194.22 127.27 114.98
3 NVDA 204.64 137.89 103.44 111.93
4 NVDA 245.47 131.42 138.11 177.44
5 MSFT 197.37 140.20 190.76 180.82
6 MSFT 213.40 237.43 118.40 238.46
7 MSFT 127.91 192.21 186.09 221.07
8 MSFT 216.28 249.58 162.59 111.86
9 MSFT 100.44 149.07 223.15 185.34
10 AAPL 138.62 215.26 107.22 110.75
11 AAPL 188.77 104.89 193.78 183.34
12 AAPL 151.65 128.45 239.33 249.28
13 AAPL 151.82 142.17 241.76 134.61
14 AAPL 239.02 180.75 158.85 184.81
Another possible solution, also based on json_normalize:
import pandas as pd
out = pd.concat([pd.json_normalize(data[x]).assign(ticker=x) for x in data])
If the ticker column really needs to be the first one, use:
import numpy as np
out[np.roll(out.columns, 1)]
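pd.concat keeps each piece's own index by default, which is why the index in the output below restarts at 0 for every ticker. Passing ignore_index=True gives a single 0..n-1 index, and np.roll(columns, 1) shifts the last column (ticker) to the front. A sketch on a trimmed sample:

```python
import numpy as np
import pandas as pd

data = {
    'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
             {'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09}],
    'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59},
             {'open': 142.98, 'high': 168.42, 'low': 188.33, 'close': 232.52}],
}

# one small frame per ticker; ignore_index renumbers rows 0..n-1 across all of them
out = pd.concat(
    [pd.json_normalize(data[x]).assign(ticker=x) for x in data],
    ignore_index=True,
)

# ticker was added last, so rolling the column labels by one moves it to the front
out = out[np.roll(out.columns, 1)]
print(out)
```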
Output:
ticker open high low close
0 NVDA 144.75 144.21 174.33 210.47
1 NVDA 123.97 128.50 110.25 154.09
2 NVDA 118.19 134.81 104.37 149.72
3 NVDA 225.35 126.81 104.77 209.46
4 NVDA 247.20 243.25 220.44 186.01
0 MSFT 175.78 213.98 229.75 206.59
1 MSFT 142.98 168.42 188.33 232.52
2 MSFT 184.14 163.42 194.81 153.03
3 MSFT 199.54 130.26 101.05 102.10
4 MSFT 243.91 119.21 190.20 223.31
0 AAPL 202.06 162.54 212.30 226.78
1 AAPL 191.17 153.49 135.13 151.83
2 AAPL 187.15 149.75 123.28 247.32
3 AAPL 194.29 175.34 244.14 207.45
4 AAPL 228.90 133.26 100.59 129.35
Here is one option with pd.concat + DataFrame.join:
import pandas as pd
pd.concat(
    map(lambda x: pd.DataFrame({'tick': [x[0]] * len(x[1])})  # 'tick' column, one entry per row
                  .join(pd.DataFrame(x[1])),                 # OHLC rows, joined on the 0..n-1 index
        data.items())
).reset_index(drop=True)
or, as suggested by @juanpa.arrivillaga in the comments, avoid DataFrame.join (the := assignment expression requires Python 3.8+):
import numpy as np
(df := pd.concat(
    [pd.DataFrame(vs).assign(tick=k) for k, vs in data.items()], ignore_index=True
))[np.roll(df.columns, 1)]
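pd.concat also accepts a dict: its keys become the outer level of a MultiIndex, which reset_index can then turn back into a column. A sketch of that variation (a different technique from the answer's, not taken from it), naming the outer level via the names parameter:

```python
import pandas as pd

data = {
    'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
             {'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09}],
    'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59}],
}

# dict keys become the outer index level, named 'tick'
df = pd.concat({k: pd.DataFrame(v) for k, v in data.items()}, names=['tick'])

# move 'tick' out of the index into the first column, then renumber the rows
df = df.reset_index(level='tick').reset_index(drop=True)
print(df)
```

This avoids both the column reordering step and the assignment expression, since reset_index inserts the level as the first column.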
which gives
tick open high low close
0 NVDA 144.75 144.21 174.33 210.47
1 NVDA 123.97 128.50 110.25 154.09
2 NVDA 118.19 134.81 104.37 149.72
3 NVDA 225.35 126.81 104.77 209.46
4 NVDA 247.20 243.25 220.44 186.01
5 MSFT 175.78 213.98 229.75 206.59
6 MSFT 142.98 168.42 188.33 232.52
7 MSFT 184.14 163.42 194.81 153.03
8 MSFT 199.54 130.26 101.05 102.10
9 MSFT 243.91 119.21 190.20 223.31
10 AAPL 202.06 162.54 212.30 226.78
11 AAPL 191.17 153.49 135.13 151.83
12 AAPL 187.15 149.75 123.28 247.32
13 AAPL 194.29 175.34 244.14 207.45
14 AAPL 228.90 133.26 100.59 129.35
One way to do it:
import pandas as pd
data = {'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47},
{'open': 123.97, 'high': 128.5, 'low': 110.25, 'close': 154.09},
{'open': 118.19, 'high': 134.81, 'low': 104.37, 'close': 149.72},
{'open': 225.35, 'high': 126.81, 'low': 104.77, 'close': 209.46},
{'open': 247.2, 'high': 243.25, 'low': 220.44, 'close': 186.01}],
'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59},
{'open': 142.98, 'high': 168.42, 'low': 188.33, 'close': 232.52},
{'open': 184.14, 'high': 163.42, 'low': 194.81, 'close': 153.03},
{'open': 199.54, 'high': 130.26, 'low': 101.05, 'close': 102.1},
{'open': 243.91, 'high': 119.21, 'low': 190.2, 'close': 223.31}],
'AAPL': [{'open': 202.06, 'high': 162.54, 'low': 212.3, 'close': 226.78},
{'open': 191.17, 'high': 153.49, 'low': 135.13, 'close': 151.83},
{'open': 187.15, 'high': 149.75, 'low': 123.28, 'close': 247.32},
{'open': 194.29, 'high': 175.34, 'low': 244.14, 'close': 207.45},
{'open': 228.9, 'high': 133.26, 'low': 100.59, 'close': 129.35}]}
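dfs = []
for k, v in data.items():
    for d in v:
        # note: this adds a 'tick' key to each dict inside data itself
        d['tick'] = k
        dfs.append(d)
# columns= fixes the column order, putting 'tick' first
df = pd.DataFrame(dfs, columns=['tick', 'open', 'high', 'low', 'close'])
print(df)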
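The loop above stores the ticker into each original dict, so after it runs every entry in data carries an extra 'tick' key. If the input should stay untouched, a sketch that copies each row with dict(d, tick=k) instead; 'tick' lands last in each copy, but the explicit columns= list still puts it first:

```python
import pandas as pd

data = {
    'NVDA': [{'open': 144.75, 'high': 144.21, 'low': 174.33, 'close': 210.47}],
    'MSFT': [{'open': 175.78, 'high': 213.98, 'low': 229.75, 'close': 206.59}],
}

rows = []
for k, v in data.items():
    for d in v:
        # dict(d, tick=k) builds a new dict, so the rows inside data are not modified
        rows.append(dict(d, tick=k))

# 'tick' is the last key in each copy; columns= reorders it to the front
df = pd.DataFrame(rows, columns=['tick', 'open', 'high', 'low', 'close'])
print(df)
```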
Result
tick open high low close
0 NVDA 144.75 144.21 174.33 210.47
1 NVDA 123.97 128.50 110.25 154.09
2 NVDA 118.19 134.81 104.37 149.72
3 NVDA 225.35 126.81 104.77 209.46
4 NVDA 247.20 243.25 220.44 186.01
5 MSFT 175.78 213.98 229.75 206.59
6 MSFT 142.98 168.42 188.33 232.52
7 MSFT 184.14 163.42 194.81 153.03
8 MSFT 199.54 130.26 101.05 102.10
9 MSFT 243.91 119.21 190.20 223.31
10 AAPL 202.06 162.54 212.30 226.78
11 AAPL 191.17 153.49 135.13 151.83
12 AAPL 187.15 149.75 123.28 247.32
13 AAPL 194.29 175.34 244.14 207.45
14 AAPL 228.90 133.26 100.59 129.35
Comments:
… pd.json_normalize()? – Barmar, commented Feb 17 at 19:48
… pd.DataFrame() gives) – Chris, commented Feb 17 at 19:55
… json_normalize. See below. – mozway, commented Feb 18 at 2:41