I need to model 1,000,000+ data points in JSON. I am thinking of two ways of doing this:
a) Array of objects:
[{time:123456789,value:1432423},{time:123456790,value:1432424},....]
or
b) Nested arrays
[[123456789,1432423],[123456790,1432424],....]
Naively comparing these two approaches, it feels like the latter is faster because it uses fewer characters, but it is less descriptive. Is b really faster than a? Which one would you choose, and why?
Is there a 3rd approach?
asked May 12, 2015 at 11:02 by Ali Salehi
- Faster in which respect? Creating the output? Parsing? Transferring? Plus, IMHO, 1M+ entries screams for some other form of representation. – wonderb0lt Commented May 12, 2015 at 11:06
- Faster on parsing at client side – Ali Salehi Commented May 12, 2015 at 11:09
- Well, then test both approaches and decide based on hard facts (sub-µs timing, transport & processing latency, peak resource consumption, deferred garbage-collection issues) – user3666197 Commented May 13, 2015 at 9:05
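As the last comment suggests, the honest way to settle "is b faster than a?" is to measure it. A minimal benchmark sketch (the 100,000-point size, the generated data, and the timing labels are illustrative assumptions, not from the question):

// build sample data in both shapes
var N = 100000;
var objects = [];  // shape a) array of objects
var pairs = [];    // shape b) nested arrays
for (var i = 0; i < N; i++) {
  objects.push({time: 123456789 + i, value: 1432423 + i});
  pairs.push([123456789 + i, 1432423 + i]);
}

var jsonA = JSON.stringify(objects);
var jsonB = JSON.stringify(pairs);
console.log("size a):", jsonA.length, "chars; size b):", jsonB.length, "chars");

// time parsing each shape
console.time("parse a) array of objects");
JSON.parse(jsonA);
console.timeEnd("parse a) array of objects");

console.time("parse b) nested arrays");
JSON.parse(jsonB);
console.timeEnd("parse b) nested arrays");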
3 Answers
{time:[123456789,123456790,...], value:[1432423,1432424,...]}
why?
- iterating over a primitive array is faster.
- the JSON size is comparable to b), but you do not lose the "column" information
this npm package could be of interest: https://github.com/michaelwittig/fliptable
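For completeness, here is a small sketch of converting between the row-oriented shape from the question and this column-oriented shape in plain JavaScript (not using fliptable; the function names are made up for illustration):

// rows: [{time:..., value:...}, ...] -> columns: {time:[...], value:[...]}
function toColumns(rows) {
  var cols = {time: [], value: []};
  for (var i = 0; i < rows.length; i++) {
    cols.time.push(rows[i].time);
    cols.value.push(rows[i].value);
  }
  return cols;
}

// columns: {time:[...], value:[...]} -> rows: [{time:..., value:...}, ...]
function toRows(cols) {
  var rows = [];
  for (var i = 0; i < cols.time.length; i++) {
    rows.push({time: cols.time[i], value: cols.value[i]});
  }
  return rows;
}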
If your time series data models some continuous function, especially over regular time intervals, a much more efficient representation is possible with delta compression, even if you are still using JSON:
[
{time:10001,value:12345},
{time:10002,value:12354},
{time:10003,value:12354},
{time:10010,value:12352}
]
Can be represented as:
[[10001,1,1,7],[12345,9,,-2]]
Which is roughly a 4 times shorter representation (each array starts with an absolute value followed by deltas; the empty slot stands for a delta of 0).
The original could be reconstructed with:
[{time: a[0][0], value: a[1][0]}, {time: a[0][0] + (a[0][1] || 1), value: a[1][0] + (a[1][1] || 0)}, ...]
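A runnable sketch of the same idea (my own illustration of this answer's scheme; it writes explicit 0 deltas instead of the elided slots, since standard JSON cannot represent sparse arrays):

// [{time, value}, ...] sorted by time -> [[t0, dt1, dt2, ...], [v0, dv1, dv2, ...]]
function deltaEncode(points) {
  var times = [points[0].time];
  var values = [points[0].value];
  for (var i = 1; i < points.length; i++) {
    times.push(points[i].time - points[i - 1].time);
    values.push(points[i].value - points[i - 1].value);
  }
  return [times, values];
}

// inverse: rebuild the original points by accumulating the deltas
function deltaDecode(encoded) {
  var times = encoded[0], values = encoded[1];
  var points = [{time: times[0], value: values[0]}];
  for (var i = 1; i < times.length; i++) {
    points.push({
      time: points[i - 1].time + times[i],
      value: points[i - 1].value + values[i]
    });
  }
  return points;
}

// deltaEncode of the four points above -> [[10001,1,1,7],[12345,9,0,-2]]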
To add another example (idea: 'time is a key'):
ts1 = {123456789: 1432423, 123456790: 1432424}
One could imagine even:
ts2 = {"2017-01-01": {x: 2, y: 3}, "2017-02-01": {x: 1, y: 5}}
Quite compact in notation.
When you want to get the keys, use Object.keys:
Object.keys(ts2) // ["2017-01-01", "2017-02-01"]
You can then either get the values by iterating using these keys, or use the more experimental Object.values:
Object.values(ts2) // [{x: 2, y: 3}, {x: 1, y: 5}]
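For example, a hedged sketch of rebuilding an array of rows from ts2 by iterating over its keys (the row shape here is just one possible choice):

var rows = Object.keys(ts2).map(function (key) {
  return {time: key, x: ts2[key].x, y: ts2[key].y};
});
// -> [{time: "2017-01-01", x: 2, y: 3}, {time: "2017-02-01", x: 1, y: 5}]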
In terms of speed: a quick test with 10,000,000 entries in an object worked here:
obj3 = {};
// build an object with 10,000,000 numeric keys and random values
for (var i = 0; i < 10000000; i++) { obj3[i] = Math.random(); }

// time extracting all values
console.time("values() test");
Object.values(obj3);
console.timeEnd("values() test");

// time extracting all keys
console.time("keys() test");
Object.keys(obj3);
console.timeEnd("keys() test");
Results on my machine (Chrome, 3.2 GHz Xeon):
- values() test: 181.77978515625ms
- keys() test: 1230.604736328125ms