I have a chart that can hold a limited amount of data (let's say 100 items) in the form of x-y pairs, plus data continuously coming from some device. I want to draw the chart from the start of measurement (not a sliding chart of the last 100 values) up to the current time, storing only these 100 items (and possibly some extra information), but not all incoming values, which could amount to millions.
In my particular case the x values are time values, so the chart can be considered a "time strip" where new points come in from the right and the left side stays still.
So, I have something like:
struct vec2f {
    float x, y;
};
std::vector<vec2f> data;
and want to draw a chart of 100 averaged (or min/max) values. The problem is not hard while the amount of received data is less than or equal to the chart's capacity. The question is what I should do when the 101st pair arrives, if I want the chart to represent the graph from the start of measurement.
So, the key part of the problem is that the data comes incrementally: I can't process it as a whole and I can't store it all.
Intuitively I see that I should just "squeeze" the data the same way I would squeeze a picture at a ratio of 99/100 and then add a new "column", but what is the best approach for dealing with the array of points (vectors)?
Is there a well-known solution for this?
C++ is preferable (especially if there is something handy for these calculations in std), but any other language would help, since I am more interested in the approach.
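To illustrate what I mean by "squeezing": a rough sketch (my own throwaway function, not a final solution) that resamples n stored points down to n - 1 by linear interpolation over the index range, freeing one slot for the next incoming point. It reuses the vec2f above, keeps the first and last points exactly, and keeps straight lines straight:

#include <cstddef>
#include <vector>

// Hypothetical sketch: "squeeze" the array in the ratio (n-1)/n.
// Assumes pts.size() >= 3.
std::vector<vec2f> squeeze_once(const std::vector<vec2f>& pts) {
    const std::size_t n = pts.size();
    std::vector<vec2f> out(n - 1);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        // Map output index i onto the input index range [0, n - 1].
        const float t = static_cast<float>(i) * (n - 1) / (n - 2);
        std::size_t j = static_cast<std::size_t>(t);
        if (j > n - 2) j = n - 2;                    // clamp for the last point
        const float f = t - static_cast<float>(j);   // fractional part
        out[i].x = pts[j].x + f * (pts[j + 1].x - pts[j].x);
        out[i].y = pts[j].y + f * (pts[j + 1].y - pts[j].y);
    }
    return out;
}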
Update
I missed one (I guess, not critical) constraint. For the first (leftmost) and last (rightmost) values I want the chart to show them precisely. By this I mean that if the data started at time 0 with value 0, I want the chart to start from [0,0] even if other places use averaged values. So, I don't want the averaged bulk of the first values to affect the starting value; it is important to be able to see that the data started from zero. The same applies to the most recently added value.
Update 2
Of course, any such transformation should preserve proportions and the shape of the data: if I have a straight line (like y = x), it must stay straight after the updates.
There are no constraints on time (the x values); the sampling interval can vary and sometimes can even be zero (I am free to take the last incoming value in this case). The only constraint is that time can't go backward.
I have some relaxations to the requirements here:
- The chart may be slightly inaccurate in terms of "which value do you want: min, max, average, median?" or show aliasing caused by blending/weighting calculations.
- I am not much concerned with performance (unless it becomes prohibitive); my concern is not storing too many input values, since this is a long-running system and after days of work there could be hundreds of millions of values, while I need only 100 for drawing.
2 Answers
I would suggest keeping 201 points. Each point would contain an average and a count of elements, preferably packed with counts looking something like this example:
8, 8, ..., 8, 4, 4, ..., 4, 1
As we add elements, we fold each one into the last value as a weighted average. When the last count has reached the next-to-last count (in the example, 4), we merge the first two points with that count (in the example, the two 4s after an 8) to shrink the array, then add a new element with count 1.
Eventually we'll get to
8, 8, ..., 8, 8
Now start merging at the front again, so next we go to:
16, 8, 8, ..., 8, 1
No matter how much data you add, we'll never need more than 201 points.
And now for display purposes, we go as follows. First sum up the counts and divide by 100. This gives the total weight we want each display point to have. Starting at the beginning, we take weighted averages of pairs of adjacent points (note that there are conveniently 100 "holes" between our 201 points) to get the displayed values.
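To make the bookkeeping concrete, here is a minimal sketch of this scheme (my own class and naming, not the answerer's code; it assumes a capacity of at least two buckets and relies on the packing invariant above so that an adjacent pair with the smallest count always exists when the array is full):

#include <cstddef>
#include <vector>

// One stored point: a running average of the raw samples merged into it,
// plus how many samples that average covers.
struct Bucket {
    double x = 0.0, y = 0.0;
    std::size_t count = 0;
};

class BucketChart {
public:
    explicit BucketChart(std::size_t max_buckets = 201)
        : max_buckets_(max_buckets) {}

    void add(double x, double y) {
        if (buckets_.empty() || last_is_full()) {
            if (buckets_.size() == max_buckets_)
                merge_front_pair();            // free one slot
            buckets_.push_back({x, y, 1});     // new sample starts a new bucket
            return;
        }
        // Fold the sample into the last bucket as a weighted average.
        Bucket& b = buckets_.back();
        const double n = static_cast<double>(b.count);
        b.x = (b.x * n + x) / (n + 1.0);
        b.y = (b.y * n + y) / (n + 1.0);
        ++b.count;
    }

    const std::vector<Bucket>& buckets() const { return buckets_; }

private:
    // The last bucket is full once its count reaches the next-to-last count.
    bool last_is_full() const {
        const std::size_t prev =
            buckets_.size() >= 2 ? buckets_[buckets_.size() - 2].count : 1;
        return buckets_.back().count >= prev;
    }

    // Merge the first adjacent pair of buckets carrying the smallest count
    // (the count of the just-filled last bucket).
    void merge_front_pair() {
        const std::size_t c = buckets_.back().count;
        std::size_t i = 0;
        while (buckets_[i].count != c) ++i;
        Bucket& a = buckets_[i];
        const Bucket& b = buckets_[i + 1];
        const double total = static_cast<double>(a.count + b.count);
        a.x = (a.x * a.count + b.x * b.count) / total;
        a.y = (a.y * a.count + b.y * b.count) / total;
        a.count += b.count;
        buckets_.erase(buckets_.begin() + static_cast<std::ptrdiff_t>(i) + 1);
    }

    std::vector<Bucket> buckets_;
    std::size_t max_buckets_;
};

Reducing the at most 201 weighted buckets to the 100 display points is then the weighted averaging of adjacent points described above, done only at draw time.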
After some research I ended up with a simple algorithm which:

- Allocates twice as many elements as needed for the chart and stores each input element in the array (`averager = 1`).
- When the chart data array is filled, squeezes the data array by a factor of two, averaging pairs of values, so a 200-element array becomes a 100-element array.
- After each such squeeze, averages twice as many input elements per stored point as before the previous squeeze (`averager *= 2`).
Code speaks better, so here is the proof-of-concept version of the code:
#include <cstddef>
#include <vector>

struct Point {
    float x = 0.0f, y = 0.0f;
};

class ChartHalvingAlgorithm {
    std::vector<Point> data;          // chart buffer, 2x the number of chart points
    std::size_t size = 0;             // number of valid points in `data`
    Point average;                    // running sum of the current input batch
    std::size_t averager = 1;         // inputs averaged per stored point
    std::size_t averager_counter = 0; // inputs accumulated so far in this batch
public:
    explicit ChartHalvingAlgorithm(std::size_t capacity) // capacity should be even
        : data(capacity) {
    }
    void add(const Point& pt) {
        // Accumulate inputs until a full batch of `averager` samples is seen.
        average.x += pt.x;
        average.y += pt.y;
        ++averager_counter;
        if (averager_counter < averager) {
            return;
        }
        const float x = average.x / averager_counter;
        const float y = average.y / averager_counter;
        average = {};
        averager_counter = 0;
        if (size >= data.size()) {
            // Buffer full: squeeze it in half by averaging adjacent pairs,
            // then average twice as many inputs per point from now on.
            for (std::size_t i = 0; i < data.size() / 2; ++i) {
                data[i].x = (data[2 * i].x + data[2 * i + 1].x) / 2;
                data[i].y = (data[2 * i].y + data[2 * i + 1].y) / 2;
            }
            size /= 2;
            averager *= 2;
        }
        if (size < data.size()) {
            data[size] = Point{x, y};
            ++size;
        }
    }
};
Drawing is platform-dependent and trivial, so it is out of scope here.
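For completeness, a minimal usage sketch with synthetic data (a real version would also expose the stored points for drawing):

#include <cmath>
#include <cstddef>

int main() {
    ChartHalvingAlgorithm chart(200);  // twice the 100 chart points
    // Feed millions of samples; only the 200-slot buffer and a few
    // counters are ever stored, no matter how long this runs.
    for (std::size_t i = 0; i < 5000000; ++i) {
        const float t = static_cast<float>(i) * 0.001f;  // monotonic time
        chart.add(Point{t, std::sin(t)});
    }
    return 0;
}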
Comments:
- "std::deque" – 3CxEZiVlQ, Mar 19 at 20:06
- "… std::deque could help here somehow." – Damir Tenishev, Mar 19 at 20:50
- "… std::deque is that once you push_back the 101st element, you can remove the 1st element, thereby always keeping the last 100 elements only. Is the sampling time set before you start? If so, you could perhaps make a histogram. For 1000 samples, sum the first 10 y-samples and store the average in the first slot etc." – Ted Lyngmo, Mar 19 at 21:02
- "… interval trees or segment trees, where you stream data to/from disk whenever it is not needed and you just maintain averages in memory; this is done by Unreal Engine Nanite on the GPU in real time on hundreds of millions of points each frame. But in practice most visualization apps don't try that hard: 1 billion points is 120 MB, computers have 40+ Gb/s memory bandwidth, they can draw that at 3 fps from just an std::deque, which is more than what a human can care about anyway." – Ahmed AEK, Mar 19 at 21:19