javascript - Reduce the size of a large data set by sampling/interpolation to improve chart performance - Stack Overflow

I have a large set (>2000) of time series data that I'd like to display using d3 in the browser. D3 is working great for displaying a subset of the data (~100 points) to the user, but I also want a "context" view (like this) to show the entire data set and allow users to select a subregion to view in detail.

However, performance is abysmal when trying to display that many points in d3. I feel like a good solution would be to select a sample of the data and then use some kind of interpolation (spline, polynomial, etc., this is the part I know how to do) to draw a curve that is reasonably similar to the actual data.

However, it's not clear to me how I ought to go about selecting the subset. The data (shown below) has rather flat regions where fewer samples would be needed for a decent interpolation, and other regions where the absolute derivative is quite high and more frequent sampling is needed.

To further complicate matters, the data has gaps (where the sensor generating it was failing or out of range), and I'd like to keep these gaps in the chart rather than interpolating through them. Detection of the gaps is fairly simple though, and simply clipping them out after drawing the entire data set with the interpolation seems like a reasonable solution.

I'm doing this in JavaScript, but a solution in any language or a mathematical answer to the problem would do.

asked Jan 15, 2015 at 19:32 by jjm
  • My guess is that the performance hit is caused mostly by the browser trying to render the giant SVG path — rather than javascript's generating of that path's "code". So while selectively removing points is one way to deal with this issue, another option that might do it is to render that path into an html canvas instead of SVG. D3 doesn't have the canvas equivalent of d3.svg.line() generator, but it can be simply drawn with 2000 calls to the canvas API's lineTo() method. I think it's worth a try. – meetamit Commented Jan 15, 2015 at 20:00
  • Yeah, I was thinking that sampling the data would be simpler than rewriting all of the code to use a canvas. – jjm Commented Jan 15, 2015 at 20:06
  • Yeah, maybe so. I would have thought that simply plotting every nth data point — without trying to get fancy about which points to drop out — is a fine way to go about it too, despite the gaps and flat parts as you point out. If you consider that you're plotting 10,000 points into a 1000 pixel strip, then plotting every 10th point still means roughly 1 point per pixel, so the lower density should be mostly imperceptible. And, you could still add one other provision that avoids skipping points that are gaps. Also, you can use a fatter line to essentially blur any missing peaks or valleys. – meetamit Commented Jan 15, 2015 at 20:20
  • Are you using a single path or are you creating each point as a separate SVG element? I wouldn't expect this many data points to be much of a problem if represented as a single path. – Ethan Jewett Commented Jan 15, 2015 at 20:25
  • @EthanJewett good point, that's more or less what the example I linked is doing. – jjm Commented Jan 15, 2015 at 21:09
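A minimal sketch of the two ideas suggested in the comments above (keeping every nth point, and drawing the result as a single canvas path with lineTo() instead of an SVG path). It assumes the points are already in pixel coordinates and that canvas refers to an existing canvas element; both names are placeholders:

// 1) Keep roughly one point per horizontal pixel by taking every nth point.
var n = Math.ceil(data.length / 1000);   // target ~1000 points for a ~1000px-wide strip
var reduced = data.filter(function (d, i) { return i % n === 0; });

// 2) Draw the reduced series as one canvas path rather than an SVG path element.
var ctx = canvas.getContext('2d');
ctx.lineWidth = 2;                        // a slightly fatter line hides small lost peaks
ctx.beginPath();
reduced.forEach(function (d, i) {
    if (i === 0) { ctx.moveTo(d.x, d.y); }
    else { ctx.lineTo(d.x, d.y); }
});
ctx.stroke();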

3 Answers


You could use the d3fc-sample module, which provides a number of different algorithms for sampling data. Here's what the API looks like:

// Create the sampler
var sampler = fc_sample.largestTriangleThreeBucket();

// Configure the x / y value accessors
sampler.x(function (d) { return d.x; })
    .y(function (d) { return d.y; });

// Configure the size of the buckets used to downsample the data.
sampler.bucketSize(10);

// Run the sampler
var sampledData = sampler(data);

You can see an example of it running on the website:

https://d3fc.io/examples/sample/

The largest-triangle three-buckets algorithm works quite well on data that is 'patchy'. It doesn't vary the bucket size, but does ensure that peaks / troughs are included, which results in a good representation of the sampled data.
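For reference, here is a rough sketch of the idea behind largest-triangle-three-buckets (an illustrative version, not the d3fc-sample implementation), assuming points of the form { x, y } and a target point count of threshold:

// Keep the first and last points, split the rest into (threshold - 2) buckets, and from
// each bucket keep the point forming the largest triangle with the previously kept point
// and the average of the next bucket.
function largestTriangleThreeBuckets(data, threshold) {
    if (threshold >= data.length || threshold < 3) { return data.slice(); }

    var sampled = [data[0]];
    var bucketSize = (data.length - 2) / (threshold - 2);
    var a = 0; // index of the most recently kept point

    for (var i = 0; i < threshold - 2; i++) {
        // Average of the next bucket, used as the third vertex of the triangle
        var nextStart = Math.floor((i + 1) * bucketSize) + 1;
        var nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, data.length);
        var avgX = 0, avgY = 0;
        for (var j = nextStart; j < nextEnd; j++) { avgX += data[j].x; avgY += data[j].y; }
        avgX /= (nextEnd - nextStart);
        avgY /= (nextEnd - nextStart);

        // Pick the point in the current bucket with the largest triangle area
        var start = Math.floor(i * bucketSize) + 1;
        var end = Math.floor((i + 1) * bucketSize) + 1;
        var maxArea = -1, maxIndex = start;
        for (var k = start; k < end; k++) {
            var area = Math.abs(
                (data[a].x - avgX) * (data[k].y - data[a].y) -
                (data[a].x - data[k].x) * (avgY - data[a].y)
            ) / 2;
            if (area > maxArea) { maxArea = area; maxIndex = k; }
        }
        sampled.push(data[maxIndex]);
        a = maxIndex;
    }

    sampled.push(data[data.length - 1]); // always keep the last point
    return sampled;
}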

I know this doesn't answer your question entirely, but this library might help you to simplify your line during rendering. Not sure if they handle data gaps though.

http://mourner.github.io/simplify-js/
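For what it's worth, a minimal usage sketch: simplify() takes an array of { x, y } points plus a tolerance. The splitAtGaps helper below is hypothetical, standing in for whatever gap detection you already have, so the simplifier never bridges a gap:

// Simplify each gap-free segment separately so no points are interpolated across gaps.
// splitAtGaps is a hypothetical helper returning an array of { x, y } point arrays.
var segments = splitAtGaps(data);
var simplified = segments.map(function (segment) {
    // tolerance is in the same units as x / y; highQuality = true uses only Douglas-Peucker
    return simplify(segment, 0.5, true);
});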

My advice is to average (not subsample) over longer or shorter time intervals and plot those average values as horizontal bars. I think that's very comprehensible to the user -- if you try something fancier, you might give up the ability to explain exactly what's going on. I'm assuming you can let the user choose to zoom in or out so as to show more or less detail.

You might be able to get the database engine to compute averages over intervals for you, so that's a potential speed-up too.

As to the time intervals to pick, you could try either (1) fixed intervals such as 1 second, 15 seconds, 1 minute, 15 minutes, hours, days, or whatever; that might be easier for the user to understand, or (2) choose the interval to make a fixed number of units across the whole time range, e.g. if you decide to display 7 hours of data in 100 units, then each unit = 252 seconds.
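A rough client-side sketch of that averaging, assuming points shaped like { time, value } (the field names are assumptions) and a fixed interval width in milliseconds:

// Group points into fixed time intervals and average each one; intervals with no points
// (e.g. sensor gaps) simply produce no bar.
function averageByInterval(data, intervalMs) {
    var buckets = new Map();
    data.forEach(function (d) {
        var key = Math.floor(d.time / intervalMs) * intervalMs; // start of the interval
        if (!buckets.has(key)) { buckets.set(key, { sum: 0, count: 0 }); }
        var b = buckets.get(key);
        b.sum += d.value;
        b.count += 1;
    });
    return Array.from(buckets, function (entry) {
        return { start: entry[0], end: entry[0] + intervalMs, mean: entry[1].sum / entry[1].count };
    });
}

// Example: 252-second units, per the 7 hours / 100 units example above
var bars = averageByInterval(data, 252 * 1000);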
