javascript - Algorithm to remove extreme outliers in array - Stack Overflow

I've got an array which I use for the x-axis in a D3 graph, and it blows up because the chart size is too small for the size of the array. I had a look at the data and there are extreme outliers in it. See chart below.

The data is around 0 (it's not exactly zero, it's 0.00972 etc.).

The data starts getting interesting around 70, then there are massive spikes at about 100. The data then continues, and the same sort of thing happens on the other side at about 200.

Can anyone help me with an algorithm that removes the extreme outliers? For example, take the 90% or 95% percentile range and remove contiguous elements (that is, not just one element from the middle, but x elements from the start and the end of the array, where x is worked out from the data). In JavaScript as well, please!

Thanks!

P.S. You'll need to save the image to view it properly.

asked Mar 26, 2014 at 14:00 by JML

1 Answer

Assuming the data looks like this:

var data = [0.00972, 70, 70, ...];

First, sort the array numerically (this sorts data in place):

data.sort(function(a,b){return a-b});

Then take off the bottom 2.5% and the top 2.5%:

var l = data.length;
var low = Math.round(l * 0.025);
var high = l - low;
var data2 = data.slice(low,high);
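
As a minimal sketch, the sort-and-trim steps above can be wrapped in a reusable function. The name trimTails and its fraction parameter are hypothetical, not part of the original answer, and the sample data is made up. Unlike calling data.sort directly, this version sorts a copy, so the caller's array keeps its original order.

```javascript
// Hypothetical helper: returns a sorted copy of `values` with the lowest and
// highest `fraction` of elements removed. The input array is not modified.
function trimTails(values, fraction) {
  // slice() copies the array so that sort() does not reorder the caller's data.
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  // Number of elements to drop from each end.
  var cut = Math.round(sorted.length * fraction);
  return sorted.slice(cut, sorted.length - cut);
}

// Illustrative data (invented for this example): 38 ordinary values
// plus one extreme value at each end.
var sample = [];
for (var i = 1; i <= 38; i++) sample.push(i);
sample.push(-1000, 1000);

var trimmed = trimTails(sample, 0.025);
console.log(trimmed.length); // 38
```

Because the trimmed elements always come from the two ends of the sorted copy, this removes contiguous runs rather than individual points from the middle.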

An alternative would be to only show data within 3 standard deviations of the mean. If your data is normally distributed, about 99.7% of it will fall in this range.

var sum=0;     // stores sum of elements
var sumsq = 0; // stores sum of squares
for(var i=0;i<data.length;++i) {
    sum+=data[i];
    sumsq+=data[i]*data[i];
}
var mean = sum/l; 
var variance = sumsq / l - mean*mean; // population variance: E[x^2] - (E[x])^2
var sd = Math.sqrt(variance);
var data3 = new Array(); // holds the data within 3 standard deviations of the mean
for(var i=0;i<data.length;++i) {
    if(data[i]> mean - 3 *sd && data[i] < mean + 3 *sd)
        data3.push(data[i]);
}
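
The same idea can be sketched as a function. This variant is an assumption-laden rewrite, not the answer's exact code: the name withinStdDevs and the parameter k are hypothetical, the variance is computed in two passes (which avoids the precision loss the sum-of-squares shortcut can suffer when the mean is large relative to the spread), and the comparison is inclusive so that an array of identical values is kept whole.

```javascript
// Hypothetical helper: keeps the values that lie within k population
// standard deviations of the mean. Original order is preserved.
function withinStdDevs(values, k) {
  var n = values.length;
  if (n === 0) return [];

  // First pass: mean.
  var sum = 0;
  for (var i = 0; i < n; i++) sum += values[i];
  var mean = sum / n;

  // Second pass: sum of squared deviations from the mean.
  var sqDiff = 0;
  for (var j = 0; j < n; j++) {
    var d = values[j] - mean;
    sqDiff += d * d;
  }
  var sd = Math.sqrt(sqDiff / n);

  // Inclusive bounds (the code above uses strict inequalities).
  return values.filter(function (v) {
    return Math.abs(v - mean) <= k * sd;
  });
}

// Illustrative data (invented for this example): twenty 10s and one spike.
var readings = [];
for (var m = 0; m < 20; m++) readings.push(10);
readings.push(1000);

console.log(withinStdDevs(readings, 3).length); // 20
```

Keep in mind that extreme values inflate both the mean and the standard deviation, so with very few data points a spike may not fall outside the 3 standard deviation band at all.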

Or do something similar using some multiple of the interquartile range around the median (this relies on data still being sorted from the first step):

// Math.floor keeps every index inside the array, even for very short arrays
var median = data[Math.floor(l/2)];
var LQ = data[Math.floor(l/4)];
var UQ = data[Math.floor(3*l/4)];
var IQR = UQ-LQ;
var data4 = new Array();
for(var i=0;i<data.length;++i) {
    if(data[i]> median - 2 * IQR && data[i] < median + 2 * IQR)
        data4.push(data[i]);
}
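
A closely related variant, sketched here as an alternative rather than as the answer's method, measures the band from the quartiles instead of the median: values are kept if they lie between LQ - k * IQR and UQ + k * IQR, which with k = 1.5 is commonly known as Tukey's fences. The function name withinIqrFences is hypothetical, the quartiles are picked by simple index without interpolation, and the sample data is invented.

```javascript
// Hypothetical helper: keeps values inside [LQ - k*IQR, UQ + k*IQR].
// Quartiles come from a sorted copy; the result keeps the input order.
function withinIqrFences(values, k) {
  var n = values.length;
  if (n === 0) return [];

  var sorted = values.slice().sort(function (a, b) { return a - b; });
  // Index-based quartiles, no interpolation (a simplification in this sketch).
  var lq = sorted[Math.floor(n / 4)];
  var uq = sorted[Math.floor((3 * n) / 4)];
  var iqr = uq - lq;

  var lo = lq - k * iqr;
  var hi = uq + k * iqr;
  return values.filter(function (v) { return v >= lo && v <= hi; });
}

// Illustrative data (invented for this example).
var points = [0.00972, 70, 71, 72, 73, 74, 75, 76, 5000];
console.log(withinIqrFences(points, 1.5).join(", ")); // 70, 71, 72, 73, 74, 75, 76
```

Quartiles are much less affected by a handful of extreme values than the mean and standard deviation are, which is the usual reason to prefer this kind of rule when the outliers are large.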