I am trying to build a template in excel that summarizes a dataset according to test, sample type, and analyzer. Sorry - I've never posted a question about excel before, so please tell me if more info needed.
I need to calculate the Mean, Standard Deviation, and the Measurement Uncertainty for each test, separated by sample type and analyzer (MU = 1.96 x SD). My actual dataset has a few hundred 'Tests' two or three 'Types' for each, and two or four 'Analyzers'. There should be dozens of data points for each 'Result'. I will also make a few additional calcs, but keeping it simple for clarity.
Here is example data:
Test | Type | Name | Result |
---|---|---|---|
pH | Water | Analyzer1 | 6.1 |
pH | Water | Analyzer2 | 6.5 |
pO2 | Water | Analyzer1 | 55 |
pO2 | Water | Analyzer2 | 64 |
pH | Water | Analyzer1 | 6.3 |
pH | Water | Analyzer2 | 6.8 |
pO2 | Water | Analyzer1 | 50 |
pO2 | Water | Analyzer2 | 68 |
pH | Buffer | Analyzer1 | 5.9 |
pH | Buffer | Analyzer2 | 6.4 |
pO2 | Buffer | Analyzer1 | 48 |
pO2 | Buffer | Analyzer2 | 71 |
pH | Buffer | Analyzer1 | 6.4 |
pH | Buffer | Analyzer2 | 6.9 |
pO2 | Buffer | Analyzer1 | 51 |
pO2 | Buffer | Analyzer2 | 64 |
I am trying to build a template in excel that summarizes a dataset according to test, sample type, and analyzer. Sorry - I've never posted a question about excel before, so please tell me if more info needed.
I need to calculate the Mean, Standard Deviation, and the Measurement Uncertainty for each test, separated by sample type and analyzer (MU = 1.96 x SD). My actual dataset has a few hundred 'Tests' two or three 'Types' for each, and two or four 'Analyzers'. There should be dozens of data points for each 'Result'. I will also make a few additional calcs, but keeping it simple for clarity.
Here is example data:
Test | Type | Name | Result |
---|---|---|---|
pH | Water | Analyzer1 | 6.1 |
pH | Water | Analyzer2 | 6.5 |
pO2 | Water | Analyzer1 | 55 |
pO2 | Water | Analyzer2 | 64 |
pH | Water | Analyzer1 | 6.3 |
pH | Water | Analyzer2 | 6.8 |
pO2 | Water | Analyzer1 | 50 |
pO2 | Water | Analyzer2 | 68 |
pH | Buffer | Analyzer1 | 5.9 |
pH | Buffer | Analyzer2 | 6.4 |
pO2 | Buffer | Analyzer1 | 48 |
pO2 | Buffer | Analyzer2 | 71 |
pH | Buffer | Analyzer1 | 6.4 |
pH | Buffer | Analyzer2 | 6.9 |
pO2 | Buffer | Analyzer1 | 51 |
pO2 | Buffer | Analyzer2 | 64 |
The result I am after looks like this:
Test | Type | Name | Mean | SD | MU |
---|---|---|---|---|---|
pH | Water | Analyzer1 | 6.2 | 0.35 | 0.69 |
pH | Water | Analyzer2 | 6.7 | 0.35 | 0.69 |
pH | Buffer | Analyzer1 | 6.2 | 0.14 | 0.28 |
pH | Buffer | Analyzer2 | 6.7 | 0.21 | 0.42 |
pO2 | Water | Analyzer1 | 52.5 | 3.54 | 6.93 |
pO2 | Water | Analyzer2 | 66.0 | 2.83 | 5.54 |
pO2 | Buffer | Analyzer1 | 49.5 | 2.12 | 4.16 |
pO2 | Buffer | Analyzer2 | 67.5 | 4.95 | 9.70 |
I am trying to automate this as much as possible, but don't want to use VBA. Unless I am missing something, I can't use a PivotTable as I have to use the Mean/SD to calculate the MU. Thus far, I have tried creating a new table using:
=UNIQUE(FILTER(B:B,A:A="pH"))
which generates an array containing Buffer and Water, but I can't include AND() in the FILTER() function to further parse the data. I have tried a bit of XLOOKUP() but that doesn't seem to work either.
Very interested to hear an approach!
***Edits:
I am using Microsoft® Excel® for Microsoft 365 MSO (Version 2408 Build 16.0.17928.20336) 32-bit
Previous responses have employed GROUPBY()
or PIVOTBY()
but these functions do not appear present in this version of Excel.
3 Answers
Reset to default 4Here is one way of accomplishing the desired output, using MAKEARRAY()
:
=LET(
_Data, A1:D17,
_Header, HSTACK(TAKE(_Data,1,3),"Mean","SD","MU"),
_Body, DROP(_Data,1),
_Catg, TAKE(_Body,,3),
_Uniq, UNIQUE(_Catg),
VSTACK(_Header, SORT(HSTACK(_Uniq, ROUND(MAKEARRAY(ROWS(_Uniq), 3, LAMBDA(r,c,
INDEX(LET(a, CHOOSECOLS(_Body,-1),
b, BYROW(_Catg, LAMBDA(x, AND(x=CHOOSEROWS(_Uniq, r)))),
d, FILTER(a,b), HSTACK(AVERAGE(d), STDEV(d), 1.96*STDEV(d))),c))),2)),{1,2})))
Or, Using GROUPBY()
=DROP(GROUPBY(A1:C17,D1:D17,HSTACK(AVERAGE,STDEV,LAMBDA(x,1.96*STDEV(x))),1,0),1)
In the absence of GROUPBY
, another option is to use the concept of "thunks" (parameter-less lambda functions) to minimize redundant calculations and improve efficiency. For example:
=LET(
fields, A2:C17,
values, D2:D17,
key_id, BYROW(fields, LAMBDA(x, TEXTJOIN("|", 0, x))),
keys, UNIQUE(SORT(HSTACK(key_id, fields))),
vals, MAP(TAKE(keys,, 1), LAMBDA(id, LET(x, FILTER(values, key_id = id), LAMBDA(x)))),
func, HSTACK(LAMBDA(x, AVERAGE(x)), LAMBDA(x, STDEV.S(x))),
result, MAP(IFNA(func, vals), IFNA(vals, func), LAMBDA(fnλ,val, fnλ(val()))),
VSTACK(HSTACK(A1:C1, "Mean", "SD", "MU"), HSTACK(DROP(keys,, 1), result, TAKE(result,, -1) * 1.96))
)
The vals variable returns a vertical vector of "thunks", LAMBDA(x)
, each one containing the filtered results for each unique set of keys. The func variable is defined as a horizontal vector of functions to be applied to each set of results. The elements of both vectors are TYPE 128 (compound data). IFNA
is used to broadcast the func vector across each row of the vals vector, as well as the vals vector across each column of the func vector, then MAP
loops through both arrays together, applying each function to each set of values. By doing it this way, the FILTER
function is only evaluated ONCE per output row, the results of which are broadcast (repeated) across multiple columns and recalled as needed simply by providing the opening and closing brackets, e.g. val()
.
Alternatively, each function can also be mapped individually, and the results stacked together manually:
=LET(
fields, A2:C17,
values, D2:D17,
key_id, BYROW(fields, LAMBDA(x, TEXTJOIN("|", 0, x))),
keys, UNIQUE(SORT(HSTACK(key_id, fields))),
vals, MAP(TAKE(keys,, 1), LAMBDA(id, LET(x, FILTER(values, key_id = id), LAMBDA(x)))),
mean, MAP(vals, LAMBDA(x, AVERAGE(x()))),
sdev, MAP(vals, LAMBDA(x, STDEV.S(x()))),
VSTACK(HSTACK(A1:C1, "Mean", "SD", "MU"), HSTACK(DROP(keys,, 1), mean, sdev, sdev * 1.96))
)
Or REDUCE
can be used to loop through the func vector and HSTACK
the results automatically:
=LET(
fields, A2:C17,
values, D2:D17,
key_id, BYROW(fields, LAMBDA(x, TEXTJOIN("|", 0, x))),
keys, UNIQUE(SORT(HSTACK(key_id, fields))),
vals, MAP(TAKE(keys,, 1), LAMBDA(id, LET(x, FILTER(values, key_id = id), LAMBDA(x)))),
func, HSTACK(LAMBDA(x, AVERAGE(x())), LAMBDA(x, STDEV.S(x())), LAMBDA(x, STDEV.S(x()) * 1.96)),
result, REDUCE(DROP(keys,, 1), func, LAMBDA(a,fnλ, HSTACK(a, MAP(vals, fnλ)))),
VSTACK(HSTACK(A1:C1, "Mean", "SD", "MU"), result)
)
Sample Results:
A minimalist solution, using the fact that
SD(x) * k = SD(x * k)
Add an extra column to the data
Test | Type | Name | Result | ResultX1.96 |
---|---|---|---|---|
pH | Water | Analyzer1 | 6.1 | 11.956 |
pH | Water | Analyzer2 | 6.5 | 12.74 |
pO2 | Water | Analyzer1 | 55 | 107.8 |
pO2 | Water | Analyzer2 | 64 | 125.44 |
pH | Water | Analyzer1 | 6.3 | 12.348 |
pH | Water | Analyzer2 | 6.8 | 13.328 |
pO2 | Water | Analyzer1 | 50 | 98 |
pO2 | Water | Analyzer2 | 68 | 133.28 |
pH | Buffer | Analyzer1 | 5.9 | 11.564 |
pH | Buffer | Analyzer2 | 6.4 | 12.544 |
pO2 | Buffer | Analyzer1 | 48 | 94.08 |
pO2 | Buffer | Analyzer2 | 71 | 139.16 |
pH | Buffer | Analyzer1 | 6.4 | 12.544 |
pH | Buffer | Analyzer2 | 6.9 | 13.524 |
pO2 | Buffer | Analyzer1 | 51 | 99.96 |
pO2 | Buffer | Analyzer2 | 64 | 125.44 |
and use a pivot table with tabular layout and no subtotals
Test | Type | Name | Average of Result | StdDev of Result | StdDev of ResultX1.96 |
---|---|---|---|---|---|
pH | Buffer | Analyzer1 | 6.15 | 0.353553391 | 0.692964646 |
pH | Buffer | Analyzer2 | 6.65 | 0.353553391 | 0.692964646 |
pH | Water | Analyzer1 | 6.2 | 0.141421356 | 0.277185858 |
pH | Water | Analyzer2 | 6.65 | 0.212132034 | 0.415778787 |
pO2 | Buffer | Analyzer1 | 49.5 | 2.121320344 | 4.157787873 |
pO2 | Buffer | Analyzer2 | 67.5 | 4.949747468 | 9.701505038 |
pO2 | Water | Analyzer1 | 52.5 | 3.535533906 | 6.929646456 |
pO2 | Water | Analyzer2 | 66 | 2.828427125 | 5.543717165 |
Grand Total | 32.64375 | 27.7691668 | 54.42756694 |
GROUPBY
also does not appear to be present... – Dorton Commented Mar 13 at 21:43LET()
function solutions. Unfortunately I have a few additional variables I need to calculate as well which means I can't only do that. – Dorton Commented Mar 14 at 15:40