Anomalous behaviour of Excel's INDEX() function between references and literal arrays

This question relates to the Excel function INDEX(), which is documented by Microsoft here.

If I have an Excel range in, say, A1:B2:

A	B
1	2
3	4

This question relates to the Excel function INDEX(), which is documented by Microsoft here.

If I have an Excel range in, say, A1:B2:

A	B
1	2
3	4

and I use the formula

=INDEX(A1:B2, 0, 0)

then the full range is returned to the worksheet (as a dynamic array). This agrees with the documentation.

If the final argument, col_num, is missing:

=INDEX(A1:B2, 0, )

then the result is the same, again matching the documentation.

But if col_num is omitted:

=INDEX(A1:B2, 0)

then a #REF! error is returned.

However, if the first argument to INDEX() is changed from a range reference to the equivalent literal array:

=INDEX({1, 2; 3, 4}, 0)

then the full range is returned again.

(Edit for clarity: The value of the second argument, row_num, is not important, as long as it is not negative or out of range; the behaviour described above is exhibited for any valid value of row_num, including zero or 'missing'.)

The #REF! error only occurs if col_num is omitted and the input reference is two-dimensional (i.e. has multiple rows and multiple columns). If, instead, the reference has only one row and/or only one column, the behaviour reverts to returning the full range, matching the behaviour for a literal array.

I cannot find this apparent anomaly documented anywhere. Moreover, this is the only discrepancy I can find between the two cases of the first argument to INDEX() being a reference or a literal array; in all other scenarios, the behaviour seems identical between the two.

Is this discrepancy deliberate? Is there any documentation on this behaviour that I have missed?

Share Improve this question edited Jan 30 at 14:17 asked Jan 30 at 12:45 Neil T 3,2852 gold badges17 silver badges29 bronze badges

@user11222393 - I'm not sure what you mean. My question is: why does =INDEX(A1:B2, 0) return something different from =INDEX({1, 2; 3, 4}, 0). Your formula is different and doesn't exhibit the anomaly. – Neil T Commented Jan 30 at 13:12
2 @cybernetic.nomad - thank you; I should have clarified that I was talking about the form of the formula. The actual value of row_num is not important: #REF! is always returned when INDEX() is applied to a 2D reference with an omitted col_num, whereas the equivalent call on the literal array {1, 2; 3, 4} always succeeds (unless row_num is negative or out of range). This also applies if row_num has the "Missing" value, as in your example, in which case the row_num parameter takes its default value 0. (I've edited the question to clarify that point.) – Neil T Commented Jan 30 at 14:14
2 I just tested all the variations, you are right. It seems Excel doesn't treat literal arrays exactly like ranges. The why is above my pay grade, I'm afraid. – cybernetic.nomad Commented Jan 30 at 14:27
1 Yes, but logically, as soon as you use a range, you are actually using the reference form, since there is no syntactical difference between the two (unless you are using the optional area number argument) – Rory Commented Jan 30 at 15:38
2 As regards why the documentation is written the way it is, I suspect that it's just badly written, as is often the case. ;) – Rory Commented Jan 30 at 16:08

| Show 6 more comments

2 Answers 2

Sorted by: Reset to default 5

With some help from @Rory (many thanks), I think I can now explain this behaviour.

The key things to understand are:

In Excel, some function parameters can accept either reference or array arguments. The behaviour of the function may differ depending on whether a reference or an array is passed.
INDEX() is such a function. The documentation refers to the "array form" and the "reference form". The syntax is almost identical (except for the rare "reference form" case where multiple ranges are passed).
The documentation states that the array parameter of the "array form" can accept "a range of cells or an array constant". Crucially, "range of cells" here does not mean a reference to a cell range on a worksheet such as A1:B2 (this would be a reference). Rather, it means an in-memory cell range, as might be returned by a nested function call to an outer function (examples below).
Therefore, INDEX(A1:B2, ...) always calls the "reference form" of the INDEX() fucntion, never the "array form".
As pointed out by @Rory in the comments, there is a minor difference in the documentation between the two forms: in the "array form", the column_num parameter is optional for a 2D range ("2D" meaning > 1 row, > 1 column). This is not the case for the "reference form".

This explains why, in this unique case of a 2D reference with an omitted column_num, a #REF! error is returned, whereas the equivalent call on a literal array (or an in-memory cell range) succeeds because the "array form" supports this.

Examples:

=INDEX(A1:B2, 0) : fails (reference form; col_num is required for a 2D reference)

=INDEX(A1:B2, 0, ) : succeeds (reference form; the comma means that col_num is present, even though 'missing')

=INDEX({1, 2; 3, 4}, 0) : succeeds (array form; col_num is optional for a 2D array)

=INDEX(INDEX(A1:B2, 0, 0), 0) : fails (a _reference_ is passed to the outer INDEX() call)

=INDEX(SQRT(INDEX(A1:B2, 0, 0)), 0) : succeeds (an _array_ is passed to the outer INDEX() call)

This last example is the "range of cells" case: the inner INDEX() function returns a reference, which is processed by the SQRT() function, returning an array ("range of cells"), which then calls the "array form" of the outer INDEX() function.

I still do not understand why this difference is necessary; I can see no reason why the "reference form" should not behave identically to the array form in this regard. Presumably there is a reason, which I would love to hear if anyone knows it!

It's special behavior depending on the number of arguments. If the argument is a vector (1xN or Nx1), the second argument is treated as the index regardless of orientation:

So, there are two cases:

Two arguments - array and index of a vector;
Three arguments - array, row index, and column index.

UPD

The related feature is the difference of processing references and dynamic arrays by BYROW and BYCOLUMN functions which I researched earlier:

Summary

It looks like a feature we should be aware.

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

Anomalous behaviour of Excel's INDEX() function between references and literal arrays - Stack Overflow

2 Answers 2

与本文相关的文章

评论列表(0)