How does array indexing work differently between 1D and 2D arrays in C++? - Stack Overflow

I'm confused about how array indexing actually works in C++. I understand that when we use the expression x[n] (where x is an address and n is an integer), the compiler treats [] as an operator that calculates an offset from the base address.

I was taught that the compiler considers [] as an operator which moves the address by the number of bytes the data type at that address takes, multiplied by the number in [] (here n).

For a 1D array, I understand that arr[i] is interpreted as *(arr + i), which moves the address by i * sizeof(element_type) bytes. However, I'm confused about how this works with 2D arrays. Consider this example:

char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

If I access country[2], I get "RUSSIA" (the entire string), not the character 'S' as I might expect if the bracket operator simply moved 2 bytes from the start address.

But, if this is true, then why is it that when a 2D array is initialized as above, country[2] will be the sequence of characters "RUSSIA" and not the character 'S' (since the name of the array country is the address of the first term which is "U" and country[2] should mean that which comes 2 bytes after address of "U")?

Asked Apr 1 at 13:27 by m112120; edited Apr 1 at 18:53 by Remy Lebeau.
  • country[2] doesn't mean 2 bytes after the initial address. It means 2 elements after it. In this case each element is char[20]. – interjay Commented Apr 1 at 13:30
  • country[2] (*(country + 2)) is the third element of country, which is "RUSSIA". country[2][2] is 'S'. (Stop thinking in terms of pointers and addresses and offsets and just count.) – molbdnilo Commented Apr 1 at 17:51
  • You are thinking of country and using country as a 2D array. There's nothing wrong with that, but as far as C or C++ is concerned, it's a 1D array of 1D arrays of 20 char each. country[2] is the third 1D-array-of-20-chars. You should un-ask the question because there are no 2D arrays in C or in C++. – Ohm's Lawman Commented Apr 1 at 18:01
  • country[2][2] will give you 'S' Please see: godbolt./z/7ff1Y4eWT – greg spears Commented Apr 1 at 18:07
  • I just realized that the 'S' you refer to is probably the one in "U.S.A", not the one in "RUSSIA". That S is country[0][2] - the third element of the first element in country (in pointer notation, *(*(country + 0) + 2)). (Note that the type of country is actually char[3][20], and the first dimension is determined by the number of initializers.) – molbdnilo Commented Apr 2 at 12:26

5 Answers

How does array indexing work differently between 1D and 2D arrays in C++?

It does not work at all differently. In a sense, that's because C++ (and C) does not have true 2D arrays. What is conventionally called a 2D array in C++ and C is a (1D) array whose elements are themselves (1D) arrays. That's why the indexing operator takes only two operands, regardless of whether the array is (as we consider it) 1D, 2D, 3D, or higher dimensional.

In particular, this ...

char country[][20] = {"U.S.A", "CHINA", "RUSSIA"};

... declares country as an array of 3 (as determined from the initializer) arrays of 20 char each. That is, each of the expressions country[0], country[1], and country[2] designates an array of 20 char. The first is initialized from "U.S.A", the second from "CHINA", and the third from "RUSSIA".

For a 1D array, I understand that arr[i] is interpreted as to *(arr + i), which moves the address by i * sizeof(element_type) bytes.

I think it's more helpful, and it's certainly more robust, to pin the semantics of both addresses and indexing to the array elements than it is to try to break it down in terms of bytes. But if you do break it down to bytes then your characterization is correct. And that's consistent with what you observe, because the relevant element type for indexing (once) into country is char[20].

If I access country[2], I get "RUSSIA" (the entire string),

Yes, you do, more or less, because the elements of country, which are the units measured by indexes into that array, are arrays of type char[20], not individual chars.

Do bear in mind, however, that "C string" is a characterization of the contents of an array, not part of such an array's data type. The expression country[2] identifies the whole array, which is more than "the entire string". The six characters of "RUSSIA" are what is emitted if, say, you feed country[2] to std::cout, but that's a formatted representation, which you should take care to distinguish from the thing itself.

not the character 'S' as I might expect if the bracket operator simply moved 2 bytes from the start address.

Yes. And if by this point I have not driven home why the single character 'S' is the wrong expectation, then I'm not sure what else to say.

If I access country[2], I get "RUSSIA" (the entire string), not the character 'S' as I might expect if the bracket operator simply moved 2 bytes from the start address.

This is half right, but likely not for the reason you think.

You did not show us how you “access country[2]” or explain what you mean by that. In C, and I think in C++, there is no way to access an array. That is, you can neither set the entire value of an array in an assignment nor get the entire value of an array in an expression. You can only access individual elements of an array.

Perhaps you printed the contents of the array with printf("%s", country[2]); or std::cout << country[2];. However, neither one of those directly accesses the array. What they do is pass a pointer to the first element of the array to printf or to the insertion routine. That routine then uses the pointer to access the individual elements of the array.

With either of the above, say printf("%s", country[2]);, the way country[2] is evaluated is:

  • country[2] is defined to be equivalent to *(country + 2).
  • In C, when an array is used in an expression other than as the operand of sizeof, the operand of a typeof operator, the operand of unary &, or a string literal initializing an array, it is automatically converted to a pointer to its first element. C++ has similar rules. So the expression becomes *(p + 2), where p is the address of the first element, &country[0].
  • p + 2 adjusts the pointer by a distance of two elements of the pointed-to type, which is char [20]. So the result of p + 2 points to country[2].
  • Then * dereferences this pointer, yielding country[2], which is an array of 20 bytes.

At this point, you are sort of correct: using country[2] has gotten you the char [20] array that contains “RUSSIA” (but it also contains null bytes after that). However, the evaluation continues:

  • Since country[2] is an array, array-to-pointer conversion is performed again, yielding a pointer to its first element, effectively &country[2][0].

It is this pointer that is passed to printf or the insertion routine. That routine examines the bytes at that location and prints them, until it finds a null byte. The retrieval of all the bytes in the string is done by the called routine; it is not part of the expression country[2].

See my short tutorial here. Your situation is no different except that you're working with "char" instead of "int" so simply replace the int a[5][10] in my tutorial with the following to match your example:

char a[][20] = {{ 'U', '.', 'S', '.', 'A', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'},
                { 'C', 'H', 'I', 'N', 'A', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'},
                { 'R', 'U', 'S', 'S', 'I', 'A',  '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0'}};

"a" above (so named to match my tutorial) is identical to your "country" array (its type is char[3][20]), except that it uses a more verbose syntax so you can see each full element in the array, as opposed to initializing it with string literals, which only confuses the situation (the above is easier to understand for learning purposes). Your "country" array is just a condensed syntax for "a" above. The trailing '\0' characters I've explicitly added above are merely what the compiler adds to "country" implicitly, in order to make each element 20 chars long (the size of each element in both "country" and the equivalent "a"). Note that the single quotes above are jarring to read, so you may want to ignore them for legibility while learning this (as I've done in the examples below, though in actual code they're required of course).

Now, as pointed out by John Bollinger in his own response (and my tutorial), the confusion lies in not understanding that "a" (or "country") is just a 1D array no different than any other (all C-style arrays in C++ and C of course are 1D). Instead of each element in the array being something trivial like an "int" however, each element is actually another array. The syntax of the two square bracket pairs confuses people however into thinking it's a 2D array. It effectively behaves like one because each element is actually another array which my tutorial explains but you shouldn't think of it this way (as 2D). Always think of it as 1D by recognizing that each element in a C-style array is simply a blob, whether an "int" blob if you have an array of ints (each element of the array stores an "int"), or an array blob if you have an array of arrays (each element of the array stores an array, a "char[20]" as shown in a moment). Sounds confusing but it's actually quite simple (with some practice). When you index into "a", such as "a[0]", you're effectively getting back the following (not real syntax but how you should think about it, and I've removed all extraneous punctuation for legibility):

a[0] = U.S.A000000000000000;
a[1] = CHINA000000000000000;
a[2] = RUSSIA00000000000000;

What's the type of each one of those elements? It's this:

char[20]

a[0] for instance is the blob of type "char[20]" that's filled in with the chars "U.S.A000000000000000", a[1] is the blob of type "char[20]" that's filled in with the chars "CHINA000000000000000" and so on. Each blob's type is "char[20]", an array of 20 characters, but it's no different than if each blob's type were a non-array type like an "int". In all cases you just index into the array such as "a[0]" and it returns whatever's stored there, whether a "char[20]" blob in this case or an "int" blob as in my tutorial. Just treat the returned blob according to its type. Since it's returning a "char[20]" blob in this case, this:

a[1]

Is effectively returning this (again, not real syntax but how you should think about it):

char b[20] = CHINA000000000000000;

You can therefore do this:

// Returns 'N'
char c = b[3];

Which obviously returns 'N'. Therefore, when you do something like this:

char d = a[1][3];

It just resolves to "c" above, because a[1] returns the blob at element 1 which is "b" above (you're indexing into "a" here as the 1D array it is - don't be confused by the 2nd square brackets yet), and then you're applying the 2nd square brackets to "b" (i.e., what a[1] just returned), so you get back "b[3]" which is 'N' (i.e., the same as "c" above).

The key to handling so-called 2D arrays (or even so-called higher dimension arrays) is to simply treat it as the 1D array it really is, where the type of each blob in that 1D array is simply another array like "b" above.

country[2] is equal to *(country + 2), but that does not mean that it just offsets the pointer by two bytes.
Instead, you can think of the address as being computed in bytes like this (DO NOT USE THIS, it is just to make it clearer what is really happening): (char*)country + sizeof(char[20]) * 2

The sizeof(char[20]) is there because it is the size of one element of the array, and it gets multiplied by the index, in this case 2. Please do not write this scaling yourself in real code: pointer arithmetic already does it for you, so country + sizeof(char[20]) * 2 would move 40 elements rather than 40 bytes, and you would not access the element you want. If you want to learn more about this, look up pointer arithmetic.

country[2] is located 2 * 20 bytes = 40 bytes from the starting address of country. When used in an expression, it converts to a char* pointing to the beginning of a sequence of chars that ends with '\0'.
The character 'S' is at country[2][3] (and also at country[2][2]).

memory layout (3 rows of 20 bytes each, 60 bytes in total):
U.S.A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0CHINA\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0RUSSIA\0\0\0\0\0\0\0\0\0\0\0\0\0\0

Note that the bytes after each string's terminating '\0' are not memory garbage: when a char array is initialized from a string literal shorter than the array, the remaining elements are zero-initialized.
