Consider this example where I add some extra \0
to a string.
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv){
char str1[] = "dog";
char str2[] = "dog\0";
char str3[] = "dog\0\0";
printf("%ld %ld %ld\n", strlen(str1), strlen(str2), strlen(str3));
return 0;
}
But strlen() returns 3 for all of them. Now:
- Why does the extra \0 not cause the size to increase?
- If the return value of strlen() is the same, is their actual size in memory also the same?
4 Answers
The declaration
char str1[] = "dog";
is equivalent to:
char str1[] = { 'd', 'o', 'g', '\0' };
The declaration
char str2[] = "dog\0";
is equivalent to:
char str2[] = { 'd', 'o', 'g', '\0', '\0' };
The declaration
char str3[] = "dog\0\0";
is equivalent to:
char str3[] = { 'd', 'o', 'g', '\0', '\0', '\0' };
In C, a string is a contiguous sequence of characters whose end is marked with a null character ('\0'). The function strlen will return the number of characters in the string, not counting the null character. This value is different from the length of the array which contains the string.
- Why does the extra \0 not cause the size to increase?
In all three cases, the fourth character is the null character, which marks the end of the character sequence, so the length of each string is 3. Only the sizes of the arrays which contain the strings are different.
- If the return value of strlen() is the same, is their actual size in memory also the same?
The size in memory of the strings themselves is the same. But the sizes of the arrays which contain the strings are different. You can print the sizes of the arrays like this:
#include <stdio.h>
int main( void )
{
char str1[] = "dog";
char str2[] = "dog\0";
char str3[] = "dog\0\0";
printf( "%zu %zu %zu\n", sizeof str1, sizeof str2, sizeof str3 );
}
This program has the following output:
4 5 6
The function strlen will return the length of the string, whereas the sizeof operator will yield the size of the array (for a char array, this is also its number of elements).
Note that both strlen and sizeof will yield a value of type size_t, and %zu is the correct conversion format specification for that data type. The %ld conversion format specification is for the data type long, which is not being used here. By using the incorrect conversion format specification, your program is invoking undefined behavior.
The strlen function counts characters up to the first null byte, i.e. '\0'. So in each of these three cases there are 3 characters before the first null byte.
String literals, however, have a size that includes all characters in the literal plus a terminating null byte, and the arrays they initialize also have that size, since their size is left unspecified.
So if you were to print the array sizes:
printf("%zu %zu %zu\n", sizeof str1, sizeof str2, sizeof str3);
It would print 4 5 6.
Also, as noted above, both the return value of strlen and the result of the sizeof operator have type size_t, so the proper format string specifier to print them is %zu.
'\0' does affect the length of a string. It determines where the string ends.
Quoting the C standard, section 7.1.1 (in the standard, italics mark these as definitions of technical terms):
A string is a contiguous sequence of characters terminated by and including the first null character. [...] The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.
Note that the "length" of a string is not the size of the entire string. The string includes its '\0' terminator, but the length counts only the characters preceding the terminator.
Any additional '\0' characters following the terminator are not part of the string and do not contribute to its length.
Keep in mind that the length of a string is not the same as the size of the array object containing the string.
Basically, strlen iterates over the string until it finds a null character. A simplified strlen implementation:
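#include <stddef.h>

/* Simplified illustration of what strlen does. Renamed to my_strlen (a hypothetical
   name) so it does not clash with the standard function, and returns size_t like it. */
size_t my_strlen(const char *str) {
    size_t i = 0;
    while (str[i] != '\0') {  /* stop at the first null character */
        i++;
    }
    return i;
}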
So this is why the length of "dog\0\0\0" is 3 and not 6.
\0 which in the first example is placed automatically. It's the end marker, and if the string could include it, how will you know when it ends? – Weather Vane Commented Mar 16 at 23:08
strlen() returns size_t so use the format spec %zu, not %ld. – Weather Vane Commented Mar 16 at 23:10
strlen() function doesn't count the null character \0 while calculating the length. – Paul T. Commented Mar 16 at 23:11
sizeof str2. – Weather Vane Commented Mar 16 at 23:11
\0 bytes; see stackoverflow/questions/19696346 – Stephen C Commented Mar 16 at 23:25