
c - Why does strlen ignore extra null terminators in a string, and how does it determine the length? - Stack Overflow


I understand that strlen in C calculates the length of a string by counting characters until it encounters the first null terminator (\0). However, if a string contains multiple null terminators, like "hello\0world\0", why does strlen still return 5 instead of counting the entire string?

Additionally, does this mean that the memory size of the string is unaffected by the extra null terminators, or is there a difference between the length returned by strlen and the actual memory size?

Asked Mar 17 at 10:36 by Ronin Thomas; edited Mar 17 at 12:23 by Mike
  • 21 "until it encounters the first null terminator". You're answering your own question there. – robertklep Commented Mar 17 at 10:39
  • 2 The point of the nul is that it is the terminator, and anything after it is ignored. Suppose you malloc more memory than is necessary, which happens to contain a random assortment of byte values, some of which are 0. You then copy a small string into the big allocated memory. How would anybody know which of those zeros is the intended final zero? – Weather Vane Commented Mar 17 at 10:51
  • A string literal "hello\0world\0" will actually have two \0 at the end. Of course, you could implement functions that consider two consecutive \0 as the string terminator and allow single \0 in a string, but this is not how the standard functions are designed. You might get the same memory contents with multiple string copy operations using longer strings first, then shorter, so the "world" might be a remainder from a previous value "hello world" before copying "hello" or after deliberately truncating "hello world" after "hello". – Bodo Commented Mar 17 at 14:16

3 Answers

strlen stops counting at the first \0 because it is designed to determine the length of a null-terminated string, not the allocated memory size. strlen("hello\0world\0") returns 5 because it stops at the first \0 and ignores everything after it.

The memory size of the string is unaffected by strlen. The actual memory size depends on how the string was allocated, while strlen only counts up to the first null terminator. So strlen always returns less than the size of the memory holding the string: at least one byte less, because the terminator itself is not counted.

I understand that strlen in C calculates the length of a string by counting characters until it encounters the first null terminator (\0).

Yes, that is how strlen works:

Returns the length of the given null-terminated byte string, that is, the number of characters in a character array whose first element is pointed to by str up to and not including the first null character.

That means that as soon as strlen encounters the first NUL character, it stops. If a character array contains multiple NUL characters, everything after the first one is ignored. This applies to other string functions, too. That is also why printf("%s", "hello\0world\0") only prints hello: everything after the first NUL is not considered part of the string.

You can verify how long a (NUL terminated) string is using strlen(str) and how much memory it occupies, which includes the NUL character, using sizeof str (note that this only works on arrays, not on pointers).

With that knowledge you can test how long your string is and how the memory layout works for yourself with a program like the following:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

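/* Prints str[begin..end) as a brace-enclosed list of character literals.
   Non-printable bytes are shown as hex escapes.
   Assumes the first character of the range is printable. */
void printString(const char* str, size_t begin, size_t end) {
    if (begin >= end) {
        printf("{}");
        return;
    }
    printf("{ '%c'", str[begin]);
    for (size_t i = begin + 1; i < end; ++i) {
        /* isprint requires a value representable as unsigned char,
           and %X expects an unsigned int, so convert first. */
        const unsigned char c = (unsigned char)str[i];
        if (isprint(c)) {
            printf(", '%c' ", c);
        }
        else {
            printf(", '\\x%X' ", (unsigned)c);
        }
    }
    printf("}");
}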

int main() {
    const char str[] = "hello\0world\0";
    const size_t memSize = sizeof str;
    const size_t length = strlen(str);

    printf("memory size: %2zu ", memSize);
    printString(str, 0, memSize);
    printf("\n");
    printf("strlen size: %2zu ", length);
    printString(str, 0, length);
    printf("\n");
}

Output:

memory size: 13 { 'h', 'e' , 'l' , 'l' , 'o' , '\x0' , 'w' , 'o' , 'r' , 'l' , 'd' , '\x0' , '\x0' }
strlen size:  5 { 'h', 'e' , 'l' , 'l' , 'o' }

By the way, the last \0 in your string is not needed, as it is automatically added, which you can notice in the above output.

string_literal:

Secondly, at translation phase 7, a terminating null character is added to each string literal

In your question, you wrote:

However, if a string contains multiple null terminators

This part of the question does not make sense. §7.1.1 ¶1 of the C23 standard defines a string like this:

A string is a contiguous sequence of characters terminated by and including the first null character.

Therefore, it is not possible for a string to contain multiple null characters.

However, it is possible for an array to contain multiple strings, for example like this:

char arr[] = "hello\0world";

This line is equivalent to:

char arr[] = { 'h', 'e', 'l', 'l', 'o', '\0', 'w', 'o', 'r', 'l', 'd', '\0' };

Calling strlen with the argument arr or &arr[0], which are both pointers pointing to the character 'h', will return the value 5, because that is the length of the string "hello".

Calling strlen with the argument &arr[6], which is a pointer pointing to the character 'w', will return the value 5, because that is the length of the string "world".

If you want to determine the length of the entire array (which is the actual memory size), then you should not be using strlen, but rather the sizeof operator. The result of sizeof arr will be the value 12 in the above example, which is the length of both strings and both terminating null characters.

Here is a short demonstration program:

#include <stdio.h>
#include <string.h>

int main( void )
{
    char arr[] = "hello\0world";

    printf( "%zu\n", strlen( &arr[0] ) );
    printf( "%zu\n", strlen( &arr[6] ) );
    printf( "%zu\n", sizeof arr );
}

This program has the following output:

5
5
12
