rust - Get a &mut [u8] from a MaybeUninit for writing to

I want to do something like:

let mut buf: MaybeUninit<[u8; BUF_SIZE]> = MaybeUninit::uninit();
let buf: &mut [u8] = unsafe { &mut *buf.as_mut_ptr() };

let mut len = 0;
loop {
    let n = stream.read(&mut buf[len..])?;
    len += n;
    // ...
}

AFAIK you have to use unsafe code, because the compiler can't verify that you're not reading from the uninitialized memory. This should improve if/when Rust finally provides a safe way to read into a MaybeUninit buffer.

But what's the canonical way to do this? I've also seen it done with transmute and std::slice::from_raw_parts_mut. The transmute is ugly as sin if you specify the types, which you should, and the from_raw_parts_mut loses the buffer size info, which gives you one extra opportunity to make a mistake.

The docs for as_mut_ptr() basically give very similar code as an example of undefined behavior. I don't agree with the docs, I think it only is undefined behavior if you read from it, and I don't. Am I wrong? Why?

Is there some library that lets me do this in a nicer way? I checked the bytes and smallvec crates, but the former creates a heap buffer and the latter doesn't make the code any better.

asked Feb 16 at 15:31 by Eloff, edited Feb 16 at 15:36 by John Kugelman

4 Yes, you are wrong. This is UB currently. Also there is no such thing as "do not agree with the docs", the docs define what is UB. – Chayim Friedman Commented Feb 16 at 15:33
1 If you cast a pointer to a reference, the following conditions have to be met. More specifically, quote "These rules apply even if the result is unused! (The part about being initialized is not yet fully decided, but until it is, the only safe approach is to ensure that they are indeed initialized.)". – Finomnis Commented Feb 16 at 16:51
1 As already mentioned before, the Rust docs are the definition of what UB is. So even if it sounds like it's fine to you, if the Rust docs do not allow it, the compiler is allowed to perform optimizations that you might not have taken into account. – Finomnis Commented Feb 16 at 16:52
1 "most programs don't care about performance" - There are many memory safe languages out there that prevent UB, but Rust is one of the very few that do so while being performant, at the cost of a steep learning curve. So I'd argue that most Rust programs care about performance, otherwise they would be written in an easier language, like Python or Go. That said, please benchmark if it really makes a difference, this sounds like a premature optimization to me. – Finomnis Commented Feb 16 at 16:55
2 The way to do this with safe code is to use BorrowedCursor, but that is currently an unstable API. – PitaJ Commented Feb 16 at 17:34


2 Answers


Given that people here (including me) have outright forbidden this (and for good reason!), I'd like to give a more nuanced stance.

I'll begin with: as already stated here, and cited from the docs, you must not do this. This is in undecided territory, and as such, it definitely could be declared UB.

However, the general stance of t-opsem (the team responsible for decisions about unsafe code) seems to be to eventually allow this. See https://github.com/rust-lang/unsafe-code-guidelines/issues/346 and https://github.com/rust-lang/unsafe-code-guidelines/issues/412. There are many reasons (explained in detail in the linked pages), and they can require understanding deep corners of the memory model, so I'll avoid explaining them here. What you should know is that if and when (and only when!) this is formally accepted, creating a &mut [u8] (or any other reference) pointing to uninitialized memory won't be UB, but reading u8s from it will be, and perhaps writing too (that depends on other undecided questions).

So if this gets accepted, your code will become sound, on one crucial condition: you wrote the read() function, and you know it doesn't read from its argument (alternatively, it is provided by a library, but the library guarantees it doesn't read from its argument). This is because safe code can trust any unsafe code, but unsafe code can only trust known safe code. So read() can't be generic, otherwise it would be possible to supply a type that reads from the provided reference, creating unsoundness. Nor can it be a std reader (e.g. File), because those do not guarantee they don't read from their argument. For all you know, in the next Rust/library version they may suddenly start doing that, even if they haven't previously. That leaves pretty much only code you personally maintain.

So isn't there a solution? There is; as mentioned above, it's BorrowedBuf. It's a type designed to encapsulate the unsafety of reading into uninitialized buffers, by providing an interface that allows the buffer to be given even to untrusted code and trusting BorrowedBuf to not let them do forbidden things.

There are two problems in this rosy vision:

BorrowedBuf is unstable (and has been for years), because people still aren't sure what the best API for it is.
Even once it is stabilized, it is opt-in: your reader will need to implement the read_buf() method, otherwise the default implementation will just initialize the whole buffer and forward to read().

So for now, the only path forward is to write all the code yourself, including the reader, and use raw pointers only. But before committing to that, please benchmark: most probably, you will discover that the overhead of zeroing a few bytes isn't noticeable at all.
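As a rough illustration of the "raw pointers only" route, here is a minimal sketch. ChunkSource and its read_raw method are hypothetical stand-ins for a reader you maintain yourself (they are not part of std or of the question's code), and BUF_SIZE is chosen arbitrarily. The point is that no &mut [u8] to uninitialized memory is ever created; a shared slice is only formed over the prefix that has already been written.

```rust
use std::mem::MaybeUninit;
use std::{ptr, slice};

const BUF_SIZE: usize = 16; // arbitrary size, for illustration only

/// Hypothetical hand-written source standing in for `stream`.
pub struct ChunkSource<'a> {
    data: &'a [u8],
}

impl<'a> ChunkSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// Copies up to `cap` bytes to `dst` and returns how many were written
    /// (0 means the source is exhausted). It only writes through `dst`
    /// and never reads from it.
    ///
    /// # Safety
    /// `dst` must be valid for writes of `cap` bytes.
    pub unsafe fn read_raw(&mut self, dst: *mut u8, cap: usize) -> usize {
        // Deliver at most 4 bytes per call to mimic short reads.
        let n = self.data.len().min(cap).min(4);
        unsafe { ptr::copy_nonoverlapping(self.data.as_ptr(), dst, n) };
        self.data = &self.data[n..];
        n
    }
}

pub fn demo_raw() -> Vec<u8> {
    let mut src = ChunkSource::new(b"hello world");
    let mut buf: MaybeUninit<[u8; BUF_SIZE]> = MaybeUninit::uninit();
    // Raw pointer to the first byte; no reference to the array contents is made.
    let base = buf.as_mut_ptr().cast::<u8>();

    let mut len = 0;
    while len < BUF_SIZE {
        // SAFETY: `base + len` is inside the buffer and
        // `BUF_SIZE - len` bytes remain writable after it.
        let n = unsafe { src.read_raw(base.add(len), BUF_SIZE - len) };
        if n == 0 {
            break;
        }
        len += n;
    }

    // SAFETY: the first `len` bytes were written by the loop above.
    let filled: &[u8] = unsafe { slice::from_raw_parts(base, len) };
    filled.to_vec()
}
```

Compared with a plain [0u8; BUF_SIZE] buffer this saves only the zeroing, at the cost of several unsafe blocks, which is why benchmarking first is worthwhile.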

This is not and will not be possible (strictly speaking)

Rust's safety guarantees include that all references (mutable or not) point to initialised memory, which goes against the very idea behind MaybeUninit. As Finomnis pointed out, this rule is stated in the API reference (and applies even if you disagree with it :P).
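One way to stay within that rule on stable Rust is to keep the element type as MaybeUninit<u8>: a &mut [MaybeUninit<u8>] may legitimately point at uninitialised bytes. The sketch below assumes a hypothetical helper write_into (not a std function) in place of a real reader, and an arbitrary buffer size of 8.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical helper: copies as much of `src` as fits into the front of
/// `dst` and returns the number of bytes written. It never reads `dst`.
pub fn write_into(dst: &mut [MaybeUninit<u8>], src: &[u8]) -> usize {
    let n = dst.len().min(src.len());
    // `zip` stops at the shorter side, so exactly `n` slots are written.
    for (slot, &byte) in dst.iter_mut().zip(src) {
        slot.write(byte);
    }
    n
}

pub fn demo_uninit_slice() -> Vec<u8> {
    // An array of uninitialised bytes; `MaybeUninit<u8>` is `Copy`,
    // so the array-repeat syntax is allowed.
    let mut buf = [MaybeUninit::<u8>::uninit(); 8];

    let mut len = 0;
    let n = write_into(&mut buf[len..], b"abc");
    len += n;
    let n = write_into(&mut buf[len..], b"defghijk"); // only 5 bytes fit
    len += n;

    // SAFETY: the first `len` elements were initialised by `write_into`,
    // and `MaybeUninit<u8>` has the same layout as `u8`.
    let filled: &[u8] = unsafe { slice::from_raw_parts(buf.as_ptr().cast::<u8>(), len) };
    filled.to_vec()
}
```

This does not help with std's Read::read, which takes a &mut [u8], so it only applies to writers whose signature you control.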

Doing it the safe and well-defined way

Now, there does exist an API for incrementally filling an uninitialised buffer (although it is currently unstable, i.e. it requires the nightly toolchain). PitaJ hinted at BorrowedCursor, which is obtained from a BorrowedBuf. This type safely tracks which parts of a buffer are filled, initialised but unfilled, and uninitialised, and therefore supports incrementally initialising a buffer.

Doing it the stable way

The usual way of achieving this would be to simply use the Vec type. You did specifically mention wanting to avoid heap allocations, but Vec does actually support custom allocators ~~(and by extension non-heap allocators)~~, either through the global allocator, or with the allocator API.

The latter, however, is currently an unstable interface in std (see allocator_api), but the allocator_api2 crate does provide a mirror that works with the stable toolchain.
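For reference, a minimal sketch of the plain Vec approach on stable Rust, using the default global allocator rather than a custom one (so it does allocate on the heap). The function name read_up_to and its limit parameter are illustrative assumptions, not part of any library. Read::take caps how much is read, and read_to_end appends into the Vec's spare capacity without the caller writing any unsafe code.

```rust
use std::io::{self, Read};

/// Reads at most `limit` bytes from `stream` into a freshly allocated Vec.
/// Hypothetical helper, named for illustration only.
pub fn read_up_to<R: Read>(stream: R, limit: usize) -> io::Result<Vec<u8>> {
    // Reserve the space once up front; the Vec's length stays 0 until
    // bytes are actually read.
    let mut buf = Vec::with_capacity(limit);
    // `take` makes the reader report EOF after `limit` bytes, so
    // `read_to_end` stops there or at the real end of the stream.
    stream.take(limit as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```

A byte slice implements Read, so calling read_up_to(&b"hello world"[..], 5) yields the bytes of "hello".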

You did specify that smallvec did not meet your expectations, but without more information about your case, these are the suggestions I've got.
