I want to do something like:
    let mut buf: MaybeUninit<[u8; BUF_SIZE]> = MaybeUninit::uninit();
    let buf: &mut [u8] = unsafe { &mut *buf.as_mut_ptr() };
    let mut len = 0;
    loop {
        let n = stream.read(&mut buf[len..])?;
        len += n;
        // ...
    }
AFAIK you have to use unsafe code, because the compiler can't verify that you're not reading from the uninitialized memory. This should improve if/when Rust finally provides a safe way to read into a MaybeUninit buffer.
But what's the canonical way to do this? I've also seen it done with transmute and std::slice::from_raw_parts_mut. The transmute is ugly as sin if you specify the types, which you should, and the from_raw_parts_mut loses the buffer size info, which gives you one extra opportunity to make a mistake.
The docs for as_mut_ptr() basically give very similar code as an example of undefined behavior. I don't agree with the docs: I think it is only undefined behavior if you read from it, and I don't. Am I wrong? Why?
Is there some library that lets me do this in a nicer way? I checked the bytes and smallvec crates, but the former creates a heap buffer and the latter doesn't make the code any better.
asked Feb 16 at 15:31 by Eloff, edited Feb 16 at 15:36 by John Kugelman
2 Answers
Given that people here (including me) have outright forbidden this (and for good reason!), I'd like to give a more nuanced stance.
I'll begin with: as already stated here, and cited from the docs, you must not do this. This is in undecided territory, and as such, it definitely could be declared UB.
However, the general stance of t-opsem (the team responsible for decisions about unsafe code) seems to be to eventually allow this. See https://github.com/rust-lang/unsafe-code-guidelines/issues/346 and https://github.com/rust-lang/unsafe-code-guidelines/issues/412. There are many reasons (explained in detail in the linked pages), and they can require understanding deep corners of the memory model, so I'll avoid explaining them here. What you should know is that if and when (and only when!) this is formally accepted, creating a &mut [u8] (or any other reference) pointing to uninitialized memory won't be UB, but reading u8s from it will be, and perhaps writing too (that depends on other undecided questions).
So if this gets accepted, your code will become sound, on one crucial condition: you wrote the read() function, and you know it doesn't read from its argument (alternatively, it is provided by a library, but the library guarantees it doesn't read from its argument). This is because safe code can trust any unsafe code, but unsafe code can only trust known safe code. So read() can't be generic - otherwise, it'd be possible to supply a type that reads from the provided reference, creating unsoundness. Nor can it be a std reader (e.g. File) - because those do not guarantee they don't read from their argument. For all you know, in the next Rust/library version they may suddenly start doing that, even if they haven't previously. That leaves pretty much only code you personally maintain.
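To make the trust argument concrete, here is a minimal sketch of a reader you control. It is not a canonical pattern: fill_from and with_prefix are hypothetical names invented for this example, and BUF_SIZE is kept tiny on purpose. The reader takes &mut [MaybeUninit<u8>] rather than &mut [u8], so no reference to uninitialized u8s is ever created and the code stays within today's rules. The single unsafe step only exposes the prefix that has already been written.

```rust
use std::mem::MaybeUninit;
use std::slice;

const BUF_SIZE: usize = 8; // deliberately tiny for the example

/// Hypothetical reader written for this sketch. It copies bytes from `src`
/// into `dst`, advances `src`, and returns how many bytes it wrote.
/// It only ever writes to `dst`; it never reads from it.
fn fill_from(src: &mut &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let input: &[u8] = *src; // copy the shared slice out of the mutable reference
    let n = input.len().min(dst.len());
    for (slot, &byte) in dst[..n].iter_mut().zip(&input[..n]) {
        slot.write(byte);
    }
    *src = &input[n..];
    n
}

/// Fills a stack buffer from `stream`, then hands the filled prefix to `f`.
fn with_prefix<T>(mut stream: &[u8], f: impl FnOnce(&[u8]) -> T) -> T {
    let mut buf = [MaybeUninit::<u8>::uninit(); BUF_SIZE];
    let mut len = 0;
    while len < BUF_SIZE {
        let n = fill_from(&mut stream, &mut buf[len..]);
        if n == 0 {
            break; // source exhausted
        }
        len += n;
    }
    // SAFETY: `fill_from` wrote the first `len` slots, and `MaybeUninit<u8>`
    // has the same layout as `u8`.
    let filled: &[u8] = unsafe { slice::from_raw_parts(buf.as_ptr().cast::<u8>(), len) };
    f(filled)
}
```

Because fill_from is a concrete function rather than a generic R: Read, the unsafe block relies only on code that can be audited in one place, which is the condition described above.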
So is there no solution? There is; as mentioned above, it's BorrowedBuf. It's a type designed to encapsulate the unsafety of reading into uninitialized buffers, by providing an interface that allows the buffer to be given even to untrusted code, trusting BorrowedBuf not to let that code do forbidden things.
There are two problems with this rosy vision:
- BorrowedBuf is unstable (and has been for years), because people still aren't sure what the best API for it is.
- Even once it is stabilized, it is opt-in: your reader will need to implement the read_buf() method, otherwise the default implementation will just zero the buffer and forward to read().
So for now, the only path forward is to write all the code yourself, including the reader, and use raw pointers only. But before committing to that, please benchmark - most probably, you will discover that the overhead of zeroing a few bytes isn't noticeable at all.
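For reference, the baseline worth benchmarking against is the plain zero-initialized buffer, which needs no unsafe at all and works with any std::io::Read. A minimal sketch, assuming BUF_SIZE is a constant you choose; read_some is a hypothetical helper name:

```rust
use std::io::{self, Read};

const BUF_SIZE: usize = 4096; // assumed size; pick whatever fits your protocol

/// Reads from `stream` into a zeroed stack buffer until the buffer is full
/// or the stream reports end of input, then returns the buffer and the count.
fn read_some<R: Read>(stream: &mut R) -> io::Result<([u8; BUF_SIZE], usize)> {
    let mut buf = [0u8; BUF_SIZE]; // this is the zeroing cost under discussion
    let mut len = 0;
    while len < BUF_SIZE {
        let n = stream.read(&mut buf[len..])?;
        if n == 0 {
            break; // end of stream
        }
        len += n;
    }
    Ok((buf, len))
}
```

If profiling shows the zeroing in this version does not matter for your workload, the unsafe variant buys nothing.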
This is not and will not be possible (strictly speaking)
Rust's safety guarantees include that all references (mutable or not) point to initialised memory, which goes against the very idea of MaybeUninit; this is a design choice.
As Finomnis pointed out, this rule is stated in the API reference (and applies even if you disagree with it :P).
Doing it the safe and well-defined way
Now, there does exist an API for incrementally filling an uninitialised buffer (although it is currently unstable – i.e. requires the nightly toolchain).
PitaJ hinted at BorrowedCursor, which is obtained from a BorrowedBuf. This type safely tracks which parts of a buffer are filled, initialised but unfilled, and uninitialised, and therefore supports incrementally initialising a buffer.
Doing it the stable way
The usual way of achieving this would be to simply use the Vec type.
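As an illustration of the Vec route, here is a minimal sketch using only stable std APIs. It assumes a heap allocation is acceptable; read_up_to is a hypothetical helper name and BUF_SIZE an assumed upper bound. Read::take caps how much is read and read_to_end appends into the vector, so no slice over uninitialised bytes appears in your own code:

```rust
use std::io::{self, Read};

const BUF_SIZE: usize = 4096; // assumed upper bound on how much to read

/// Reads at most `BUF_SIZE` bytes from `stream` into a freshly allocated Vec.
fn read_up_to<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(BUF_SIZE);
    // `by_ref` borrows the reader so the caller can keep using it afterwards.
    stream.by_ref().take(BUF_SIZE as u64).read_to_end(&mut buf)?;
    Ok(buf)
}
```

Note that read_to_end keeps reading until end of input or the limit is reached, which differs from a single read() call; whether that suits a given protocol is a separate question.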
You did specifically mention wanting to avoid heap allocations, but Vec does actually support custom allocators (and by extension non-heap allocators), either through the global allocator or with the allocator API.
The latter, however, is currently an unstable interface in std (see allocator_api), but the allocator_api2 crate does provide a mirror that works with the stable toolchain.
You did specify that smallvec did not meet your expectations, but without more information about your case, these are the suggestions I've got.
BorrowedCursor, but that is currently an unsafe API. – PitaJ, commented Feb 16 at 17:34