This is a problem of potential undefined behavior of uninitialized memory in unsafe rust.
- first, allocating vector by usual safe way (type
Vec<Vec<usize>>
, or must be vector of something on heap instead of stack; in other words,Vec<usize>
type will not panic here); - second, clone the vector in a new scope;
- third, allocating a new vector by unsafe uninitialized way (without using
MaybeUninit
); then a double free occurs.
The code (playground) listed as follows:
#[deny(clippy::uninit_vec)]
unsafe fn uninitialized_vec<T>(size: usize) -> Vec<T> {
let mut v: Vec<T> = Vec::with_capacity(size);
unsafe { v.set_len(size) };
v
}
fn case_1() {
println!("=== Case 1 ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 4];
let _ = vec_a.clone();
let _: Vec<Vec<usize>> = unsafe { uninitialized_vec(4) };
}
fn case_2() {
println!("=== Case 2 ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 4];
{
let _ = vec_a.clone();
}
let _: Vec<Vec<usize>> = unsafe { uninitialized_vec(4) };
}
fn main() {
case_1();
case_2();
}
- debug mode: signal 6 (SIGABRT): abort program, double free detected in tcache 2;
case_2
will stuck here; - release mode: signal 4 (SIGILL): illegal instruction; even
case_1
will stuck here.
From my common sense, the code itself does not intended to double free any variable. We just declared an uninitialized variable, but not using that variable. If this is not a bug, compiler optimization is probably the only reason that can explain this problem.
What I'm curious is that: whether this code actually triggers undefined behavior in unsafe rust, that compiler optimization may cause double free or other problems. And
- if it is actually a UB, I feel this code can really confuses rust newcomers (that is saying myself).
- if it is not a UB, is that a bug of unsafe rust?
Additionally, it is known that by using MaybeUninit
properly may avoid double-free runtime error like this:
use std::mem::MaybeUninit;
unsafe fn uninitialized_vec<T>(size: usize) -> Vec<MaybeUninit<T>> {
let mut v: Vec<MaybeUninit<T>> = Vec::with_capacity(size);
unsafe { v.set_len(size) };
v
}
fn main() {
println!("=== This will not cause error ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 12];
{
let _ = vec_a.clone();
}
let _: Vec<MaybeUninit<Vec<usize>>> = unsafe { uninitialized_vec(12) };
}
Still, MaybeUninit
can be inconvenient in many circumstances. Subjectively, I prefer using Vec<T>
instead of Vec<MaybeUninit<T>>
, especially when I can make sure values of this uninitialized vector will be correctly filled later.
This is a problem of potential undefined behavior of uninitialized memory in unsafe rust.
- first, allocating vector by usual safe way (type
Vec<Vec<usize>>
, or must be vector of something on heap instead of stack; in other words,Vec<usize>
type will not panic here); - second, clone the vector in a new scope;
- third, allocating a new vector by unsafe uninitialized way (without using
MaybeUninit
); then a double free occurs.
The code (playground) listed as follows:
#[deny(clippy::uninit_vec)]
unsafe fn uninitialized_vec<T>(size: usize) -> Vec<T> {
let mut v: Vec<T> = Vec::with_capacity(size);
unsafe { v.set_len(size) };
v
}
fn case_1() {
println!("=== Case 1 ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 4];
let _ = vec_a.clone();
let _: Vec<Vec<usize>> = unsafe { uninitialized_vec(4) };
}
fn case_2() {
println!("=== Case 2 ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 4];
{
let _ = vec_a.clone();
}
let _: Vec<Vec<usize>> = unsafe { uninitialized_vec(4) };
}
fn main() {
case_1();
case_2();
}
- debug mode: signal 6 (SIGABRT): abort program, double free detected in tcache 2;
case_2
will stuck here; - release mode: signal 4 (SIGILL): illegal instruction; even
case_1
will stuck here.
From my common sense, the code itself does not intended to double free any variable. We just declared an uninitialized variable, but not using that variable. If this is not a bug, compiler optimization is probably the only reason that can explain this problem.
What I'm curious is that: whether this code actually triggers undefined behavior in unsafe rust, that compiler optimization may cause double free or other problems. And
- if it is actually a UB, I feel this code can really confuses rust newcomers (that is saying myself).
- if it is not a UB, is that a bug of unsafe rust?
Additionally, it is known that by using MaybeUninit
properly may avoid double-free runtime error like this:
use std::mem::MaybeUninit;
unsafe fn uninitialized_vec<T>(size: usize) -> Vec<MaybeUninit<T>> {
let mut v: Vec<MaybeUninit<T>> = Vec::with_capacity(size);
unsafe { v.set_len(size) };
v
}
fn main() {
println!("=== This will not cause error ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 12];
{
let _ = vec_a.clone();
}
let _: Vec<MaybeUninit<Vec<usize>>> = unsafe { uninitialized_vec(12) };
}
Still, MaybeUninit
can be inconvenient in many circumstances. Subjectively, I prefer using Vec<T>
instead of Vec<MaybeUninit<T>>
, especially when I can make sure values of this uninitialized vector will be correctly filled later.
2 Answers
Reset to default 8We just declared an uninitialized variable, but not using that variable
This is incorrect. We are using this variable - namely, we are dropping the outer Vec
, therefore dropping all the Vec
s in it, and since they are uninitialized, we're trying to drop arbitrary memory.
whether this code actually triggers undefined behavior in unsafe rust
It does, according to documentation for set_len
:
Safety
<...>
The elements at old_len..new_len must be initialized.
This requirement is explicitly violated.
undefined behavior in unsafe rust, that compiler optimization may cause double free or other problems
If you trigger undefined behavior, compilation process may do to your program anything at all, by definition. It's not "this code may be misoptimized in this way" - it's "you lied to the compiler, all bets are off".
Disclaimer: Cerberus answered the main question, I'll focus on best practices/implicit questions.
Correct usage of set_len
Subjectively, I prefer using
Vec<T>
instead ofVec<MaybeUninit<T>>
, especially when I can make sure values of this uninitialized vector will be correctly filled later.
As mentioned in @Cerberus answer, set_len
should only be used after the elements have been initialized.
That is, the correct way to use set_len
is:
fn main() {
const SIZE: usize = 5;
let mut vec: Vec<Vec<usize>> = Vec::with_capacity(SIZE);
{
let spare = vec.spare_capacity_mut();
(0..SIZE).for_each(|i| { spare[i].write(Vec::new()); });
}
// Safety:
// - All elements in 0..SIZE have been initialized.
unsafe { vec.set_len(SIZE); }
}
Pre-pooping your pants with Rust
Gankra1, one of the lead designer of unsafe Rust in the early days, wrote an article called Pre-pooping your pants with Rust, which is very a-propos here.
The gist of the article, is that one key difficulty for writing sound unsafe
code is anticipating all the ways that something could potentially go wrong: any early return, any panic, etc... which could occur before you "fulfill" a promise.
Hint: it's basically impossible, in the face of ever-changing code.
The solution that Grankra proposed is thus to "pre-poop" your pants. For example, when you call vec.drain(start..end)
, all the elements in start..end
should be removed, and only the elements in 0..start
and end..
(catenated together) remain. But who knows what could occur while doing all that?
Thus, the code for drain will:
- Set the length of
vec
tostart
. It will not leave it at the current length, nor set it at the final length. - Poke a hole in the vec, until all elements in
start..end
have been moved out. - Move the elements in
end..
tostart..
(filling the hole, partially). - And ONLY THEN set the length of
vec
to the final number of elements that actually remain.
If for whatever reason the Drain
iterator is leaked in the middle of all this, leaving a hole with uninitialized elements... it's all sound, because all the elements in 0..start
are (still) initialized, and that's the only promise that vec
is making until (4), and at (4) all is good again.
Or put another way: you never promise that you will do something with unsafe, instead you first do it, and then announce it's done.
1 You may know Gankra from Learn Rust With Entirely Too Many Linked Lists.
MaybeUninit is inconvenient
Yes. Yes it is.
In fact, properly written unsafe
code, with all the // Safety
annotations, is tedious to write. And verbose. That's what it takes to write verifiably sound unsafe
code.
If you can't be bothered, then don't use unsafe
!
I would note that there's no benefit to using unsafe
in the scenarios you've showcased here:
fn defaulted_vec<T>(size: usize) -> Vec<T>
where
T: Default,
{
(0..size).map(T::default).collect()
}
fn case_1() {
println!("=== Case 1 ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 4];
let _ = vec_a.clone();
let _: Vec<Vec<usize>> = defaulted_vec(4);
}
fn case_2() {
println!("=== Case 2 ===");
let vec_a: Vec<Vec<usize>> = vec![vec![0]; 4];
{
let _ = vec_a.clone();
}
let _: Vec<Vec<usize>> = defaulted_vec(4);
}
fn main() {
case_1();
case_2();
}
defaulted_vec
will create a Vec
of 4 default values, which are cheap -- because Vec::new()
doesn't allocate anything -- and you can then use them/overwrite them at leisure.
TOOLS
at the top right and it correctly detects this UB. It's not comprehensive though so if it doesn't detect something doesn't mean there is no UB, just that miri didn't find it. – cafce25 Commented Mar 6 at 13:28