I'm reviewing classic object (de)serialization code that reads from a file, and I'm wondering whether it has undefined behavior (UB).
I'm making some simplifying assumptions:
- in the snippet below, the file is assumed to be well formed: it holds a std::uint32_t, a std::int64_t and a float, with the right endianness, the float using the same representation as in the program;
- I'm only reading implicit-lifetime types that are trivially copyable and trivially destructible.
#include <cstdint>
#include <fstream>
#include <string>

struct Content
{
    std::uint32_t first;
    std::int64_t second;
    float last;
};

std::string Path(<Some valid path>); // path to a binary file holding only data of implicit-lifetime types
std::ifstream is(Path, std::ios::binary);
Content content;
is.read(reinterpret_cast<char *>(&content.first), sizeof(content.first));
is.read(reinterpret_cast<char *>(&content.second), sizeof(content.second));
is.read(reinterpret_cast<char *>(&content.last), sizeof(content.last));
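For experimenting, the snippet above can be turned into a self-contained round trip. This is a minimal sketch, assuming the file was produced by writing each member's object representation in declaration order on the same platform. It uses a std::stringstream in place of the file (std::ifstream and std::stringstream share std::istream::read), the names write_content, read_content and round_trip_ok are hypothetical, and it checks the stream state after the reads, which the snippet above does not.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

struct Content {
    std::uint32_t first;
    std::int64_t second;
    float last;
};

// Hypothetical writer: dumps each member's object representation in order,
// without the struct's padding and in the host's native byte order.
void write_content(std::ostream& os, const Content& c) {
    os.write(reinterpret_cast<const char*>(&c.first), sizeof(c.first));
    os.write(reinterpret_cast<const char*>(&c.second), sizeof(c.second));
    os.write(reinterpret_cast<const char*>(&c.last), sizeof(c.last));
}

// Hypothetical reader mirroring the question's code; returns false if the
// stream failed or ran out of data before all members were filled.
bool read_content(std::istream& is, Content& c) {
    is.read(reinterpret_cast<char*>(&c.first), sizeof(c.first));
    is.read(reinterpret_cast<char*>(&c.second), sizeof(c.second));
    is.read(reinterpret_cast<char*>(&c.last), sizeof(c.last));
    return static_cast<bool>(is);
}

// Usage: round-trip one value through an in-memory stream.
bool round_trip_ok() {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    const Content in{42u, -7, 1.5f};
    write_content(ss, in);
    Content out{};
    return read_content(ss, out) && out.first == in.first &&
           out.second == in.second && out.last == in.last;
}
```

This only demonstrates the same-platform case; it says nothing about files written on a machine with a different byte order.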
I know this kind of code has been used without issues for ages, but is the reinterpret_cast legal in this case, and why?
Or should we go for one of the following instead?

char buffer[sizeof(content.first)];
is.read(buffer, sizeof(content.first));
std::memcpy(&content.first, buffer, sizeof(content.first)); // <cstring>
...

or

char buffer[sizeof(content.first)];
is.read(buffer, sizeof(content.first));
content.first = std::bit_cast<std::uint32_t>(buffer); // C++20, <bit>; sizes must match exactly
...
Asked Mar 17 at 16:53 by Oersted

1 Answer
In the end, I found that the cast by itself is valid under [expr.reinterpret.cast]: an object pointer may be converted to a pointer to a different object type.
Then, under the type-aliasing rule in [basic.lval], I can access the object representation of the data members through a glvalue of type char. Those glvalues come from the first parameter of std::ifstream::read, the char * initialized by the reinterpret_cast, through which read writes the bytes.
Finally, the behavior is well defined if and only if the resulting object representation is a valid object representation for the destination object's type.
Even so, because of endianness, floating-point representation and similar issues, the object representation may be valid while the resulting value is not the expected one.
read into the float is a recipe for disaster. I've seen code with this that works "sometimes" and other times not, even if read returns 4 as expected. Just don't do that. Read into a buffer and bit_cast or memcpy from that. – Ted Lyngmo, commented Mar 17 at 17:59

read has only to copy the object representation. – Oersted, commented Mar 17 at 18:03