An C++ 98 Optional: Part II - A simple hack
Back to the past
After finishing the ground work we can now focus on the crucial parts, we
need to change in order to become compliant with C++98: we must replace the
(unrestricted) union
with something else. Therefore we
need to touch ever line of code which takes advantage of the union
.
But how can we get rid of the union
?. We needed to union in the first place to avoid the construction of T
in the
case of an optional
without a value and to enable the use of types
without a default destructor. So the basic issue was,
that we needed more fine grained control of the lifetime of mValue
– the stored value.
The lifetime of any object begins basically after initialization
has been completed (e.g. by calling the constructor) and as soon as the destructor is called. This means that we can
control an objects lifetime by calling constructor and destructor. We can see above, that we are basically doing this
already: we call the constructor using placement new operator in constructValue
and the destructor in
destructValue
. In the constructValue
function mValue
serves as storage to the
placement new operator which then will create the object
at this storage location.
The placement new operator is actually not limited to unions as storage providers, we can basically pass any (almost –
we’ll get back to that) memory location , which provides enough memory to store a object of the desired type in it, to
the placement new operator. This includes e.g. an array of char
s. This comes in quite handy, as a char
is exactly
one byte large, meaning, that we can easily create an array of char
s which is exactly as big as we need it to be.
Just in case you didn’t know it already: sizeof
returns the size of the given type in bytes.
Therefore we can replace the union with mBuffer
as a first step. Now we just need a way to access the object, which is
stored in the memory provided by mBuffer
. This can be done with
reinterpret_cast. Beside other things, reinterpret_cast
can be used to convert a pointer to any other pointer type; hence we can use it to convert &mBuffer
to T*
, just like
this.
With this tools at hand, we now can adapt our optional – anticipating the transition to C++98 – by
- Replacing
mValue
and theunion
definition with the array ofchar
smBuffer
and - Migrating each access to
T
to the respectivereinterpret_cast
.
Note, that we had to cast &mBuffer
to const T*
in the const
-version of the *
-operator. This is necessary because
reinterpret_cast
can not “cast away” the const
; in this version of the *
-operator mBuffer
– and therefore
&mBuffer
– is const
, so we can only cast it to const T*
. This is one of the advantages of C++-style casts over
c-style casts: they circumvent only parts of the type system, not all of
it.
With these changes, we now can go back to C++98 by adapting CMAKE_CXX_STANDARD
in the CMakeLists.txt
accordingly.
Our tests will compile and succeed. Are we already done?
Actually we aren’t. Our current implementation relies on undefined behaviour. And the underlying issue is all about alignment.
Data alignment
What exactly is alignment? The C++ says:
Object types have alignment requirements which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type.
In C++ objects can not be placed anywhere in memory on any arbitrary address; each type can only be placed at certain addresses in memory which are defined by the alignment requirements of the respective types. The Standard defines the alignment of a type as the “number of bytes between successive addresses at which a given object can be allocated”. This usually means (at least on most modern platforms) that a object can only be placed at addresses which are an integer multiple of its alignment.
“Ok, but why is there such a thing as alignment at all?”, you may ask. Well, there are several reasons, but for us the most important may be, that there are platforms which can access values in memory faster if they are aligned (they are placed at a memory locations which is a multiple of their alignment). On x86 this might (at least on modern CPU) not be an issue, but on some ARM platform, it will. Actually, there are some instructions on x86, which rely on the alignment of their operands, so alignment might even be an issue in x86. We’ll get back to that.
Alignment requirements are important for build-in types such as int
or float
. Processors have special instructions
to deal with values of these types efficiently. These operations often require their operands to be aligned. The
alignment of build-in types is identical with their size, meaning that e.g. a float
(sizeof(float)
is 4) must be 4
bytes aligned.
The compiler must ensure that all build-in types (fundamental types as well as pointer types and references) are placed at locations matching the respective alignment requirements, if they are placed in static memory (basically global variables of all kinds) or on the stack. Note the alignment of dynamic memory can not be controlled by the compiler as dynamic allocation is done by library functions like malloc or the new-operator; these functions must provide properly aligned memory (we’ll get back to that).
But there are not only build-in types, there are also
“composed types” (not a standard’s term!) which are composed out of other types. These
types are struct
s,class
es, union
s and arrays. For these types, the compiler must make sure, that each of it’s
members is always (under all circumstances) aligned properly. In case of arrays this can be easily achieved by
imposing the exact same alignment requirements on the array as on it’s elements.
For union
s this is not that easy: a union
may have members with differing alignment requirements. At this point, a
particular property of all alignment requirements comes in handy: all alignment requirements are a power of two. This
implies, that given two alignment requirements the larger one of them will always be an integer multiple of the smaller
one. If an address is aligned according to the larger alignment requirement, it will also be aligned according to
smaller alignment requirements. For union
s this means, that they will be aligned properly for all their members if the
union
inherits the alignment requirements of the member with the largest alignment requirement.
struct
s and class
es need an additional technique to ensure the alignment requirements of their members. We need to
take into account here that the order in which the members of a class
in memory must be exactly the same as in it’s
definition, so there is now room for the compiler to reorder them. But it may add padding bytes. The compiler is
allowed to place “unused bytes” between the members of a class
in order to ensure the alignment requirements of each
member (An instructive example can he found
here.
Note, that the addition of padding bytes will (obviously) increase the size of the class
. In many cases the size of a
class
will be larger than the sum of it’s members.
Alignment Rule Wrap Up
Let’s wrap up the rules of alignment in C++98.
- Every object (everything in memory) has alignment requirements, which are integer values and a power of two.
- “Build-in types” such as fundamental types (except
void
), all pointer and reference types are aligned by their size. - Arrays inherit their alignment requirements form their elements.
struct
s,class
es andunion
s inherit the alignment requirements of their member with the largest alignment requirement.- For
struct
s/class
s the compiler may add padding bytes between successive members in order to ensure that their alignment requirements of all members are met. - By applying these rules the alignment of each type compliant with the C++ Standard can be determined (in C++98).
Note that these rules imply, that there is a maximum alignment requirement defined by the largest “build-in type”. This is true only for C++98 and C++03 though. C++11 added a specific keyword alignas with which the alignment of a type or variable can be specified explicitly. With this keyword larger alignments are possible. This is referred to as overalignment.
Please note, that I do not claim to be an expert on alignment. If someone finds an issue with these rules, please let me know, so that I can fix this post.
Alignment by Example
Our optional can provide us here a nice example. Let’s go back to our starting point.
If we execute this
code, we’ll see, that the size of optional_unsigned_int
is 8 while the sum of it’s member’s sizes is 5. The reason for that is, that the compiler added 3 padding bytes between mIsSet
and mValue
in order to make sure, that mValue
is always 4 byte aligned.
Now let’s make the same test with an memory layout equivalent struct
to our modified optional implementation.
If we now execute this
code, we’ll see that now the size of optional_unsigned_int
is 5. The compiler didn’t add any padding bytes. The reason for that is, that mValue
now is an array of char
s. char
s only need to be 1 byte aligned. As arrays inherit the alignment requirements of their elements, the array also only needs to be 1 byte aligned. A 1 byte aligned object can be placed anywhere in memory, so there is no need to add additional padding bytes here.
The issues with our naive implementation
The example above pretty much shows already, what the issue with our new C++98 implementation may be: in almost all
cases mBuffer
will not be aligned properly, as it is just an array of char
s. Even for types such as float
and int
it will not match their alignment requirements. It is undefined behaviour to
supplying an unaligned address to placement-new, but we just did it.
The only reason why our tests didn’t fail is that we ran the tests on a x84 machine (at least I did so), which can deal with
unaligned memory access, and a GCC version, which generated “working code” for our tests. As we invoked undefined
behavior here, the compiler wasn’t even required to generated any code here. It just did it more or less out of mere luck.
Was there any chance we could have encountered the issue at some point in time? Actually there is a way we could have detected the issue with our current test suite: by using sanitizers. GCC’s undefined behavior sanitizer can detect several cases of undefined behavior during runtime by augmenting the binary with additional checks – and one of these checks is about alignment issues. We can enable the undefined behavior sanitizer we need to rerun cmake like this:
After recompiling and running the tests we will get quite a view runtimer error
s looking like this:
A somewhat practical example
In order to show you, that alignment issues may have serious consequences even on x86, I’d like to show you this somewhat contrived but instructive example.
If we compile this program with the flags -O1 -msse4.2 -fno-inline
using GCC and run it, we will get a
segmentation fault. (Link to Compiler Explorer; please note, that Program returned: 139
means, that the program segfaulted)
This happens because GCC uses a specific instruction to copy Vector4i
:
MOVDQA. This instruction can move four integers
(32-bit) at once, if they are 128-bit aligned. Vector4i
is supposed to be 128-bit (because of its long double
member) aligned and contains four
integers, so the compiler is actually allowed to do that. Unfortunately mBuffer
is not 128-bit aligned.
The documentation of MOVDQA states clearly, that ins such a case a general proctection exection is thrown,
which will result in a segfault under Linux.
Actually this example is probably not that contrived: it turns out, that we can encounter similar issues
, if we use Eigen::Vector4i
which is provided by the popular linear algebra library Eigen.
Fixing the alignment with an workaround
So, how can we fix our optional then? First of all, we need to go back to alignment rules above. In general we can identify two cases here: a type is
- either “composed” like
struct
s,class
es,union
s and arrays – these types inherit their alignment from their members – - or “build-in”, which means that it’s alignment is defined by it’s size.
As all pointer types have the same size, the set of different alignment requirements is finite. Therefore there must be a maximal alignment requirement, which then would be suitable for each and every standard compliant type.
Beside this chain of arguments, there must be such a maximum alignment requirement, simply because of
std::malloc. malloc
just returns a void
pointer –
it does not know the type which the allocated memory is supposed to be used for. As we can pass the return value of
malloc
to the placement-new operator, the allocated memory must be aligned in a way which fits for each an every type.
The cppreference documentation of std::malloc says exactly that:
If allocation succeeds, returns a pointer to the lowest (first) byte in the allocated memory block that is suitably aligned for any scalar type (at least as strictly as std::max_align_t).
So there is not only a maximum alignment requirement, there is actually a type std::max_align_t, which is defined to have the most strict alignment requirements possible. Unfortunately it has been introduced in C++11, which we can not use here.
But we can implement something similar on our own. We already know, that a union
always inherits it’s alignment
requirements from the member with the most strict alignment requirements. Therefore we can implement std::max_align_t
using a union
which has a member for each “build-in” type.
This list of types is not comprehensive, but it should fit our needs:
long long
is the largest integer type on all platforms (64 bit,long
might have the same size, but no other integer type is larger).long long
will be at least as large as the processors register width, so there should not be any need for adding any pointer types here.long double
is the largest floating point (either 128 or 64 bit).
If we have a sneak peek on GCCs implementation we see, that we are actually not far off of it. I suppse that we got as far as one may get in Standard C++ without using compiler intrinsics.
We can now use this max_align_t
to fix the alignment of mBuffer
: we put them both together in a union
.
The union
will inherit the alignment requirements of max_align_t
and therefore mBuffer
will always fulfill the
alignment requirements of max_align_t
, too. As this is the maximal possible alignment for all types (in C++98), it
will suffice for each and every T
we might come up with. Our tests will succeed without any issues – even if we
enable the undefined behavior sanitizer. Even our example above will work,
if we adapt if accordingly.
Conclusion
Ok, this was probably the most complex post so far: we had to deal with data alignment and it’s rules. We had to
understand C++’s alignment rules and how they affect our code – especially with respect to undefined behavior.
But finally we could derive a suitable solution for the problem at hand: we made sure, that the alignment requirements
for all T
s are met.
But I already called this solution a hack and I did it for a reason. With this solution, any optional<T>
will always
be sizeof(max_align_t) * 2
bytes large. On x86, this will usually be
32 bytes!
In many cases, this is quite wasteful. A struct
containing a bool
and a int
is only
8 bytes large.
Therefore we will try to come up with another solution in the upcoming post.