Type-Punning with Union
Posted by binglongx on November 15, 2024
Check this snippet:
union a_union {
    int i;
    double d;
};
int f() {
    union a_union t;
    t.d = 3.0;
    return t.i;
}
Does the code do whatever you expect it to do?
This is a form of type punning: a value is written through one union member and read back through another member that shares the same storage.
According to the C++ standard, e.g., the C++23 working draft (N4928), 11.5.1, “At most one of the non-static data members of an object of union type can be active at any time”. The last written value is t.d, so only that member is active. Reading t.i is therefore undefined behavior (UB) in C++.
This practice has been widely used in the C world, and C permits it. C++ programmers, especially those with legacy code inherited from C, therefore often want it to work. Although the C++ standard treats it as UB, some compilers support it:
- GCC allows type punning through a union, even with -fstrict-aliasing; see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type%2Dpunning.
- Similarly, Clang / Apple Clang also supports this.
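Do check the limitations described at the URL above. For example, the memory must be accessed through the union type itself: if you take the address of a member and read through the resulting pointer, the guarantee no longer applies.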
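If relying on a compiler extension is not an option, a commonly used portable alternative is to copy the bytes with std::memcpy instead of reading an inactive union member. The following is only a minimal sketch, not something from the snippet above: the helper name float_bits is hypothetical, and it assumes that float and std::uint32_t have the same size.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustration only): returns the object representation
// of a float as a 32-bit unsigned integer without touching any union.
inline std::uint32_t float_bits(float value) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    // Copying bytes between two objects is well defined in C++,
    // unlike reading a union member that is not active.
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, float_bits(1.0f) yields 0x3F800000. In C++20, std::bit_cast from <bit> expresses the same idea in a single call.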
Example Usage
Let’s say you have existing code:
#include <cstddef>   // std::size_t
#include <iostream>
struct Vec3 {
    float f[3];
    float& operator()(std::size_t index) {
        return f[index];
    }
    float& operator[](std::size_t index) {
        return f[index];
    }
};
int main() {
    Vec3 v;
    v(0) = 1;
    v[1] = 2;
    v.f[2] = 5;
    std::cout << v[0] << ", " << v(1) << ", " << v.f[2] << "\n";
}
You are happy with the code. But then one day someone comes and asks, “Your Vec3 class is great. I have a lot of existing code that uses a similar 3D vector class, but they access the data through .x, .y and .z. Can I use your nice Vec3 class instead without modifying a lot of my code?”
Basically, you are free to redefine your Vec3 class, but the new Vec3 class must keep supporting the current public API, plus a new API that allows access to the data through .x, .y and .z.
Obviously you cannot add new data members like x, y, and z, because that would keep two copies of each value, and there is no way to keep them in sync when clients can freely modify x or f[0] independently. Adding x, y, and z as references to the elements of f seems to work, but it makes the class much larger, and the class is no longer safe to memcpy.
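To make the drawbacks of the reference approach concrete, here is a rough sketch. RefVec3 and copy_aliases_source are hypothetical names used only for illustration, and the size check assumes a mainstream compiler that stores each reference member like a pointer.

```cpp
// Hypothetical RefVec3 (illustration only): x, y and z are references into f.
struct RefVec3 {
    float f[3] = {0.0f, 0.0f, 0.0f};
    float& x = f[0];
    float& y = f[1];
    float& z = f[2];
};

// Assumption: references occupy storage like pointers, so the object is
// larger than the three floats it actually holds.
static_assert(sizeof(RefVec3) > 3 * sizeof(float),
              "reference members are expected to take extra space");

// Returns true if the x of a copy still refers to the source object's f[0].
inline bool copy_aliases_source() {
    RefVec3 a;
    RefVec3 b = a;   // the implicit copy constructor binds b.x to a.f[0]
    b.x = 9.0f;      // writes into a, not into b
    return a.f[0] == 9.0f && b.f[0] == 0.0f;
}
```

A memberwise copy (and likewise a memcpy) leaves the references in the copy pointing into the source object, which is why this design is fragile. The copy assignment operator is also implicitly deleted, because references cannot be rebound.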
Enter union, more specifically, anonymous union.
#include <cstddef>   // std::size_t
#include <iostream>
struct Vec3 {
    union {
        struct {          // anonymous struct: provides x, y, z
            float x;
            float y;
            float z;
        };
        float f[3];       // shares storage with x, y, z
    };
    float& operator()(std::size_t index) {
        return f[index];
    }
    float& operator[](std::size_t index) {
        return f[index];
    }
};
int main() {
    Vec3 v;
    v(0) = 1;
    v[1] = 2;
    v.f[2] = 5;
    std::cout << v[0] << ", " << v(1) << ", " << v.f[2] << "\n"; // Line AA
    std::cout << v.x << ", " << v.y << ", " << v.z << "\n"; // Line BB
}
Line AA demonstrates the API of old Vec3, and Line BB demonstrates the new API. Both are supported.
For this to happen, two things are needed:
- Anonymous struct. In the union definition, the anonymous struct provides the x/y/z members. It is both a type definition and an unnamed member object of the union. Because the struct member has no name, users of the enclosing object, i.e., the union, can access the data directly through .x, .y and .z.
- Anonymous union. Inside the Vec3 class, the union is likewise both a type definition and an unnamed member of the class. Because the union member of Vec3 has no name, users of Vec3 can access the union's members directly.
Combining the two, clients of Vec3 can now access the data through .x, .y and .z directly.
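The effect of the two anonymous levels can be seen by comparing against a version where both the union and the struct are given member names. This is only a sketch: NamedVec3, AnonVec3, named_x and anon_x are hypothetical names, and the anonymous struct relies on the same compiler extension as the Vec3 class above.

```cpp
// Hypothetical comparison types (illustration only).
struct NamedVec3 {
    union {
        struct { float x, y, z; } s;  // named struct member
        float f[3];
    } u;                              // named union member
};

struct AnonVec3 {
    union {
        struct { float x, y, z; };    // anonymous struct (compiler extension in C++)
        float f[3];
    };                                // anonymous union
};

// Naming the members changes how they are accessed, not how much space is used.
static_assert(sizeof(AnonVec3) == sizeof(NamedVec3),
              "both layouts are expected to have the same size");

// With named members every level must be spelled out.
inline float named_x() {
    NamedVec3 v;
    v.u.s.x = 1.0f;
    return v.u.s.x;
}

// With anonymous members, x is reachable directly on the object.
inline float anon_x() {
    AnonVec3 v;
    v.x = 1.0f;
    return v.x;
}
```

Both helpers write and read the same member, so they do not depend on type punning. They only show the difference in access syntax between v.u.s.x and v.x.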
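Strictly speaking, this approach is undefined behavior in standard C++, and anonymous structs are themselves a compiler extension in C++. Because GCC and Clang support both as extensions, however, it works in practice with those compilers.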