Binglong's space

Random notes on computer, phone, life, anything


Type-Punning with Union

Posted by binglongx on November 15, 2024


Check this snippet:

union a_union {
  int i;
  double d;
};

int f() {
  union a_union t;
  t.d = 3.0;
  return t.i;
}

Does the code behave the way you expect?

This is one form of type punning: a value is stored through one union member (t.d, a double) and read back through another member of a different type (t.i, an int).

According to the C++ standard, e.g., C++2023 (N4928) 11.5.1, “At most one of the non-static data members of an object of union type can be active at any time”. The member written last is t.d, so only t.d is active. Reading t.i is therefore undefined behavior (UB) in C++.

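This practice has been widely used in C, and C permits it: the stored bytes are reinterpreted as the type of the member being read. That is why C++ programmers, especially those maintaining legacy code from C, often want it to work. Although the C++ standard treats this as UB, some compilers support it as an extension; GCC, for example, documents it under its -fstrict-aliasing option.

Do check the limitations in the compiler documentation. For example, the guarantee only covers accesses through the union object itself, so you cannot take the address of a member and then read the value through a pointer of the other type.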

Example Usage

Let’s say you have existing code:

#include <cstddef>
#include <iostream>

struct Vec3 {
    float f[3];

    // Function-call style access: v(0)
    float& operator()(std::size_t index) {
        return f[index];
    }
    // Subscript style access: v[0]
    float& operator[](std::size_t index) {
        return f[index];
    }
};

int main() {
    Vec3 v;
    v(0) = 1;
    v[1] = 2;
    v.f[2] = 5;
    std::cout << v[0] << ", " << v(1) << ", " << v.f[2] << "\n";
}

You are happy with the code. But then one day someone comes and asks, “Your Vec3 class is great. I have a lot of existing code that uses a similar 3D vector class, but they access the data through .x, .y and .z. Can I use your nice Vec3 class instead without modifying a lot of my code?”

Basically, you can redefine your Vec3 class, but the new Vec3 must keep supporting the current public API while adding a new API that accesses the members through .x, .y and .z.

Obviously you cannot add new data members x, y and z, because each value would then be stored twice, and there would be no way to keep the copies in sync, since clients can freely modify x or f[0] independently. Adding x, y and z as references to the f elements seems to work, but it makes the class much larger, breaks copying (a copy's references still refer to the original object's f, and copy assignment is implicitly deleted), and makes the class unsafe to memcpy.

Enter union, more specifically, anonymous union.

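#include <cstddef>
#include <iostream>

struct Vec3 {
    union {                 // anonymous union: its members are accessed directly on Vec3
        struct {            // anonymous struct: x, y, z share storage with f[0], f[1], f[2]
            float x;
            float y;
            float z;
        };
        float f[3];
    };

    float& operator()(std::size_t index) {
        return f[index];
    }
    float& operator[](std::size_t index) {
        return f[index];
    }
};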


int main() {
    Vec3 v;
    v(0) = 1;
    v[1] = 2;
    v.f[2] = 5;

    std::cout << v[0] << ", " << v(1) << ", " << v.f[2] << "\n"; // Line AA
    std::cout << v.x << ", " << v.y << ", " << v.z << "\n";      // Line BB
}

Line AA demonstrates the old Vec3 API, and Line BB demonstrates the new API. Both are supported.

For this to happen, two pieces work together:

  • Anonymous struct. Inside the union, the anonymous struct provides the x/y/z members. It is both a type definition and an unnamed member object of the union, so users of the enclosing object, i.e., the union, can access its data directly through .x, .y and .z.
  • Anonymous union. Vec3 declares the union with neither a type name nor a member name, which makes it an anonymous union member of the class. Because that member has no name, users of Vec3 can access the union's members directly.

Combining the two, clients of Vec3 can access the data directly through .x, .y and .z.

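Strictly speaking, this approach falls into UB land: after v(0) = 1 writes f[0], reading v.x reads an inactive union member. On top of that, anonymous structs are not part of ISO C++ (they are standard only in C11), so the class itself relies on a compiler extension. GCC and Clang support both extensions (with -pedantic they warn about the anonymous struct), so in practice the code works as expected with those compilers.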
