Behind the magic of magic_enum

Recently, a coworker pointed me towards a C++17 library to convert enumeration values to strings and vice versa. The library is called magic_enum (https://github.com/Neargye/magic_enum) and it indeed feels like magic. I was immediately curious: how did they pull this off?

The code I am presenting here can be seen as a massively down-scaled version of magic_enum: the approach taken is exactly the same. I’ve mainly tried to simplify for purposes of readability and understandability. Thus, consider the code presented here licensed under MIT, the same license which governs magic_enum, just with fewer features and more comments. You can find my re-implementation at https://gist.github.com/zhmu/9ac375706ffbafa5d24693f8475abd79.

I would like to thank Daniil Goncharov for giving me something to study and blowing my mind!

Goal

We want the following code to yield the value as commented:

enum class Colour {
    Red = 27,
    Green,
    Blue = 40,
};

const std::string_view s = enum_to_string(Colour::Green);
puts(s.data()); // Green

This code already has some requirements subtly embedded:

  • enum_to_string() must return a view of memory that outlives the call, as std::string_view does not own the characters it refers to
  • The returned string_view must be zero-terminated for puts() to work

This means we need to provide static storage for the string values. We could do without, but it turns out the code optimizes better and is fairly easy to implement so I’ve decided to roll with it.

First steps: how would you convert a given enum value to a string?

In classic C, you have __FILE__ and __LINE__, which expand to a string literal containing the filename and an integer constant holding the current line number of the input you are compiling. Later, C99 introduced __func__, which holds the name of the function you are in. As this is C, function overloading does not exist, so the name alone identifies the function and no parameters are necessary.

C++ does not provide a standard way to determine the current function name with all parameters involved, but GCC and Clang provide __PRETTY_FUNCTION__ and Microsoft Visual C++ provides __FUNCSIG__. The output shown in this post is what GCC prints; Clang’s format differs slightly. Let’s show some examples:

void fun1() { puts(__PRETTY_FUNCTION__); }
int fun2(int v) { puts(__PRETTY_FUNCTION__); return 0; }

fun1();
// void fun1()
fun2(123);
// int fun2(int)

This also works with template functions:

template<typename T> void fun3() { puts(__PRETTY_FUNCTION__); }

fun3<int>();
// void fun3() [with T = int]
fun3<Colour>();
// void fun3() [with T = Colour]

The clever insight here is that you can create a template that takes an enumeration and a compile-time known value of that enumeration:

template<typename E, E v> void fun4() { puts(__PRETTY_FUNCTION__); }

fun4<Colour, Colour::Red>();
// void fun4() [with E = Colour; E v = Colour::Red]

Hence, if we’d perform a compile-time instantiation of all possible enumeration values, we have their corresponding character representation!

How do you know which enumeration values are possible?

First, let’s see what happens if you try to use fun4() on a value that isn’t in the enumeration:

fun4<Colour, static_cast<Colour>(1)>();
// void fun4() [with E = Colour; E v = (Colour)1]

This output is distinct in the sense that it doesn’t match the Colour::... form we saw previously. This means we can determine whether any integer corresponds to a given enumeration value or not!

So how do we obtain a list of all valid enumeration values? We try them all, one by one. If you look at the magic_enum documentation, it stands out that the enum value must reside within a certain range, which by default is MAGIC_ENUM_RANGE_MIN .. MAGIC_ENUM_RANGE_MAX. Only this range will be evaluated.

In magic_enum, the function n() is responsible for the conversion. It uses pretty_name() to normalize the resulting compiler-specific result of __PRETTY_FUNCTION__ / __FUNCSIG__ to the enumeration value name, or an empty string_view in case the value does not exist within the enumeration.

An incomplete implementation of the pretty_name() / n() functions, which works well enough for enumeration value names that do not contain digits or underscores, is as follows:

constexpr auto is_pretty(char ch) noexcept
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

constexpr auto pretty_name(std::string_view sv) noexcept
{
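    // Walk backwards to the last character that cannot be part of a name.
    // Starting at size() and testing sv[n - 1] also keeps an empty view safe.
    for(std::size_t n = sv.size(); n > 0; --n) {
        if (!is_pretty(sv[n - 1])) {
            sv.remove_prefix(n);
            break;
        }
    }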
    return sv;
}

template<typename E, E V>
constexpr auto n() noexcept
{
#if defined(__GNUC__) || defined(__clang__)
    return pretty_name({ __PRETTY_FUNCTION__, sizeof(__PRETTY_FUNCTION__) - 2 });
#elif defined(_MSC_VER)
    return pretty_name({ __FUNCSIG__, sizeof(__FUNCSIG__) - 17 });
#endif
}

The implementation in magic_enum supports more cases and will reject invalid names. For our purposes, the implementation above suffices.
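
The restriction on digits can be lifted with a small change. What follows is a minimal sketch and not magic_enum’s actual code: the helper names is_identifier_char() and identifier_suffix() are made up for illustration. It accepts letters, digits and underscores, and then rejects any result that starts with a digit, which is all that remains of a cast such as (Colour)1.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: characters that may appear in a C++ identifier (ASCII only).
constexpr bool is_identifier_char(char ch) noexcept
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
           (ch >= '0' && ch <= '9') || ch == '_';
}

// Hypothetical variant of pretty_name(): keeps the trailing identifier of sv,
// or returns an empty view if what is left cannot be an enumerator name.
constexpr std::string_view identifier_suffix(std::string_view sv) noexcept
{
    std::size_t start = sv.size();
    while (start > 0 && is_identifier_char(sv[start - 1]))
        --start;
    sv.remove_prefix(start);
    // An identifier never starts with a digit; "(Colour)1" leaves just "1".
    if (!sv.empty() && sv.front() >= '0' && sv.front() <= '9')
        return {};
    return sv;
}

static_assert(identifier_suffix("E V = Colour::Level_2") == "Level_2");
static_assert(identifier_suffix("E V = (Colour)1").empty());
static_assert(identifier_suffix("E V = (Colour)-3").empty());
```

Under these assumptions, n() could call identifier_suffix() where it currently calls pretty_name(); the rest of the code in this post would not need to change.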

Collecting the possible enumeration values

Now that we have a way to query enumeration values one by one, we need a way to assemble them together into an array. We want the following code to compile and yield the output in comments:

auto f = values<Colour>();
for(const auto& i: f)
    std::cout << static_cast<int>(i) << '\n';
// 27
// 28
// 40

First things first, we need a function to determine whether a given enumeration value is valid. Remember the magic n() function above? All we need to check is whether the value returned is non-empty:

template<typename E, E V>
constexpr auto is_valid()
{
    constexpr E v = static_cast<E>(V);
    return !n<E, V>().empty();
}

We also need a way to convert an integer offset v into the enumeration value ENUM_MIN_VALUE + v. Within magic_enum, this function is called ualue(), so I’ll stick with that. Thankfully, this is pretty straightforward:

template<typename E>
constexpr auto ualue(std::size_t v)
{
    return static_cast<E>(ENUM_MIN_VALUE + v);
}

Now things get more tricky. We want to create a compile-time sequence of all integers we need to try (this is simply the list of integers between ENUM_MIN_VALUE and ENUM_MAX_VALUE, our counterparts of MAGIC_ENUM_RANGE_MIN and MAGIC_ENUM_RANGE_MAX). We know there are ENUM_MAX_VALUE - ENUM_MIN_VALUE + 1 such values. Using std::make_index_sequence<>, we can generate a compile-time list containing all std::size_t values from 0 up to and including ENUM_MAX_VALUE - ENUM_MIN_VALUE.
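
To make the index sequence a bit more tangible, here is a small sketch that turns it into an array of candidate integers. The names range_min, range_max and candidates() are assumptions made up for this illustration, and the range -2 .. 2 is deliberately tiny.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Assumed range for illustration only; ENUM_MIN_VALUE / ENUM_MAX_VALUE
// play this role in the rest of the code.
constexpr int range_min = -2;
constexpr int range_max = 2;

// I... is deduced as 0, 1, ..., N-1 from the index_sequence argument.
template<std::size_t... I>
constexpr auto candidates(std::index_sequence<I...>) noexcept
{
    // Each index is shifted by range_min to obtain the integer to try.
    return std::array<int, sizeof...(I)>{ (range_min + static_cast<int>(I))... };
}

constexpr auto all_candidates =
    candidates(std::make_index_sequence<range_max - range_min + 1>{});

static_assert(all_candidates.size() == 5);
static_assert(all_candidates.front() == -2);
static_assert(all_candidates.back() == 2);
```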

This allows us to write the values<E>() function, which generates the appropriate list and feeds it into a helper function:

template<typename E>
constexpr auto values() noexcept
{
    constexpr auto enum_size = ENUM_MAX_VALUE - ENUM_MIN_VALUE + 1;
    return values<E>(std::make_index_sequence<enum_size>{});
}

Let’s give the function prototype of the values() helper function:

template<typename E, std::size_t... I>
constexpr auto values(std::index_sequence<I...>) noexcept;

Here, I is a pack of std::size_t values. There can be anywhere from zero to a great many of them, and each one corresponds with an enumeration value we want to try to see if it is valid. A pack expansion lets us conveniently express this using the is_valid() and ualue() functions:

constexpr bool valid[sizeof...(I)] = { is_valid<E, ualue<E>(I)>()... };

Our Colour enumeration has only three values, which means most of valid will be false. We want to condense it to a std::array<E, N> which contains only the values present in the enumeration. The first step is to introduce a helper function to count the number of items in valid that are true:

template<std::size_t N>
constexpr auto count_values(const bool (&valid)[N])
{
    // Cannot use std::count_if(), it is not constexpr pre C++20
    std::size_t count = 0;
    for(std::size_t n = 0; n < N; ++n)
        if (valid[n]) ++count;
    return count;
}
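
As an aside, a C++17 fold expression can do the same kind of counting directly on an index pack, without an intermediate array and loop. This is only a sketch of the technique: is_even() is a made-up predicate standing in for is_valid(), and count_even() is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>

// Made-up predicate standing in for is_valid(): even offsets count as "valid".
constexpr bool is_even(std::size_t v) noexcept
{
    return v % 2 == 0;
}

template<std::size_t... I>
constexpr std::size_t count_even(std::index_sequence<I...>) noexcept
{
    // Binary left fold over +: ((0 + a0) + a1) + ..., where each term is 0 or 1.
    return (std::size_t{0} + ... + (is_even(I) ? std::size_t{1} : std::size_t{0}));
}

// Offsets 0..4 contain three even numbers: 0, 2 and 4.
static_assert(count_even(std::make_index_sequence<5>{}) == 3);
```

The loop-based count_values() is kept in the rest of this post, since the valid array is needed anyway to fill the result.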

With count_values() in place, the remainder of the values() function is pretty straightforward:

constexpr auto num_valid = count_values(valid);
static_assert(num_valid > 0, "no support for empty enums");

std::array<E, num_valid> values = {};
for(std::size_t offset = 0, n = 0; n < num_valid; ++offset) {
    if (valid[offset]) {
        values[n] = ualue<E>(offset);
        ++n;
    }
}

return values;
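
For reference, here is how the fragments so far could fit together in one translation unit. Treat it as a sketch rather than the gist’s exact code: it is GCC/Clang only because it relies on __PRETTY_FUNCTION__, the range 0 .. 63 is an assumption picked to keep compile times low, and the local array is named result so that it does not shadow the function name.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <utility>

enum class Colour { Red = 27, Green, Blue = 40 };

// Assumed search range, chosen for this sketch only.
constexpr int ENUM_MIN_VALUE = 0;
constexpr int ENUM_MAX_VALUE = 63;

constexpr bool is_pretty(char ch) noexcept
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// Keeps the trailing run of letters; empty if the text ends in a non-letter.
constexpr auto pretty_name(std::string_view sv) noexcept
{
    for (std::size_t n = sv.size(); n > 0; --n) {
        if (!is_pretty(sv[n - 1])) {
            sv.remove_prefix(n);
            break;
        }
    }
    return sv;
}

template<typename E, E V>
constexpr auto n() noexcept
{
    // Drop the terminating '\0' and the closing ']' of the signature.
    return pretty_name({ __PRETTY_FUNCTION__, sizeof(__PRETTY_FUNCTION__) - 2 });
}

template<typename E, E V>
constexpr bool is_valid() noexcept
{
    return !n<E, V>().empty();
}

template<typename E>
constexpr E ualue(std::size_t v) noexcept
{
    return static_cast<E>(ENUM_MIN_VALUE + static_cast<int>(v));
}

template<std::size_t N>
constexpr std::size_t count_values(const bool (&valid)[N]) noexcept
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < N; ++i)
        if (valid[i]) ++count;
    return count;
}

// Helper overload; it has to be declared before values<E>() calls it.
template<typename E, std::size_t... I>
constexpr auto values(std::index_sequence<I...>) noexcept
{
    constexpr bool valid[sizeof...(I)] = { is_valid<E, ualue<E>(I)>()... };
    constexpr auto num_valid = count_values(valid);
    static_assert(num_valid > 0, "no support for empty enums");

    std::array<E, num_valid> result = {};
    for (std::size_t offset = 0, i = 0; i < num_valid; ++offset) {
        if (valid[offset])
            result[i++] = ualue<E>(offset);
    }
    return result;
}

template<typename E>
constexpr auto values() noexcept
{
    constexpr std::size_t enum_size = ENUM_MAX_VALUE - ENUM_MIN_VALUE + 1;
    return values<E>(std::make_index_sequence<enum_size>{});
}

constexpr auto colours = values<Colour>();
static_assert(colours.size() == 3);
static_assert(colours[0] == Colour::Red && colours[1] == Colour::Green &&
              colours[2] == Colour::Blue);
```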

We’ll be needing values<E>() quite a bit. We’ll introduce values_v as a shorthand:

template<typename E>
inline constexpr auto values_v = values<E>();

We can test our implementation by iterating over values_v<>. It indeed yields all valid enumeration values of Colour, exactly as intended:

auto f = values_v<Colour>;
for(const auto& i: f)
    std::cout << static_cast<int>(i) << '\n';
// 27
// 28
// 40

From values to entries

Our intention is to implement an entries_v<E> variable, which yields a std::array<std::pair<E, string_view>, ...>: that is, for a given enum E, it yields an array containing pairs of each valid value within that enum and its corresponding string representation.

First, we’ll introduce another helper, enum_name_v<E, V>, to obtain the string representation of an enumeration value V within enum E. For now, this will simply be a call to the n() function:

template<typename E, E V>
constexpr auto enum_name()
{
    constexpr auto name = n<E, V>();
    return name;
}

template<typename E, E V>
inline constexpr auto enum_name_v = enum_name<E, V>();

We can then introduce the entries() function as follows: given a sequence of indices I into values_v<E>, we yield a std::array<> that pairs each enumeration value with its corresponding name string. Pack expansion makes this convenient:

template<typename E, std::size_t... I>
constexpr auto entries(std::index_sequence<I...>) noexcept
{
    return std::array<std::pair<E, std::string_view>, sizeof...(I)>{
        {{ values_v<E>[I], enum_name_v<E, values_v<E>[I]>}...}
    };
}

This allows us to finally express entries_v<E> as follows:

template<typename E>
inline constexpr auto entries_v =
    entries<E>(std::make_index_sequence< values_v<E>.size()>());

We can prove that this works by printing the contents of entries_v<Colour>:

auto q = entries_v<Colour>;
for(const auto [a, b]: q)
    std::cout << static_cast<int>(a) << ' ' << b << '\n';
// 27 Red
// 28 Green
// 40 Blue

Awesome!

Implementing enum_to_string()

Given all the work we did previously, enum_to_string() is trivial:

template<typename E>
constexpr std::string_view enum_to_string(E value)
{
    for (const auto& [ key, name ]: entries_v<E>) {
        if (value == key) return name;
    }
    return {};
}

Of course, C++20’s constexpr algorithms would make this a lot nicer.
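
The opposite direction, from a string back to an enumeration value, can be built on the same kind of table. The sketch below is not part of the code discussed in this post: string_to_enum() is a hypothetical name, and colour_entries is a hand-written stand-in for entries_v<Colour> so that the example compiles on its own.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

enum class Colour { Red = 27, Green, Blue = 40 };

// Hand-written stand-in for entries_v<Colour> (an assumption of this sketch).
inline constexpr std::array<std::pair<Colour, std::string_view>, 3> colour_entries{{
    { Colour::Red, "Red" },
    { Colour::Green, "Green" },
    { Colour::Blue, "Blue" },
}};

// Hypothetical reverse lookup: returns the value whose name matches, if any.
template<typename E, std::size_t N>
constexpr std::optional<E> string_to_enum(
    std::string_view name,
    const std::array<std::pair<E, std::string_view>, N>& entries) noexcept
{
    for (const auto& entry : entries) {
        if (entry.second == name)
            return entry.first;
    }
    return std::nullopt;
}

static_assert(string_to_enum("Green", colour_entries) == Colour::Green);
static_assert(!string_to_enum("Purple", colour_entries).has_value());
```

Returning a std::optional makes the "no such name" case explicit, instead of picking an arbitrary enumeration value as a fallback.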

Static string storage and zero-termination

This implementation has a major drawback and an annoying bug.

As for the drawback, the complete strings as output by __PRETTY_FUNCTION__ will be stored in the executable (this can be seen using tools like Compiler Explorer and examining the assembly output). We have:

.LC0:
        .string "constexpr auto n() [with E = Colour; E V = Colour::Green]"
main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:.LC0+51
        call    puts
        xor     eax, eax
        add     rsp, 8
        ret

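This prints Green] instead of Green, which illustrates the bug. If we used std::cout instead, we’d get the correct string, because streaming a std::string_view honours its length. puts() only sees the pointer and keeps reading until it finds a \0-character. We never inserted one right after the name, so we’ll just end up with whatever follows it in memory.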
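
The same effect can be checked at compile time. This small sketch uses a shortened stand-in for the compiler-generated signature (the variable names are made up): the view compares equal to Green, yet the character right behind it is the closing bracket rather than a terminator.

```cpp
#include <string_view>

// Shortened stand-in for the string the compiler stores in the executable.
constexpr std::string_view signature = "E V = Colour::Green]";

// A view into the middle of that string: 5 characters starting at offset 14.
constexpr std::string_view green = signature.substr(14, 5);

static_assert(green == "Green");                   // the view itself is correct
static_assert(green.data()[green.size()] == ']');  // but no '\0' follows it
```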

We can introduce a helper class to store a compile-time zero-terminated string. magic_enum calls this static_string, which I’ll also do. The idea is to provide N + 1 bytes of storage, which are initially zero. In the constructor, we’ll copy the bytes of the string_view:


template<std::size_t N>
struct static_string
{
    constexpr static_string(std::string_view sv) noexcept
    {
        // std::copy() is not constexpr in C++17, hence...
        for(std::size_t n = 0; n < N; ++n)
            content[n] = sv[n];
    }
    constexpr operator std::string_view() const noexcept
    { return { content.data(), N }; }
    
private:
    std::array<char, N + 1> content{};
};
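
A quick way to convince yourself that this does what we want is a handful of compile-time checks. The sketch below repeats the class from above so that it compiles on its own; the variables green and view are made up for the example.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

template<std::size_t N>
struct static_string
{
    constexpr static_string(std::string_view sv) noexcept
    {
        for (std::size_t n = 0; n < N; ++n)
            content[n] = sv[n];
    }
    constexpr operator std::string_view() const noexcept
    { return { content.data(), N }; }

private:
    std::array<char, N + 1> content{};  // value-initialized, so all zero
};

// Five characters of payload, six bytes of storage.
constexpr static_string<5> green{ "Green" };
constexpr std::string_view view = green;

static_assert(view == "Green");
static_assert(view.size() == 5);                  // the terminator is not part of the view
static_assert(view.data()[view.size()] == '\0');  // but it is there for puts()
```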

All that remains is to use static_string<> in enum_name(), as follows:

template<typename E, E V>
constexpr auto enum_name()
{
    constexpr auto name = n<E, V>();
    return static_string<name.size()>(name);
}

Which yields the desired assembly output:

main:
        sub     rsp, 8
        mov     edi, OFFSET FLAT:enum_name_v<Colour, (Colour)28>
        call    puts
        xor     eax, eax
        add     rsp, 8
        ret

enum_name_v<Colour, (Colour)28>:
        .byte   71       // G
        .byte   114      // r
        .byte   101      // e
        .byte   101      // e
        .byte   110      // n
        .zero   1

Closing words

I’m extremely grateful for Daniil Goncharov’s work. Initially, I would use the tried and proven C macro-style approach, which feels clumsy given the state C++ is in these days. Studying his approach has taught me some wonderful things, and I hope I’ve shared some of them by writing this post.
