Wednesday 2 March 2016

Ponder - C++ Reflection

Preamble

Some time ago I asked a Stack Overflow question: "How can I add reflection to a C++ application?" I posed this in a non-specific way, as there are many ways to do this depending on your application. As you can see, many varied answers have appeared over time. In the interim C++11 has also appeared, and reflection is still not part of the C++ language specification; it is questionable whether it will ever be built into the language, for multiple reasons.

This reflection project is more of an itch that needs scratching than a concrete problem. I've written serialisation code in the past and looked at other solutions, so this is an attempt to solve those problems and to provide a more general solution for introspection within an application.

The Problem

Another user posed the question "why does C++ not have reflection?" to which jalf writes this informative reply:
There are several problems with reflection in C++. 
  • It's a lot of work to add, and the C++ committee is fairly conservative, and don't spend time on radical new features unless they're sure it'll pay off. (A suggestion for adding a module system similar to .NET assemblies has been made, and while I think there's general consensus that it'd be nice to have, it's not their top priority at the moment, and has been pushed back until well after C++0x. The motivation for this feature is to get rid of the #include system, but it would also enable at least some metadata). 
  • You don't pay for what you don't use. That's one of the most basic design philosophies underlying C++. Why should my code carry around metadata if I may never need it? Moreover, the addition of metadata may inhibit the compiler from optimizing. Why should I pay that cost in my code if I may never need that metadata? 
  • Which leads us to another big point: C++ makes very few guarantees about the compiled code. The compiler is allowed to do pretty much anything it likes, as long as the resulting functionality is what is expected. For example, your classes aren't required to actually be there. The compiler can optimize them away, inline everything they do, and it frequently does just that, because even simple template code tends to create quite a few template instantiations. The C++ standard library relies on this aggressive optimization. Functors are only performant if the overhead of instantiating and destructing the object can be optimized away. operator[] on a vector is only comparable to raw array indexing in performance because the entire operator can be inlined and thus removed entirely from the compiled code. C# and Java make a lot of guarantees about the output of the compiler. If I define a class in C#, then that class will exist in the resulting assembly. Even if I never use it. Even if all calls to its member functions could be inlined. The class has to be there, so that reflection can find it. Part of this is alleviated by C# compiling to bytecode, which means that the JIT compiler can remove class definitions and inline functions if it likes, even if the initial C# compiler can't. In C++, you only have one compiler, and it has to output efficient code. If you were allowed to inspect the metadata of a C++ executable, you'd expect to see every class it defined, which means that the compiler would have to preserve all the defined classes, even if they're not necessary. 
  • And then there are templates. Templates in C++ are nothing like generics in other languages. Every template instantiation creates a new type. std::vector<int> is a completely separate class from std::vector<float>. That adds up to a lot of different types in an entire program. What should our reflection see? The template std::vector? But how can it, since that's a source-code construct, which has no meaning at runtime? It'd have to see the separate classes std::vector<int> and std::vector<float>. And std::vector<int>::iterator and std::vector<float>::iterator, same for const_iterator and so on. And once you step into template metaprogramming, you quickly end up instantiating hundreds of templates, all of which get inlined and removed again by the compiler. They have no meaning, except as part of a compile-time metaprogram. Should all these hundreds of classes be visible to reflection? They'd have to, because otherwise our reflection would be useless, if it doesn't even guarantee that the classes I defined will actually be there. And a side problem is that the template class doesn't exist until it is instantiated. Imagine a program which uses std::vector<int>. Should our reflection system be able to see std::vector<int>::iterator? On one hand, you'd certainly expect so. It's an important class, and it's defined in terms of std::vector<int>, which does exist in the metadata. On the other hand, if the program never actually uses this iterator class template, its type will never have been instantiated, and so the compiler won't have generated the class in the first place. And it's too late to create it at runtime, since it requires access to the source code. 
  • And finally, reflection isn't quite as vital in C++ as it is in C#. The reason is again, template metaprogramming. It can't solve everything, but for many cases where you'd otherwise resort to reflection, it's possible to write a metaprogram which does the same thing at compile-time. boost::type_traits is a simple example. You want to know about type T? Check its type_traits. In C#, you'd have to fish around after its type using reflection. Reflection would still be useful for some things (the main use I can see, which metaprogramming can't easily replace, is for autogenerated serialization code), but it would carry some significant costs for C++, and it's just not necessary as often as it is in other languages.
This is an excellent summary of the problems with adding reflection to C++. In essence, C++ code may be transformed significantly between the source and the compiled product. Also, we may not want to reflect exactly what is in the source.
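Two of the points in the quoted answer are easy to check for yourself. The following is a minimal sketch in plain C++11 (the helper names is_numeric and check_runtime_types are my own, purely for illustration): it shows that two instantiations of the same template are unrelated types, and that a type trait answers at compile time a question that would need a reflection lookup in C#.

```cpp
#include <cassert>
#include <type_traits>
#include <typeinfo>
#include <vector>

// Each instantiation of a class template is its own, unrelated type.
static_assert(!std::is_same<std::vector<int>, std::vector<float>>::value,
              "vector<int> and vector<float> are distinct types");

// A compile-time query: the compiler knows the answer, so nothing about
// the type needs to be stored in the executable.
// (is_numeric is a hypothetical helper, not part of any library.)
template <typename T>
constexpr bool is_numeric() {
    return std::is_arithmetic<T>::value;
}

static_assert(is_numeric<int>(), "int is arithmetic");
static_assert(!is_numeric<std::vector<int>>(), "a vector is not arithmetic");

// The built-in RTTI also treats the two instantiations as different types.
inline void check_runtime_types() {
    assert(typeid(std::vector<int>) != typeid(std::vector<float>));
}
```

Note that typeid only identifies a type; it says nothing about its members, which is the gap a reflection library tries to fill.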

Current solutions

Some of the solutions I considered were:

Macros

C-style macro solutions feature in the above Stack Overflow question. The solution offered there uses Boost. I wanted to avoid macros because anything non-trivial done with them (e.g. list iteration) quickly becomes complicated. Also, with the new features in C++11, like variadic templates, it is possible to do something more elegant in C++ itself. This can also make debugging easier, as finding errors in the middle of a nested macro that expands to complicated C++ can get hairy.
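To illustrate the kind of list iteration I mean, here is a minimal sketch of walking a list of members with a C++11 variadic template instead of a macro. The Point type and the write_members and describe functions are hypothetical names invented for this example; they are not taken from any of the libraries discussed here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct Point { int x; int y; };  // hypothetical example type

// Base case: no members left to visit.
template <typename Obj>
void write_members(std::ostream&, const Obj&) {}

// Recursive case: write one (name, member pointer) pair, then recurse
// on the remaining pairs. C++11 has no fold expressions, hence recursion.
template <typename Obj, typename Member, typename... Rest>
void write_members(std::ostream& out, const Obj& obj,
                   const char* name, Member Obj::*ptr, Rest... rest) {
    out << name << '=' << obj.*ptr << ';';
    write_members(out, obj, rest...);
}

// The member list is ordinary C++, so a mistake is reported against
// real code rather than the inside of a macro expansion.
inline std::string describe(const Point& p) {
    std::ostringstream out;
    write_members(out, p, "x", &Point::x, "y", &Point::y);
    return out.str();
}

inline void example_describe() {
    assert(describe(Point{1, 2}) == "x=1;y=2;");
}
```

The member list still has to be written out by hand once per type; what changes is that the iteration over it is type-checked C++ that a debugger can step through.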

Qt

This is an excellent framework, and I generally enjoy using it for GUI work. It has a form of markup for performing reflection. However, it is not a general solution as you are tied to Qt, and its licensing model (GPL/LGPL).

Reflex

This is an interesting way of doing reflection: you get a compiler to generate the metadata for you, in this case GCC-XML. Not a bad solution, and the generator does the leg-work for you. You have to be careful to keep the generated metadata up to date with the program. For sizeable applications, the metadata can get large and time-consuming to generate and parse. Licensing is LGPL.

Reflect

This uses macros (see above). Also overloads "RTTI", the compiler version of runtime introspection. MIT licence.

Classdesc

A mature solution, but one that consequently comes with some cruft. Several large applications depend on it, so it is unlikely to change significantly. MIT licence.

CAMP

This is a nicely engineered library, aiming to be general purpose, with a solid CMake build system. It relies on Boost for type trait information and other utilities. Initially it was LGPL, but the licence was later relaxed to MIT. The project has now been retired by its authors.

Requirements

My requirements were:
  • Use C++11, due to better template and type support.
  • Avoid Boost. Great library, but leads to bloated compile times.
  • Avoid macros.
  • Liberal licence. LGPL is impractical when you require non-shared libraries, which may be common when using tightly coupled information like reflection.

Ponder

I decided on CAMP as it fit my requirements best. I forked it on GitHub and subsequently renamed it, as CAMP has been retired. The new name is Ponder, i.e.
ponder, synonyms: reflect on
The Boost dependency has been removed from the reflection library, although Boost unit testing is still used. I tried to simplify the library as much as possible, using variadic templates to remove longhand template argument lists, using C++11 type traits, etc. I also added a Jekyll website on GitHub Pages to support a project blog, documentation, and discussion (via Disqus).
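As a rough illustration of what removing longhand template argument lists means, here is a minimal sketch. It is not Ponder's actual code; FunctionTraits, Widget and AreaTraits are hypothetical names. Before C++11 a trait like this needed one hand-written specialisation per argument count (or a preprocessor to generate them); with a parameter pack, one specialisation covers every arity.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Primary template, left undefined: only function types are supported.
template <typename F>
struct FunctionTraits;

// One variadic specialisation handles 0, 1, 2, ... arguments.
template <typename R, typename... Args>
struct FunctionTraits<R(Args...)> {
    typedef R ReturnType;
    static constexpr std::size_t arity = sizeof...(Args);
};

// Const member functions: same information, plus the owning class.
template <typename C, typename R, typename... Args>
struct FunctionTraits<R (C::*)(Args...) const> : FunctionTraits<R(Args...)> {
    typedef C ClassType;
};

struct Widget {  // hypothetical example type
    int area(int w, int h) const { return w * h; }
};

typedef FunctionTraits<decltype(&Widget::area)> AreaTraits;

static_assert(AreaTraits::arity == 2, "area takes two arguments");
static_assert(std::is_same<AreaTraits::ReturnType, int>::value,
              "area returns int");
static_assert(std::is_same<AreaTraits::ClassType, Widget>::value,
              "area belongs to Widget");

inline void example_traits() {
    assert(AreaTraits::arity == 2);
    assert(FunctionTraits<void()>::arity == 0);
}
```

A reflection library can use information of this kind to check the arguments of a call made by name, without the user spelling out the signature a second time.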

The plan next would be to use the API to support its original aims. It can be...
used to expose and edit objects' attributes into a graphical user interface. It can also be used to do automatic binding of C++ classes to script languages such as Python or Lua. Another possible application would be the serialization of objects to XML, text or binary formats. Or you can even combine all these examples to provide a powerful and consistent interface for manipulating your objects outside C++ code

[Jun-2016] Boost unit testing no longer used. Catch used instead.
