Template instantiations are still killing me (part 2)


May 4, 2017

I emailed the Boost users group about my template woes. Specifically, I asked where to put the extern template declarations I want. The answer is clear: put them in my own header file, and nowhere else. In particular, I should not add them to any Boost.Multiprecision file; that would be wrong.

So, I added the line extern template class boost::multiprecision::number<boost::multiprecision::mpfr_float_backend<0>, boost::multiprecision::et_on>; to my mpfr_extensions.hpp file, and added template class boost::multiprecision::number<boost::multiprecision::mpfr_float_backend<0>, boost::multiprecision::et_on>; to the corresponding .cpp file. However, compilation fails, because explicit instantiation instantiates the entire class, not just the parts you actually use, as happens with regular (implicit) template usage. That includes some nonsense functions that are only supposed to be instantiated for integral backends, and which are guarded by static asserts. So, either I remove those static asserts so the functions can be instantiated for this floating-point type, or I give up. No, I'm not giving up.
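To make the failure mode concrete, here is a minimal sketch of the same situation. The Number class, its mod member, and the helper functions are hypothetical stand-ins, not Boost.Multiprecision code; the only claim is that explicit instantiation instantiates every non-template member, while implicit instantiation generates only what is used.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a number-like class template; not Boost code.
template <typename Backend>
struct Number {
    Backend value;
    explicit Number(Backend v) : value(v) {}

    // Valid for every backend.
    Number& add(const Number& other) {
        value += other.value;
        return *this;
    }

    // Only meaningful for integral backends, guarded by a static assert.
    Number& mod(const Number& other) {
        static_assert(std::is_integral<Backend>::value,
                      "mod requires an integral backend");
        value %= other.value;
        return *this;
    }
};

// Header side: ask including translation units not to instantiate implicitly.
extern template struct Number<long>;
// Source side (exactly one .cpp): instantiate every non-template member.
template struct Number<long>;

// Uncommenting the next line fails to compile, because explicit
// instantiation also instantiates mod(), which trips the static assert:
// template struct Number<double>;

// Implicit instantiation only generates the members actually used,
// so Number<double> is fine as long as mod() is never called.
double implicit_use() {
    Number<double> x(1.5);
    return x.add(Number<double>(2.0)).value;
}

// Uses the explicitly instantiated integral specialization.
long explicit_use() {
    Number<long> a(7);
    assert(a.value == 7);
    return a.mod(Number<long>(3)).value;
}
```

In this sketch, the integral specialization can be explicitly instantiated without touching anything, while the floating-point one can only be used implicitly unless the guard is removed or the guarded function is restructured.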

I commented out the offending static asserts, and the code compiled. Yay! Here are comparative compile times using the Clang that comes with Xcode 8:

  • without extern: 1m12s for the core, 54 seconds for the endgames. 7 threads.
  • with extern: 1m9s for the core, 56 seconds for the endgames.

Underwhelming performance improvement. What happened?

Let's try to find out why, using Templight.

Compilation during my first round of testing with Templight took forever. Dang, that was slow. This is because the default build type when building Clang is Debug, not Release. So, I rebuilt Clang in release mode (cmake -DCMAKE_BUILD_TYPE=RELEASE ../llvm/), re-installed, and re-compiled my code. It made a huge difference in compile time.

The first thing I note is that the templates taking the most time to compile moved around from Debug to Release, not surprisingly. Here's a screencap of the debug on the left, release on the right:

such numbers
wow

So, the top number instantiations are taking ~12% of the time for this one source file in debug, and 15% in release.

I think I just realized that the number in the Called column is the number of times the template instantiation is referred to, not the number of times it is instantiated. Oops, I'm gonna have to write to Boost about that. My bad, y'all.

So let's focus on the above tables. The export call for System is taking over 1/3 of the time for this source file. Perhaps I can get rid of that?

It looks like the thing to do here is to call register_type(), rather than use BOOST_CLASS_EXPORT, so that only types which are actually serialized are registered. I think this moves the cost from compile time to runtime, when the registration happens... Hmm. Is this worth it?

amp_cauchy_debug

Now let's compile with the explicit instantiation (with static asserts commented out), and compare. This is a release-to-release comparison, with optimization level 0 in Clang for both.

no_extern

not using extern

yep_extern

yep, using extern

Hmm, not a big difference. I believe the timing difference can be chalked up to the OS. So, no conclusions yet. I do believe that the explicit instantiation of the number class with backend mpfr should have more or less eliminated that category from the chart, but it didn't... So something I did was wrong, or I just don't understand what I am doing here yet.
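One possible contributor, offered as an assumption rather than a diagnosis of the Templight numbers above: an explicit instantiation of a class covers its ordinary members, but not its member templates, which are still instantiated implicitly wherever they are used. The Wrapper class and its members below are hypothetical and only illustrate that language rule.

```cpp
#include <cassert>

// Hypothetical class with one ordinary member and one member template.
template <typename T>
struct Wrapper {
    T value;
    explicit Wrapper(T v) : value(v) {}

    // Ordinary member: covered by the explicit instantiation below.
    T twice() const { return value + value; }

    // Member template: NOT covered. Each distinct U used in a translation
    // unit is still instantiated implicitly in that translation unit.
    template <typename U>
    T plus(U u) const { return value + static_cast<T>(u); }
};

// Header side.
extern template struct Wrapper<double>;
// Source side (exactly one .cpp).
template struct Wrapper<double>;

double use_wrapper() {
    Wrapper<double> w(1.5);
    assert(w.value == 1.5);
    // twice() comes from the explicit instantiation; plus<int> is
    // instantiated right here, despite the extern declaration.
    return w.twice() + w.plus(2);
}
```

A class whose interface is mostly member templates (templated constructors, templated operators, expression templates) would therefore leave much of the work in each translation unit even with extern. Whether that is what is happening with number here is not established.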

Looking at the data from having compiled with these dubiously externed numbers, I feel like externing some Eigen types would be really helpful, too. I mean, the number of times that the number is being referred to is huge. And in so many translation units!

So, I tried externing Eigen::Matrix<bertini::mpfr_float, Eigen::Dynamic, Eigen::Dynamic> and similarly for double, etc. No dice. Here are the error messages I got:

There are two of them, and two per type, for a total of 8. Bummer.

Commenting out these static asserts again allows compilation. Let's see if it made a difference:

so much brown

That'd be a no. It improved nothing. Same number of calls to the same templates. What am I doing wrong?

Conclusions

I still don't fully understand either the correct way to use extern, or how to read the data from Templight. I'll keep working at it.