While developing a SWIG-wrapped C++ library for Ruby, we came across an unexplained crash during exception handling inside the C++ code.
I'm not sure of the specific circumstances needed to recreate the issue, but it first happened during a call to std::uncaught_exception, then, after some code changes, moved to __cxa_allocate_exception during exception construction. Neither GDB nor valgrind provided any insight into the cause of the crash.
I've found several references to similar problems, including:
- http://wiki.fifengine.de/Segfault_in_cxa_allocate_exception
- http://forums.fifengine.de/index.php?topic=30.0
- http://code.google.com/p/osgswig/issues/detail?id=17
- https://bugs.launchpad.net/ubuntu/+source/libavg/+bug/241808
The overriding theme seems to be a combination of circumstances:
- A C application is linked to more than one C++ library
- More than one version of libstdc++ was used during compilation
- Generally the second version of C++ used comes from a binary-only implementation of libGL
- The problem does not occur when linking your library with a C++ application, only with a C application
The "solution" is to explicitly link your library with libstdc++ and possibly also with libGL, forcing the order of linking.
After trying many combinations with my code, the only solution I found that works is the LD_PRELOAD="libGL.so libstdc++.so.6" ruby scriptname option. That is, none of the compile-time linking solutions made any difference.
My understanding of the issue is that the C++ runtime is not being properly initialized. By forcing the order of linking you bootstrap the initialization process and it works. The problem occurs only with C applications calling C++ libraries because the C application is not itself linking to libstdc++ and is not initializing the C++ runtime. SWIG often comes up when researching the problem because using SWIG (or boost::python) is a common way of calling a C++ library from a C application.
Is anyone out there able to give more insight into this problem? Is there an actual solution or do only workarounds exist?
Thanks.
Following Michael Dorgan's suggestion, I'm copying my comment into an answer:
Found the real cause of the problem. Hopefully this will help someone else encountering this bug. You probably have some static data somewhere that is not being properly initialized. We did, and the solution was in boost-log for our code base. https://sourceforge.net/projects/boost-log/forums/forum/710022/topic/3706109. The real problem is the delay loaded library (plus statics), not the potentially multiple versions of C++ from different libraries. For more info: http://parashift.com/c++-faq-lite/ctors.html#faq-10.13
Since encountering this problem and its solution, I've learned that it's important to understand how statics are shared or not shared between your statically and dynamically linked libraries. On Windows this requires explicitly exporting the symbols for the shared statics (including things like singletons meant to be accessed across different libraries). The behavior is subtly different between each of the major platforms.
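For anyone unsure what a static initialization problem looks like in practice, here is a minimal sketch of the construct-on-first-use idiom, which is one common way to stop one static object from depending on another that may not have been constructed yet. The names Registry, registry() and Plugin are made up for illustration; they are not taken from boost-log or from our code base, and this is not necessarily the exact fix that applies to your case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shared object that other statics use while they are constructed.
class Registry {
public:
    void add(const std::string& name) { names_.push_back(name); }
    std::size_t size() const { return names_.size(); }
private:
    std::vector<std::string> names_;
};

// Construct on first use: the function-local static is initialized the
// first time registry() is called, so callers never see an unconstructed
// object, whatever order the namespace-scope statics are initialized in.
Registry& registry() {
    static Registry instance;
    return instance;
}

// Hypothetical static objects, which could live in other translation units
// or libraries, that touch the registry from their constructors.
struct Plugin {
    explicit Plugin(const std::string& name) { registry().add(name); }
};

static Plugin plugin_a("a");
static Plugin plugin_b("b");
```

Because registry() builds its Registry on the first call, both Plugin objects can register themselves during static initialization without relying on initialization order. The sketch does not deal with destruction order at exit, and it assumes every library ends up calling the same registry() instance, which is where the symbol-sharing point above comes in.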
I recently ran into this problem as well. My process creates a shared object module that is used as a python C++ extension. A recent OS upgrade from RHEL 6.4 to 6.5 exposed the problem.
Following the tips here, I merely added -lstdc++ to my link switches and that solved the problem.
Having the same problem using SWIG for Python with a cpp library (Clipper), I found that using LD_PRELOAD as you suggested works for me too. As another workaround which doesn't require LD_PRELOAD, I found that I can also link the libstdc++ into the .so library file of my module, e.g.
ld -shared /usr/lib/i386-linux-gnu/libstdc++.so.6 module.o module_wrap.o -o _module.so
I can then import it in python without any further options.
I realise that @lefticus accepted the answer relating to what I guess amounts to undefined static init order; however, I had a very similar problem, this time with boost::python.
I tried my damnedest to find any static initialisation issues and couldn't, to the point that I refactored a major chunk of our codebase, and when that didn't work I ended up removing exceptions altogether.
However, some more crept in and we started getting these segfaults again.
After some more investigation I came across this link which talks about custom allocators.
We do indeed use tcmalloc ourselves, and after I removed it from our library, which is exported to boost::python, we had no more issues!
So just an FYI to anyone who stumbles across this thread: if @lefticus's answer doesn't work, check whether you're using a different allocator from the one python uses.