For a few days we are dealing with very strange problem.
I can't understand how it even happens - when a third-party (MATLAB) program uses our shared library, it somehow overrides some of our symbols (boost, to be precise) with it's own. Those symbols are statically linked and (!!) local.
Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex).
So, here is the magic:
- We have no library dependencies, ldd:
linux-vdso.so.1 => (0x00007fff4abff000) libpthread.so.0 => /lib/libpthread.so.0 (0x00007f1a3fd65000) libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f1a3fa51000) libm.so.6 => /lib/libm.so.6 (0x00007f1a3f7cd000) libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f1a3f5bf000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f1a3f3a8000) libc.so.6 => /lib/libc.so.6 (0x00007f1a3f024000) /lib64/ld-linux-x86-64.so.2 (0x00007f1a414f9000) librt.so.1 => /lib/librt.so.1 (0x00007f1a3ee1c000)
- No Cxx symbols (our public symbols are POC C for binary compatibility) are exported from our library, nm:
nm -g --defined-only libmysharedlib.so addr1 T OurCSymbol1 addr2 T OurCSymbol2 addr3 T OurCSymbol3 ...
- Still, it uses their boost. HOW? Stacktrace (paths cut):
[ 0] 0x00007f21fddbb0a9 bin/libmwfl.so+00454825 fl::sysdep::linux::unwind_stack(void const**, unsigned long, unsigned long, fl::diag::thread_context co开发者_如何学JAVAnst&)+000009 [ 1] 0x00007f21fdd74111 bin/glnxa64/libmwfl.so+00164113 fl::diag::stacktrace_base::capture(fl::diag::thread_context const&, unsigned long)+000161 [ 2] 0x00007f21fdd7d42d bin/glnxa64/libmwfl.so+00201773 [ 3] 0x00007f21fdd7d6b4 bin/glnxa64/libmwfl.so+00202420 fl::diag::terminate_log(char const*, fl::diag::thread_context const&, bool)+000100 [ 4] 0x00007f21fce525a7 bin/glnxa64/libmwmcr.so+00365991 [ 5] 0x00007f21fb9eb8f0 lib/libpthread.so.0+00063728 [ 6] 0x00007f21f3e939a9 libboost_regex.so.1.40.0+00342441 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_all_states()+000073 [ 7] 0x00007f21f3eb6546 bin/glnxa64/libboost_regex.so.1.40.0+00484678 boost::re_detail::perl_matcher, std::allocator > >, boost::regex_traits > >::match_imp()+000758 [ 8] 0x00007f21c04ad595 lib/libmysharedlib.so+04855189 bool boost::regex_match, std::allocator > >, char, boost::regex_traits > >(__gnu_cxx::__normal_iterator, __gnu_cxx::__normal_iterator, boost::match_results, std::allocator > > >&, boost::basic_regex > > const&, boost::regex_constants::_match_flags)+000245 [ 9] 0x00007f21c04a71c7 lib/libmysharedlib.so+04829639 myfunc2()+000183 [ 10] 0x00007f21c01b41e3 lib/libmysharedlib.so+01737187 myfunc1()+000307
It's known, that MATLAB does dlopen with RTLD_NOW flag only.
People, think with me please. Now i'm desperate not to even fix this, but to simply understand ld&elf behavior.
edit: Small additional question: how i understood, without special linker options, symbols in linux .so libraries are never linked by address? So even statically linked local symbols are resolved in runtime?
Check out the -Bsymbolic
option for ld.
If -Bsymbolic
is specified, then at the time of creating a shared
object ld will attempt to bind references to global symbols to definitions
within the shared library. The default is to defer binding to runtime.
This may be clearer with an example.
Say example.o
contains a reference to a global function defined in
global.o
,
$ nm example.o | grep ' U'
U _GLOBAL_OFFSET_TABLE_
U globalfn
$ nm global.o | grep ' T'
00000000 T globalfn
and two shared objects, normal.so
and symbolic.so
, are built as
follows:
$ cc -fPIC -c example.c
$ cc -c global.c
$ rm -f archive.a; ar cr archive.a global.o
$ ld -shared -o normal.so example.o archive.a
$ ld -Bsymbolic -shared -o symbolic.so example.o archive.a
Disassembling the code for normal.so
shows that the call to
globalfn
is actually going through the procedure linkage table, and
thus the final destination of the call is determined at runtime.
$ objdump --disassemble normal.so
...snip...
00000194 <example>:
...snip...
1a6: e8 d9 ff ff ff call 184 <globalfn@plt>
...snip...
$ readelf -r normal.so
Relocation section '.rel.plt' at offset 0x16c contains 1 entries:
Offset Info Type Sym.Value Sym. Name
00001244 00000207 R_386_JUMP_SLOT 000001b8 globalfn
Whereas in symbolic.so
, the call always invokes the definition of
globalfn
within the shared object.
$ objdump --disassemble symbolic.so
...snip...
0000016c <shared>:
...snip...
17e: e8 0d 00 00 00 call 190 <globalfn>
...snip...
$ readelf -r symbolic.so
There are no relocations in this file.
Here is the deal - we use boost 1.47, MATLAB has boost 1.40. Currently, library call segfaults on a call from OUR library to their boost (regex).
You are invoking undefined behavior, which is a "Doctor, it hurts when I do this" kind of situation. The Matlab executable already contains external functions for class boost::re_detail::perl_matcher< elided >
. When Matlab loads your shared library the dynamic linker sees that your shared library defines those exact same symbols in a way that conflicts with the existing definitions. Undefined behavior.
The solution is to build a version of your library for use with Matlab that uses the same version of Boost as does Matlab.
精彩评论