Is it possible to "link" the STL to an assembly program, e.g. similar to linking the glibc to use functions like strlen, etc.? Specifically, I want to write a开发者_Python百科n assembly function which takes as an argument a std::vector and will be part of a lib. If this is possible, is there any documentation on this?
Any use of C++ templates will require the compiler to generate instantiations of those templates. So you don't really "link" something like the STL into a program; the compiler generates object code based upon your use of templates in the library.
However, if you can write some C++ code that forces the templates to be instantiated for whatever types and other arguments you need to use, then write some C-linkage functions to wrap the uses of those template instantiations, then you should be able to call those from your assembly code.
I strongly believe you're doing it wrong. Using assembler is not going to speed up your handling of the data. If you must use existing assembly code, simply pass raw buffers
std::vector
is by definition (in the standard) compatible with raw buffers (arrays); the standard mandates contiguous allocation. Only reallocation can invalidate the memory region that contains the element data. In short, if the C++ code can know the (max) capacity required and reserve()
/resize()
appropriately, you can pass &vector[0]
as the buffer address and be perfectly happy.
If the assembly code needs to decide how (much) to reallocate, let it use malloc. Once done, you should be able to use that array as STL container:
std::accumulate(buf, buf+n, 0, &dosomething);
Alternatively, you can use the fact that std::tr1::array<T, n>
or boost::array<T, n>
are POD, and use placement new right on the buffer allocated in the library (see here: placement new + array +alignment or How to make tr1::array allocate aligned memory?)
Side note
I have the suspicion that you are using assembly for the wrong reasons. Optimizing compilers will leverage the full potential of modern processors (including SIMD such as SSE1-4);
E.g. for gcc have a look at
__attibute__
(e.g. for pointer restrictions such as alignment and aliasing guarantees: this will enable the more powerful vectorization options for the compiler);-ftree_vectorize
and-ftree_vectorizer_verbose=2
,-march=native
Note also that since the compiler can't be sure what registers an external (or even inline) assembly procedure clobbers, it must assume all registers are clobbered leading to potential performance degradation. See http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html for ways to use inline assembly with proper hints to gcc.
probably completely off-topic: -fopenmp
and gnu::parallel
Bonus: the following references on (premature) optimization in assembly and c++ might come in handy:
- Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms
- Optimizing subroutines in assembly language: An optimization guide for x86 platforms
- The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers
And some other relevant resources
精彩评论