C++: High speed stack

As far as I understand, std::stack and all such 'handmade' stacks are much slower than the application's own stack.

Maybe there's a good low-level 'bicycle' already (a stack implementation)?

Or is it a good idea to create a new thread and use its own stack?

And how can I work directly with the application stack? (asm {} only?)

std::stack is a collection of C++ objects that has stack semantics. It has nothing to do with a thread's stack or the push and pop instructions in assembler code.

Which one are you trying to use?

The 'assembler' stack is usually maintained by the hardware and required by various calling conventions, so you have no choice about how to 'allocate' it or 'manage' it. Some architectures have highly configurable stacks, but you don't say which architecture you are on.

If you want a collection with stack semantics and you are writing in C++, then std::stack is your choice unless you can prove that it's not fast enough.

The only way in which std::stack is significantly slower than the processor stack is that it has to allocate memory from the free store. By default, it uses std::deque for storage, which allocates memory in chunks as needed. As long as you don't keep destroying and recreating the stack, it will keep that memory and not need to allocate more unless it grows bigger than before. So structure code like this:

std::stack<int> stack;
for (int i = 0; i < HUGE_NUMBER; ++i) 
    do_lots_of_work(stack); // uses stack

rather than:

for (int i = 0; i < HUGE_NUMBER; ++i)
    do_lots_of_work(); // creates its own stack
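One way to see the difference between the two structures above is to count allocations. The following is only a sketch under stated assumptions: CountingAllocator, g_allocations and the two allocations_when_* helpers are hypothetical names, and the stack is backed by std::vector rather than the default std::deque, because a vector does not shrink its capacity when elements are popped, while how much memory a deque retains after popping depends on the library implementation.

```cpp
#include <cstddef>
#include <memory>
#include <stack>
#include <vector>

// Hypothetical global counter, incremented on every container allocation.
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts the calls.
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Vector-backed stack (an assumption of this sketch, not the std::stack default).
using IntStack = std::stack<int, std::vector<int, CountingAllocator<int>>>;

// Stand-in for the work in the loops above: fills the stack, then empties it.
void do_lots_of_work(IntStack& stack) {
    for (int i = 0; i < 1000; ++i) stack.push(i);
    while (!stack.empty()) stack.pop();
}

// First structure: one stack, reused by every iteration.
std::size_t allocations_when_reused(int rounds) {
    g_allocations = 0;
    IntStack stack;
    for (int r = 0; r < rounds; ++r) do_lots_of_work(stack);
    return g_allocations;
}

// Second structure: a fresh stack is created in every iteration.
std::size_t allocations_when_recreated(int rounds) {
    g_allocations = 0;
    for (int r = 0; r < rounds; ++r) {
        IntStack stack;
        do_lots_of_work(stack);
    }
    return g_allocations;
}
```

With the reused stack the count stops growing after the first round; with a fresh stack per iteration it grows linearly with the number of rounds. Whether that matters for a real program is still a question for the profiler.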

If, after profiling, you find that it's still spending too long allocating memory, then you could preallocate a large block so you only need a single allocation when your program starts up (assuming you can find an upper limit for the stack size). You need to get into the innards of the stack to do this, but it is possible by deriving your own stack type. Something like this (not tested):

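#include <cstddef>
#include <stack>
#include <vector>

class PreallocatedStack : public std::stack< int, std::vector<int> >
{
public:
    // c is the protected underlying container that std::stack exposes to derived classes
    explicit PreallocatedStack(std::size_t size) { c.reserve(size); }
};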

EDIT: this is quite a gruesome hack, but it is supported by the C++ Standard (the underlying container is the protected member c). More tasteful would be to initialise a stack with a reserved vector, at the cost of an extra allocation. And don't try to use this class polymorphically: STL containers have no virtual destructors and aren't designed for that.
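The "reserved vector" alternative might look like the sketch below. make_reserved_stack is a hypothetical helper name, and the sketch assumes C++11 or later: there the vector can be moved into the stack, which in practice hands over the reserved buffer instead of copying it, so the extra allocation mentioned above applies mainly to pre-C++11 code.

```cpp
#include <cstddef>
#include <stack>
#include <utility>
#include <vector>

using VectorStack = std::stack<int, std::vector<int>>;

// Hypothetical helper: builds an empty stack whose storage is reserved up front.
VectorStack make_reserved_stack(std::size_t capacity) {
    std::vector<int> storage;
    storage.reserve(capacity);              // the single up-front allocation
    return VectorStack(std::move(storage)); // the stack adopts the vector (C++11)
}
```

No derivation from std::stack is needed here, so the polymorphism caveat does not arise.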

Using the processor stack won't be portable, and on some platforms might make it impossible to use local variables after pushing something - you might end up having to code everything in assembly. (That is an option, if you really need to count every last cycle and don't need portability, but make sure you use a profiler to check that it really is worthwhile). There's no way to use another thread's stack that will be faster than a stack container.

MInner, are you sure that stack operations are (or can be) the bottleneck of your application? If not, and I would bet on that, just use std::stack and forget about it.

The basic idea that a "handmade" stack is necessarily slower than the one used for function calls is fundamentally flawed. The two work sufficiently similarly that they will typically be close to the same speed. The biggest point that favors the hardware stack is that it's used often enough that the data at or close to the top of that stack will almost always be in the cache. Another stack that you create usually won't be used as often, so there's a much better chance that any given reference will end up going to main memory instead of the cache.

In the other direction, you have a bit more flexibility in allocating memory for your stack. You can create a specialized allocator just for your stack. When the hardware stack needs to grow, the extra memory normally comes from the kernel allocator. The kernel allocator is usually tuned quite carefully, so it's usually pretty efficient, but it's also extremely general purpose. It can't be written just to do stack allocation really well; it has to be written to do any kind of allocation at least reasonably well. In the process, its ability to do one thing exceptionally well often suffers a bit.
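As one possible illustration of a stack with its own dedicated allocator, the sketch below uses the C++17 polymorphic memory resources from the standard library. It is not what the answer above had in mind specifically: ArenaIntStack, the 4096-byte buffer and the int element type are assumptions made for the example.

```cpp
#include <cstddef>
#include <memory_resource>
#include <stack>
#include <vector>

// Hypothetical class: a stack of ints whose storage comes from a private buffer.
class ArenaIntStack {
public:
    // capacity * sizeof(int) must fit into the buffer, otherwise the
    // null upstream resource makes the reservation throw std::bad_alloc.
    explicit ArenaIntStack(std::size_t capacity) : stack_(make_storage(capacity)) {}

    void push(int value) { stack_.push(value); }
    void pop() { stack_.pop(); }
    int top() const { return stack_.top(); }
    std::size_t size() const { return stack_.size(); }

private:
    // Builds a vector bound to the arena and reserves its storage once.
    std::pmr::vector<int> make_storage(std::size_t capacity) {
        std::pmr::vector<int> storage(&arena_);
        storage.reserve(capacity);
        return storage;
    }

    // Declaration order matters: the buffer and arena exist before the stack.
    alignas(std::max_align_t) std::byte buffer_[4096];
    std::pmr::monotonic_buffer_resource arena_{
        buffer_, sizeof(buffer_), std::pmr::null_memory_resource()};
    std::stack<int, std::pmr::vector<int>> stack_;
};
```

The null upstream resource is a deliberate choice for the sketch: it turns any attempt to grow past the buffer into an exception rather than a silent fallback to the general-purpose heap. The class is neither copyable nor movable, because the memory resource is not.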

It's certainly possible to create a stack that's arbitrarily slow, but there's no fundamental reason that your stack can't be just as fast as (or possibly even faster than) the one provided by the (usual) hardware. I'll repeat though: the single biggest reason for it to be slower is cache behavior, which simply reflects usage.
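To make the comparison concrete, here is a minimal sketch of a handmade stack: a fixed-size array plus an index, so push and pop are one store or load and one index update, much like the hardware stack. FixedStack and its interface are hypothetical, and the sketch assumes T is default-constructible and copy-assignable.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity stack: no dynamic allocation at all.
template <class T, std::size_t Capacity>
class FixedStack {
public:
    // Returns false instead of overflowing when the stack is full.
    bool push(const T& value) {
        if (size_ == Capacity) return false;
        data_[size_++] = value;
        return true;
    }

    // Copies the top element into out and removes it; false if empty.
    bool pop(T& out) {
        if (size_ == 0) return false;
        out = data_[--size_];
        return true;
    }

    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

private:
    T data_[Capacity];      // contiguous storage, so the top stays cache-friendly
    std::size_t size_ = 0;  // index of the next free slot
};
```

Whether this beats std::stack in a given program is something only a profiler can answer; the point is only that nothing in the design is inherently slower than the call stack.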

It depends on your requirements. If you want to push a user-defined data type onto the stack, you will need a 'handmade' stack.

For other cases, say you want to push integers, chars, or pointers to objects, you can use asm { push pop }, but don't mess it up.