What is the overhead in the std::string structure that causes sizeof(std::string) to be 32?
Most modern std::string implementations¹ store very small strings directly inside the string object, in a fixed-size char array, instead of using dynamic heap storage. This is known as the Small (or Short) String Optimisation (SSO). It lets implementations avoid heap allocations for small strings and improves locality of reference.
In addition, there is typically a std::size_t member holding the string's size and a pointer to the actual char storage.
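One way to observe SSO directly is to check whether a string's data() pointer falls inside the std::string object itself. This is a minimal sketch that assumes an implementation using SSO (the "big three" do); the helper name is_stored_inline is hypothetical, and where the cut-off lies is implementation-specific.

```cpp
#include <functional>
#include <memory>
#include <string>

// Returns true if the characters live inside the std::string object itself
// (the small buffer is in use) rather than in separate heap storage.
// std::less gives a total order even for pointers into unrelated objects.
bool is_stored_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(std::addressof(s));
    const char* object_end = object_begin + sizeof(std::string);
    const char* chars = s.data();
    std::less<const char*> precedes;
    return !precedes(chars, object_begin) && precedes(chars, object_end);
}
```

For a short string such as "hi" this typically returns true, and for a 100-character string it returns false.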
The exact layout differs between implementations, but something along the following lines works:
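#include <cstddef>

template <typename T>
struct basic_string {
    T* begin_;             // points into sso_buffer or at heap storage
    std::size_t size_;     // current length
    union {
        std::size_t capacity_;  // used when the data lives on the heap
        T sso_buffer[16];       // used for short strings
    };
};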
On typical 64-bit architectures, where sizeof(void*) == 8, this gives a total size of 8 + 8 + 16 = 32 bytes.
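The 32-byte total can be checked at compile time. This minimal sketch uses a char-only version of the layout above under the hypothetical name sketch_string, and assumes a platform where char* and std::size_t have the same size and alignment, as on common 32- and 64-bit targets.

```cpp
#include <cstddef>

// Hypothetical struct mirroring the layout above; not any real library's.
struct sketch_string {
    char* begin_;        // points either into sso_buffer or at heap storage
    std::size_t size_;   // current length
    union {
        std::size_t capacity_;  // meaningful only when the data is on the heap
        char sso_buffer[16];    // short strings: up to 15 chars plus '\0'
    };
};

// pointer + size + 16-byte union: 8 + 8 + 16 = 32 on a typical 64-bit target.
static_assert(sizeof(sketch_string) == sizeof(char*) + sizeof(std::size_t) + 16,
              "expected no padding in this layout");

constexpr std::size_t sketch_size = sizeof(sketch_string);
```

On a 32-bit target the same formula gives 4 + 4 + 16 = 24 bytes.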
1 The “big three” (GCC’s libstdc++ since version 5, Clang’s libc++ and MSVC’s implementation) all do it. Others may too.
std::string typically contains a buffer for the "small string optimization": if the string is shorter than the buffer, no heap allocation is required.
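The buffer size can be probed at run time. This sketch assumes that a default-constructed string reports its small-buffer capacity through capacity(). That holds for the common implementations (typically 15 characters for libstdc++ and MSVC, 22 for 64-bit libc++) but is not guaranteed by the standard. Both helper names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>

// True if s.data() points somewhere inside the std::string object itself.
bool chars_inside_object(const std::string& s) {
    const char* first = reinterpret_cast<const char*>(std::addressof(s));
    const char* last = first + sizeof(std::string);
    std::less<const char*> precedes;  // total order for unrelated pointers
    return !precedes(s.data(), first) && precedes(s.data(), last);
}

// Assumption: a default-constructed string's capacity() equals the size of
// its small buffer, excluding the '\0'. The standard does not promise this.
std::size_t small_buffer_capacity() {
    return std::string().capacity();
}
```

A string of exactly small_buffer_capacity() characters should then stay inside the object, and one character more should move it to the heap.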
My guess is:
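class string
{
    char type;              // tag: which member of the union is active
    struct Heap             // long strings: data lives on the heap
    {
        char* start;
        char* end;
        char* allocatedEnd;
    };
    struct Stack            // short strings: data lives inside the object
    {
        char size;
        char data[27];
    };
    union
    {
        Stack stackVersion;
        Heap heapVersion;
    } version;
    // On a typical 64-bit target this layout is 40 bytes (1-byte tag, 7 bytes
    // of padding, 32-byte union); a 23-char data buffer would bring it to 32.
};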
But I bet there are hundreds of ways of doing it.
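To make the tagged-union idea concrete, here is a minimal, hypothetical tagged_string that chooses between the two representations at construction time. It departs from the guess in two deliberate ways: the tag is a bool, and the inline buffer holds 23 characters so that the object comes to 32 bytes on a 64-bit target. Copying, assignment and growth are left out to keep it short.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical class illustrating the tagged-union layout guessed above.
class tagged_string {
public:
    explicit tagged_string(const char* s) {
        const std::size_t n = std::strlen(s);
        if (n < sizeof(small_.data)) {          // fits together with its '\0'
            is_small_ = true;
            small_.size = static_cast<unsigned char>(n);
            std::memcpy(small_.data, s, n + 1);
        } else {                                // too long: use the heap
            is_small_ = false;
            char* p = new char[n + 1];
            std::memcpy(p, s, n + 1);
            heap_ = Heap{p, p + n, p + n + 1};
        }
    }
    ~tagged_string() {
        if (!is_small_) delete[] heap_.start;
    }
    tagged_string(const tagged_string&) = delete;
    tagged_string& operator=(const tagged_string&) = delete;

    bool is_small() const { return is_small_; }
    std::size_t size() const {
        return is_small_ ? small_.size
                         : static_cast<std::size_t>(heap_.end - heap_.start);
    }
    const char* c_str() const { return is_small_ ? small_.data : heap_.start; }

private:
    struct Heap {                // long strings: three pointers, like a vector
        char* start;
        char* end;
        char* allocated_end;
    };
    struct Small {               // short strings: length byte + inline chars
        unsigned char size;
        char data[23];
    };
    bool is_small_;              // the tag ("type" in the guess above)
    union {
        Small small_;
        Heap heap_;
    };
};

// Union is 24 bytes, so tag + padding + union = 32 on a 64-bit target.
static_assert(sizeof(void*) != 8 || sizeof(tagged_string) == 32,
              "unexpected size on a 64-bit target");
```

For example, tagged_string("hello") stays small, while a 47-character literal goes to the heap.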
In g++ 5.2 (it is different in earlier versions such as g++ 4.9), a string is basically defined as:
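#include <cstddef>

class string {
    char* bufferp;              // points to local_buffer or to heap storage
    std::size_t length;
    union {
        char local_buffer[16];  // short strings (up to 15 chars + '\0')
        std::size_t capacity;   // long strings: allocated capacity
    };
};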
On a typical 64-bit computer this adds up to 32 bytes (8 + 8 + 16).
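The arithmetic can be compared against the real library. This sketch assumes libstdc++ with the C++11 ABI, detected through the __GLIBCXX__ and _GLIBCXX_USE_CXX11_ABI macros that libstdc++ defines; with any other library there is nothing to compare, so it reports a match. The function name is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Checks sizeof(std::string) against the pointer + length + 16-byte-union
// layout described above. Only libstdc++ with the C++11 ABI (the default
// since GCC 5) is checked; for any other library this returns true.
bool consistent_with_gcc5_layout() {
#if defined(__GLIBCXX__) && _GLIBCXX_USE_CXX11_ABI
    return sizeof(std::string) == sizeof(char*) + sizeof(std::size_t) + 16 &&
           std::string().capacity() == 15;  // 16-byte buffer minus the '\0'
#else
    return true;
#endif
}
```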
The actual definition is, of course,
typedef basic_string<char> string;
but the idea is the same.
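The alias can be confirmed at compile time with std::is_same. The same template also provides the wide-character strings, so the layout discussion carries over to them.

```cpp
#include <string>
#include <type_traits>

// std::string is basic_string<char> with the default char_traits and allocator.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is an alias for basic_string<char>");

// The wide-character strings are further instantiations of the same template.
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is an alias for basic_string<wchar_t>");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value,
              "std::u16string is an alias for basic_string<char16_t>");
```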
It is library-dependent. You shouldn't rely on the size of std::string objects, because it is likely to differ between environments (obviously between standard library vendors, but also between versions of the same library).
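Rather than hard-coding a size, it is safer to log the actual values on each toolchain you build with. This hypothetical helper identifies the library through macros the libraries define themselves (_LIBCPP_VERSION for libc++, __GLIBCXX__ for libstdc++) and falls back to _MSC_VER as a proxy for Microsoft's library. The struct and function names are illustrative only.

```cpp
#include <cstddef>
#include <string>

// Names the standard library this translation unit was built against.
// <string> is included first so the library's own macros are defined.
const char* standard_library_name() {
#if defined(_LIBCPP_VERSION)
    return "libc++";
#elif defined(__GLIBCXX__)
    return "libstdc++";
#elif defined(_MSC_VER)
    return "MSVC STL (inferred from the Microsoft compiler)";
#else
    return "unknown";
#endif
}

// The numbers worth recording per environment instead of assuming them.
struct string_layout_info {
    const char* library;
    std::size_t object_size;       // sizeof(std::string)
    std::size_t default_capacity;  // often, but not always, the small-buffer size
};

string_layout_info describe_string_layout() {
    return {standard_library_name(), sizeof(std::string),
            std::string().capacity()};
}
```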
Keep in mind that std::string implementations are written by people who have optimized for a variety of use cases, which typically leads to two internal representations: one for short strings (a small internal buffer) and one for long strings (a heap-allocated external buffer). The overhead comes from having room for both of these inside each std::string object.
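The cost of keeping both representations is usually reduced by overlapping them in a union, since only one is in use at a time. The hypothetical types below compare the two choices; on a 64-bit target the side-by-side version needs 32 bytes and the union only 16.

```cpp
#include <cstddef>

// Hypothetical long-string representation: heap pointer plus capacity.
struct heap_rep {
    char* data;
    std::size_t capacity;
};

// Room for both representations at all times.
struct side_by_side {
    heap_rep heap;
    char small[16];
};

// Room only for whichever representation is larger.
union overlapped {
    heap_rep heap;
    char small[16];
};
```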
Q: Why is a dog yellow? A: It's not necessarily.
The size of a std::string object is implementation-dependent. I just checked MS VC++ 2010: it does indeed use 32 bytes for std::string. There is a 16-byte union that contains either the text of the string, if it fits, or a pointer to heap storage for longer strings. If the implementers had chosen to keep 18-byte strings in the string object rather than on the heap, the size would be 34 bytes (before any alignment padding). The other 16 bytes are overhead, holding things such as the length of the string and the amount of memory currently allocated for it.
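A similar split into "inline characters" and "everything else" can be estimated for whichever library you compile against. This sketch assumes that a default-constructed string's capacity() is its small-buffer size, which is true of the common implementations but not guaranteed; the names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Rough split of sizeof(std::string) into inline character storage and the
// remaining bookkeeping (length, capacity, pointer, tags, padding).
struct string_overhead {
    std::size_t object_size;   // sizeof(std::string)
    std::size_t inline_bytes;  // characters that fit inline, plus the '\0'
    std::size_t other_bytes;   // everything else
};

string_overhead measure_string_overhead() {
    const std::size_t inline_bytes =
        (std::string().capacity() + 1) * sizeof(char);
    return {sizeof(std::string), inline_bytes,
            sizeof(std::string) - inline_bytes};
}
```

With libstdc++ this gives 16 inline bytes and 16 other bytes. libc++ packs its bookkeeping into the same bytes as the inline buffer, so other_bytes comes out much smaller there.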
A different implementation might always allocate memory from the heap. Such an implementation would undoubtedly require less memory for the string object.
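A heap-only design can shrink the object to a single pointer by moving the bookkeeping into the heap block. The hypothetical heap_only_string below is a minimal sketch of that idea: the length sits in a header in front of the characters. It is similar in spirit to GCC's pre-5 copy-on-write std::string, which kept its length, capacity and reference count in a header before the character data, but this sketch has no sharing or reference counting, and even an empty string allocates.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical heap-only string: the object is one pointer; the length lives
// in a header at the start of the heap block, followed by the characters.
class heap_only_string {
public:
    explicit heap_only_string(const char* s) {
        const std::size_t n = std::strlen(s);
        void* block = ::operator new(sizeof(header) + n + 1);
        rep_ = new (block) header{n};
        std::memcpy(chars(), s, n + 1);  // copy including the '\0'
    }
    ~heap_only_string() { ::operator delete(rep_); }
    heap_only_string(const heap_only_string&) = delete;
    heap_only_string& operator=(const heap_only_string&) = delete;

    std::size_t size() const { return rep_->size; }
    const char* c_str() const { return reinterpret_cast<const char*>(rep_ + 1); }

private:
    struct header {
        std::size_t size;
    };
    char* chars() { return reinterpret_cast<char*>(rep_ + 1); }

    header* rep_;  // the only data member
};

static_assert(sizeof(heap_only_string) == sizeof(void*),
              "the object is just one pointer");
```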