std::string implementation in GCC and its memory overhead for short strings?

Well, at least with GCC 4.4.5, which is what I have handy on this machine, std::string is a typedef for std::basic_string<char>, and basic_string is defined in /usr/include/c++/4.4.5/bits/basic_string.h. There's a lot of indirection in that file, but what it comes down to is that nonempty std::strings store a pointer to one of these:

    struct _Rep_base {
        size_type     _M_length;
        size_type     _M_capacity;
        _Atomic_word  _M_refcount;
    };

This is followed in memory by the actual string data. So std::string is going to have at least three words of overhead for each string, plus any overhead for having a capacity higher than the length (probably none, depending on how you construct your strings; you can check by calling the capacity() method).
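To check this on your own toolchain, something like the following sketch may help. It only uses sizeof, size() and capacity(); the helper names slack, report and report_all are made up for illustration, and the numbers it prints depend entirely on your compiler and library version.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>

// Bytes reserved beyond what the string actually holds.
// Zero means no slack; the value is implementation-specific.
std::size_t slack(const std::string& s) {
    assert(s.capacity() >= s.size());  // guaranteed by the standard
    return s.capacity() - s.size();
}

// Print the numbers for one string.
void report(const std::string& s) {
    std::cout << "size=" << s.size() << " capacity=" << s.capacity()
              << " slack=" << slack(s) << '\n';
}

// Print the handle size and two sample strings in the 4-16 character range.
void report_all() {
    std::cout << "sizeof(std::string)=" << sizeof(std::string)
              << " sizeof(void*)=" << sizeof(void*) << '\n';
    report(std::string("abcd"));
    report(std::string("0123456789abcdef"));
}
```

Call report_all() from main(). Note that capacity() says nothing about the hidden header or about allocator overhead; it only tells you whether the string reserved more characters than it uses.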

There's also going to be overhead from your memory allocator for doing lots of small allocations; I don't know what GCC uses for C++, but assuming it's similar to the dlmalloc allocator it uses for C, that could be at least two words per allocation, plus some space to align the size to a multiple of at least 8 bytes.
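Putting those guesses together gives a rough back-of-the-envelope figure. The sketch below is only arithmetic under stated assumptions: a three-word header, one extra byte for a terminating NUL (my assumption), two words of allocator bookkeeping, rounding to a multiple of 8, plus one word for the std::string object itself. The function name estimate_bytes is hypothetical, and a real allocator may well use more.

```cpp
#include <cassert>
#include <cstddef>

// Rough per-string memory estimate, in bytes, for a string of `length`
// characters on a platform whose word size is `word` bytes.
// Every constant here is an assumption, not a measured value.
std::size_t estimate_bytes(std::size_t length, std::size_t word) {
    const std::size_t header = 3 * word;          // length, capacity, refcount
    const std::size_t payload = length + 1;       // characters + assumed NUL
    const std::size_t alloc_overhead = 2 * word;  // assumed allocator bookkeeping
    std::size_t block = header + payload + alloc_overhead;
    block = (block + 7) / 8 * 8;                  // round up to a multiple of 8
    return word + block;                          // plus the pointer in the string object
}
```

With 4-byte words this gives estimate_bytes(4, 4) == 36 and estimate_bytes(16, 4) == 44, so under these assumptions a 4-character string costs roughly nine times its payload.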

I'm going to guess you are on a 32-bit platform with 8-bit bytes. I'm also going to guess that, at least in the GCC version you are using, std::string is a reference-counted implementation. The 4-byte sizeof you see is a pointer to a structure containing the reference count and the string data (and any allocator state, if applicable).

In this design of gcc's the only "short" string has size == 0, in which case it can share a representation with every other empty string. Otherwise you get a refcounted COW string. To investigate this yourself, code up an allocator that keeps track of how much memory it allocates and deallocates, and how many times.

Use this allocator to investigate the implementation of the container you're interested in.
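A tracking allocator along those lines might look like the sketch below. It assumes C++11 or later, where a minimal allocator interface is enough; an older library in C++03 mode needs the full allocator interface (rebind, pointer typedefs, construct/destroy and so on). The names AllocStats, stats, CountingAllocator, CountedString and measure are all invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Running totals of what the allocator has been asked to do.
struct AllocStats {
    std::size_t allocations = 0;
    std::size_t deallocations = 0;
    std::size_t bytes_allocated = 0;
    std::size_t bytes_deallocated = 0;
};

// One shared set of counters for every CountingAllocator instance.
inline AllocStats& stats() {
    static AllocStats s;
    return s;
}

// Minimal C++11 allocator that forwards to operator new/delete and counts calls.
template <typename T>
struct CountingAllocator {
    typedef T value_type;

    CountingAllocator() noexcept {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        stats().allocations += 1;
        stats().bytes_allocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t n) noexcept {
        stats().deallocations += 1;
        stats().bytes_deallocated += n * sizeof(T);
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

typedef std::basic_string<char, std::char_traits<char>, CountingAllocator<char> >
    CountedString;

// Build and destroy one string, returning what the allocator saw.
AllocStats measure(const char* text) {
    stats() = AllocStats();
    {
        CountedString s(text);
    }
    return stats();
}
```

Comparing measure("abcd").bytes_allocated against the 4 characters requested shows how much hidden header the implementation adds. This counts only what the string asks the allocator for; the allocator's own per-block overhead is not visible here.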

If it's guaranteed that you have ">100,000 strings of 4-16 characters each", then don't use std::string. Instead, write your own ShortString class.

It's interesting that sizeof(std::string) == 4. How is that possible?
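On the ShortString suggestion, one possible shape is a fixed inline buffer with no heap allocation at all. This is only a sketch assuming the 16-character upper bound from the question; the class layout, the kMaxLength constant and the choice to throw on overlong input are my own assumptions, not anything the answer specified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Fixed-capacity string for values of at most 16 characters.
// The characters live inside the object, so there is no per-string
// header, reference count, or allocator overhead.
class ShortString {
public:
    static const std::size_t kMaxLength = 16;

    ShortString() : length_(0) { data_[0] = '\0'; }

    explicit ShortString(const char* text) {
        const std::size_t n = std::strlen(text);
        if (n > kMaxLength) throw std::length_error("ShortString: too long");
        std::memcpy(data_, text, n);
        data_[n] = '\0';
        length_ = static_cast<unsigned char>(n);
    }

    std::size_t size() const { return length_; }
    const char* c_str() const { return data_; }

    friend bool operator==(const ShortString& a, const ShortString& b) {
        return a.length_ == b.length_ &&
               std::memcmp(a.data_, b.data_, a.length_) == 0;
    }

private:
    char data_[kMaxLength + 1];  // characters plus terminating NUL
    unsigned char length_;       // 0..16 fits in one byte
};
```

Here sizeof(ShortString) is 18 bytes regardless of content, so a std::vector<ShortString> of 100,000 entries is one contiguous block of about 1.8 MB. The trade-off is that every string pays for 16 characters even if it only uses 4.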

What are sizeof(char) and sizeof(void *)?

It's always possible. It could just hold a pointer to some kind of internal string structure, which holds all the usual data, like length and pointer to the string data. – jalf Feb 20 at 17:57

sizeof(char) is always 1. – Cat Plus Plus Feb 20 at 17:57

@jalf If it just holds a pointer to an internal structure, then the actual memory cost is added by sizeof(void *) when the string is longer than sizeof(void *)-1. I could imagine some size optimization based on the pointer encoding is done for std::string when the size is small. – albert Feb 20 at 18:12

@albert: I don't understand what you mean. But to get sizeof(std::string) == 4, you just need to be on a 32-bit system (where pointers are 4 bytes wide), and then store a single pointer, and no other members inside the string class. That pointer can just refer to a secondary data structure containing length/capacity and a pointer to the actual string buffer. – jalf Feb 200 at 10:59
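To make the single-pointer idea concrete, here is a toy class whose only member is a pointer to a heap block holding length, capacity, a reference count, and then the characters. It is a simplified illustration, not GCC's code: Rep and PointerString are invented names, the reference count is not atomic, and there is no copy-on-write mutation path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Header stored at the start of the heap block; the characters follow it.
struct Rep {
    std::size_t length;
    std::size_t capacity;
    int refcount;  // not atomic: single-threaded illustration only
};

class PointerString {
public:
    explicit PointerString(const char* text) {
        const std::size_t n = std::strlen(text);
        void* block = std::malloc(sizeof(Rep) + n + 1);
        if (!block) throw std::bad_alloc();
        rep_ = new (block) Rep;  // construct the header in place
        rep_->length = n;
        rep_->capacity = n;
        rep_->refcount = 1;
        std::memcpy(chars(), text, n + 1);  // copy including the NUL
    }

    // Copies share the same block and just bump the count.
    PointerString(const PointerString& other) : rep_(other.rep_) {
        ++rep_->refcount;
    }

    PointerString& operator=(const PointerString& other) {
        ++other.rep_->refcount;  // increment first so self-assignment is safe
        release();
        rep_ = other.rep_;
        return *this;
    }

    ~PointerString() { release(); }

    std::size_t size() const { return rep_->length; }
    const char* c_str() const { return chars(); }
    bool shares_with(const PointerString& o) const { return rep_ == o.rep_; }

private:
    char* chars() const { return reinterpret_cast<char*>(rep_ + 1); }
    void release() {
        if (--rep_->refcount == 0) std::free(rep_);
    }

    Rep* rep_;  // the only data member
};
```

Because rep_ is the only member, sizeof(PointerString) equals sizeof(Rep*), which is 4 on a typical 32-bit platform. All the real cost sits in the heap block behind that pointer.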
