I'm making a program (think: something like Launchy) that, more or less, goes through a bunch of strings and ranks them according to some criteria.
I store the results in a vector<SearchSuggestion>
, where the structure is currently defined as follows:
struct SearchSuggestion
{
std::string path;
int tag;
};
In my program, I copy the structures (and hence the string) around a lot, due to the need to manipulate lots of file paths and whatnot.
While this causes a noticeable but small delay in release mode, it dramatically slows down the debugging of my program (i.e., several-second pauses in between keystrokes). Looking for the cause, I see almost all of the time is spent with the following stack trace:
ntdll.dll!RtlCompareMemoryUlong()
ntdll.dll!RtlpAllocateHeap()
ntdll.dll!RtlAllocateHeap()
ntdll.dll!RtlDebugAllocateHeap()
ntdll.dll!string "Enabling heap debug options\n"()
ntdll.dll!RtlAllocateHeap()
msvcr90d.dll!_heap_alloc_base(unsigned __int64) C
msvcr90d.dll!_heap_alloc_dbg_impl(unsigned __int64, int, const char *, int, int *)
msvcr90d.dll!_nh_malloc_dbg_impl(unsigned __int64, int, int, const char *, int, int *)
msvcr90d.dll!_nh_malloc_dbg(unsigned __int64, int, int, const char *, int)
msvcr90d.dll!malloc(unsigned __int64)
msvcr90d.dll!operator new(unsigned __int64)
MyProgram.exe!std::_Allocate<wchar_t>(unsigned __int64, wchar_t *)
MyProgram.exe!std::allocator<wchar_t>::allocate(unsigned __int64)
MyProgram.exe!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::_Copy(unsigned __int64, unsigned __int64)
MyProgram.exe!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::_Grow(unsigned __int64, bool)
MyProgram.exe!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::assign(const std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > &, unsigned __int64, unsigned __int64)
MyProgram.exe!std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >(const std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > &)
MyProgram.exe!SearchSuggestion::SearchSuggestion(const SearchSuggestion &)
MyProgram.exe!std::_Construct<SearchSuggestion,SearchSuggestion>(SearchSuggestion *, const SearchSuggestion &)
MyProgram.exe!std::allocator<SearchSuggestion>::construct(SearchSuggestion *, const SearchSuggestion &)
MyProgram.exe!std::_Uninit_copy<SearchSuggestion * __ptr64,SearchSuggestion * __ptr64,std::allocator<SearchSuggestion> >(SearchSuggestion *, SearchSuggestion *, SearchSuggestion *, std::allocator<SearchSuggestion> &, std::_Nonscalar_ptr_iterator_tag, std::_Nonscalar_ptr_iterator_tag)
MyProgram.exe!stdext::unchecked_uninitialized_copy<SearchSuggestion * __ptr64,SearchSuggestion * __ptr64,std::allocator<SearchSuggestion> >(SearchSuggestion *, SearchSuggestion *, SearchSuggestion *, std::allocator<SearchSuggestion> &)
MyProgram.exe!std::_Uninit_move<SearchSuggestion * __ptr64,SearchSuggestion * __ptr64,std::allocator<SearchSuggestion>,std::_Undefined_move_tag>(SearchSuggestion *, SearchSuggestion *, SearchSuggestion *, std::allocator<SearchSuggestion> &, std::_Undefined_move_tag, std::_Undefined_move_tag)
MyProgram.exe!stdext::_Unchecked_uninitialized开发者_如何学JAVA_move<SearchSuggestion * __ptr64,SearchSuggestion * __ptr64,std::allocator<SearchSuggestion> >(SearchSuggestion *, SearchSuggestion *, SearchSuggestion *, std::allocator<SearchSuggestion> &)
MyProgram.exe!std::vector<SearchSuggestion,std::allocator<SearchSuggestion> >::_Umove<SearchSuggestion * __ptr64>(SearchSuggestion *, SearchSuggestion *, SearchSuggestion *)
MyProgram.exe!std::vector<SearchSuggestion,std::allocator<SearchSuggestion> >::_Insert_n(std::_Vector_const_iterator<SearchSuggestion,std::allocator<SearchSuggestion> > *, unsigned __int64, const SearchSuggestion &)
MyProgram.exe!std::vector<SearchSuggestion,std::allocator<SearchSuggestion> >::insert(std::_Vector_const_iterator<SearchSuggestion,std::allocator<SearchSuggestion> > *, const SearchSuggestion &)
MyProgram.exe!std::vector<SearchSuggestion,std::allocator<SearchSuggestion> >::push_back(const SearchSuggestion &)
MyProgram.exe!Appender<std::vector<SearchSuggestion,std::allocator<SearchSuggestion> > >(const wchar_t *, _tfinddata *, void *)
MyProgram.exe!EnumMatches(const std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > &, const std::vector<std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> >,std::allocator<std::basic_string<wchar_t,std::char_traits<wchar_t>,std::allocator<wchar_t> > > > &, int (const wchar_t *, _tfinddata *, void *)*, void *, int)
...
So it's pretty obvious that it's the copying of std::string
that's taking too long, probably due to poor locality of reference.
So now my question is simple:
How can I improve the performance of allocating a lot of small strings?
One way would be to stop putting them by value into the SearchSuggestion
structure. Instead give every SearchSuggestion
a handle to the std::string
which represents the path.
struct SearchSuggestion {
int pathId;
int tag;
};
This will make the copying within the vector more effecient as it will just be copying around simple int
s instead of complex std::string
values.
You can then use an std::map<int, std::string>
structure to map path id's into real path's.
You might try using Boost.Flyweight. I don't guarantee it'll work — the time saved from not copying the string might be spent checking if that string wasn't already stored — but you can give it a try.
Another option is to turn it into a boost::shared_ptr<std::string>
. There's only a pointer that will have to be copied now (so the cost will be practically null), but now there's an added cost when actually accessing that string (but this might not be that big of a problem).
It shouldn't be hard to try out these two and see which one produces better results.
The perfiest answer is: don't allocate and copy lots of strings. Use shared pointers to constant strings or symbols instead.
Malloc() and strcpy() are just slow, period, and copying a string is always going to be an O(n) operation. In realtime code it's better to just avoid allocations as much as possible.
Try boost::flyweight:
http://www.boost.org/doc/libs/1_40_0/libs/flyweight/doc/tutorial/basics.html
精彩评论