I have a sorted graph, when I load the edges I use a hashtable to look up the vertices. The edges are sorted by source, therefore I only need to look up "deeper" vertices. If a given edge has a source vertex on level n, then the sink vertex must be on level m, where m > n. I need to exploit this behaviour to improve performance.
The "ideal" naive solution would be a hashtable for each level, where I could 开发者_JAVA百科use the level to find the correct table, then find the element within the table. This would also allow me the extra benefit of being able to reclaim memory when n, the source level, is greater than the level. Unfortunately the graph is too big for this approach, 10^6 levels and 10^9 vertices.
Does anyone have any suggestion on what data structure I should be looking at? Gracias
Given your size estimates for the problem, I would suggest a vector of vectors: The outer vector contains one inner vector for each level, so it contains about 1 million entries; the inner vectors (which contain about 1000 entries each?) should be kept in sorted order and use sorted inserts using lower_bound()
etc.
You can reclaim memory by the copy/swap trick to replace old, unused inner vectors by empty ones.
typedef std::vector<Node> level_nodes;
typedef std::vector<level_nodes> graph_nodes;
graph_nodes g;
r.reserve(1000000); // OK, just a few MB
// ...
g[12].insert(std::lower_bound(g[12].begin(), g[12].end(), x), x);
level_nodes().swap(g[11]); // kill level 11 and reclaim memory
精彩评论