Calculate frequency of triples in a C array for indexing_问答_开发者

Calculate frequency of triples in a C array for indexing

开发者 https://www.devze.com 2023-03-14 13:10 出处：网络

I have an array of integers (possibly thousands), like int p[] = {0, 0, 0, 1, 0, 1, 2, 0, 2, 1, 0, 1, 0, 0, 0, 3, 0, 3, 5, 1, 7, ...

相关专题：c objective-c

I have an array of integers (possibly thousands), like

int p[] = {0, 0, 0, 1, 0, 1, 2, 0, 2, 1, 0, 1, 0, 0, 0, 3, 0, 3, 5, 1, 7, ...

from which I want to generate a set of indices for each unique triple; for the list above, something like:

0, 1, 2, 1, 0, 3, 4, ...

I've coded up a simple C++ implementation (though a plain C or Obj-C implementation would do just as well or better), but am sure there's room for improvement:

for (int i = 0; i < 24*3; i++) {
    std::ostringstream sstr;
    sstr << p[3*i] << "," << p[开发者_开发技巧3*i + 1] << "," << p[3*i + 2];
    freq[sstr.str()] += 1;
}

for (auto i = freq.begin(); i != freq.end(); i++) {
    std::cout << i->first << " => " << i->second << std::endl;
}

This just counts the frequencies of each triple, but could be trivially adapted to assign the desired indices. My question is, how can this be made more time/space efficient (bearing in mind that the runtime target is a mobile device)? Specifically,

1) What might be a better data structure than std::map for this purpose? I'd like to avoid introducing new dependencies (e.g. boost, unless it's header-only) 2) Is there a better key to use than a string? I thought about using a number for space efficiency, such as 5^a * 3^b * 2^c, but was concerned about exceeding numerical limits 3) Is there a better algorithm/approach than the one outlined here?

Agreed with Armen that it's generally OK. I would probably make a map with triples as keys and a set of indices as values:

typedef std::set<size_t> index_set;
typedef std::tuple<int,int,int> triple;
typedef std::map<triple, index_set> frequency_map;

And then:

const auto t = std::make_tuple(p[i], p[i+1], p[i+2]);
freqs[t].insert(i);

Then every i in freqs[t] is such that (p[i], p[i+1], p[i+2]) equals t.

The time complexity looks OK to me. using std::map looks OK to me. As for the key, it seems to me that a struct with 3 ints is more appropriate than a string. But I don't think it matters that much

I would definitely change the key to a simple 3 int struct; a tuple may also be a nice idea. This should cause a substantial performance improvement, because it would remove any potential heap allocation from strings and the overhead associated with the string stream.

Also, since you have thousands elements and no order constraints, an unordered_map may be a better container choice.