Algorithm Minimize playlist without altering the playout_问答_开发者

Im looking for an algorithm to reduce a list (playlist) of ordered but not unique items. Searched for set theory but havent found anything suitable yet

Examples

[a, b, b, c] -> [a, b, b, c] Cannot be reduced. 
[a, b, b, a, b, b] -> [a, b, b]. 
[b, b, b, b, b] -> [b]. 
[b, b, b, b, a] -> [b, b, b, b, a] Cannot be reduced.

Thinking of getting all existing sublists and count each instance. If there is such a sublist where the count tim开发者_C百科es the sublist length is equal to the orginal list, take the shortest sublist matching this criteria.

This seems a bit brute force, there must be a simpler/faster solution available.

For starters, you don't need to check all sublists -- just those with lengths that are factors of the length of the full list.

If your main concern is coding simplicity rather than raw speed, just let a regex engine solve the problem:

/^(.+?)\1+$/

Which is a variant on Abigail's awesome Perl regex to find prime numbers.

For each n <= N (where N is the length of the list), if n is a factor of N. If it is, then check if repeating the sublist of the first n characters generates the original list. If it does, then you've found a potential answer (the answer is the shortest). This should get you down to less than O(N^2) but still the same order of efficiency as brute force in the worst case.

You can do some pruning by noting that, if for example a sublist of length 2 successfully generates the first 4 characters but not the full list, then a sublist of length 4 will fail. You can keep a list of all such sublist lengths to not check and this will cut down on some computation.

Encode every set element with a prime number.

Ex:

a -> 2
b -> 3
c -> 5

etc.

Now, you need two more lists to maintain.

First list is for primes, second is for their exponents.

The idea is; when you stumble upon an element, record it's prime number and how many times in succession it appears.

For [a, b, b, c], you get this:

[2, 3, 3, 5]

Which can be recorded as:

[2, 3^2, 5]

or, more precisely:

[2^1, 3^2, 5^1]

and you maintain two lists:

[2,3,5]  // primes in succession - list [p]
[1,2,1]  // exponents - list [e]

Now, you iterate through these two lists from ends to middle, by checking if first element [p]^[e] is the same as last element; if it is, then second with next to last and so on... If all of them are the same, your list can be reduced.

In this example, you check if 2^1*5^1 == 3^2*3^2; and since it's not, it cannot be reduced.

Let's try [a, b, b, a, b, b]:

This is encoded as

[2^1, 3^2, 2^1, 3^2]

or,

[2, 3, 2, 3]  // primes
[1, 2, 1, 2]  // exponents

Now, we check if 2^1 * 3^2 == 3^2 * 2^1 (first prime, first exponent multiplied with last prime, last exponent, and then compared with second against next to last)

Since this holds, it is reducible.

Let's try [b, b, b, b, b]:

This can be encoded as

[3^5]

or,

[3]  // primes
[5]  // exponents

This is a special case: if you've got 1 element lists, then your original list is reducible.

Let's try [b, b, b, b, a]:

This can be encoded as

[3^4, 2^1]

or,

[3, 2]  // primes
[4, 1]  // exponents

We check if 3^4 == 2^1, and since it is not, your list is not reducible.

Let's try [a, b, a, b, a, b]:

This can be encoded as

[2^1, 3^1, 2^1, 3^1, 2^1, 3^1]

or,

[2, 3, 2, 3, 2, 3]
[1, 1, 1, 1, 1, 1]

Trying the above procedure works, because 2^1 * 3^1 == 3^1 * 2^1 == 2^1 * 3^1

So, the algorithm would be something like this:

Encode all numbers to primes.

Iterating through your list, make two lists and populate them as described

Now that you have your two lists, p and e, both of them having length n do this:

var start = p[0]^e[0] * p[n-1]^e[n-1]
var reducible = true;

for (int i = 0; i < n/2, ++i) :
   if ( (p[i]^e[i] * p[n-i]^e[n-i]) != start ) :
       reducible = false;
       break;

Note: I didn't really code this algorithm and try it out for various inputs. It's just an idea. Also, if a list is reducible, from it's length and length of n, it shouldn't be too hard to see how to reduce the original list to its basic form.

Second note: if anyone sees a mistake above, please correct me. It's possible that nothing of this really works since it's late and my concentration isn't optimal.

Here's some simple code that should run in close to linear time (at worst O(n lg lg n) I think, relying on some higher math).

f(x) {
  i = 1;
  while (i <= size(x) / 2) {
    if (size(x) % i != 0) { i++; continue;}
    b = true;
    for (j = 0; j + i < x.size(); j++) {
      if (x[i] != x[j]) {
        b = false;
        break;
      }
    }
    if (b) return i;
    i = max(i + 1, j / i * i / 2); // skip some values of i if j is large enough
  }
  return -1;
}

Essentially, the above performs the naive algorithm, but skips some periodicities which are known to be impossible due to earlier "near-misses". For example, if you try a period of 5 and see "aaaabaaaabaaaabaaaabab", you can safely skip 6, 7,..., 10 since we saw 4 cycles of 5 repeat and then a failure.

Ultimately, you end up doing a linear amount of work plus an amount of work that is linear in sigma(n), the sum of the divisors of n, which is bounded by O(n lg lg n).

*Note that proving the correctness of this skipping is pretty subtle, and I may have made a mistake on the details -- comments welcome.