Goal
How to encode the data that describes how to re-order a static list from a one order to another order using the minimum amount of bytes possible?
Original Motivation
Originally this problem arose while working on a problem relaying sensor data using expensive satellite communication. A device had a list of about 1,000 sensors they were monitoring. The sensor list couldn't change. Each sensor had a unique id. All data was being logged internally for eventual analysis, the only thing that end users needed daily was which sensor fired in which order.
The entire project was scrapped, but this problem seems too interesting to ignore. Also previously I talked about "swaps" because I was thinking of the sorting algorithm, but really it's the overall order that's important, the steps required to arrive at that order probably wouldn't matter.
How the data was ordered
In SQL terms you could think of it like this.
**Initial Load**
create table sensor ( id int, last_detected datetime, other stuff )
-- fill table with ids of all sensors for this location
Day 0: Select ID from Sensor order by id
(initially data is sorted by the sensor.id because its a known value)
Day 1: Select ID from Sensor order by last_detected
Day 2: Select ID from Sensor order by last_detected
Day 3: Select ID from Sensor order by last_detected
Assumptions
- The starting list and ending list is composed of the exact same set of items
- Each sensor has a unique id (32 bit integer)
- The size of the list will be approximately 1,000 items
- Each sensors may fire multiple times per minute or not at all for days
- Only the change in ID sort order needs to be relayed.
- Computation resources for figuring optimal solutions is cheap / unlimited
- Data costs are expensive, roughly a dollar per kilobyte.
- Data could only be sent as whole byte (octet) increments
- The Day 0 order is known by the sender and receiver to start with
- For now assume the system functions perfectly and no error checking is required
As I said the project/hardware is no more so this is开发者_StackOverflow中文版 now just an academic problem.
The Challenge!
Define an Encoder
- Given A. Day N sort order
- Given B. Day N + 1 sort order
- Return C. a collection of bytes that describe how to convert A to B using the least number of bytes possible
Define a Decoder
- Given A. Day N sort order
- Given B. a collection of bytes
- Return C. Day N + 1 sort order
Have fun everyone.
As an academic problem, one approach would be to look at Algorithm P section 3.3.2 of Vol II of Knuth's the art of computer programming, which maps every permutation on N objects into an integer between 0 and N!-1. If every possible permutation is equally likely at any time, then the best you can do is to compute and transmit this (multi-precision) integer. In practice, giving each sensor a 10-bit number and then packing those 10 bit numbers up so you have e.g. 4 numbers packed into each chunk of 5 bytes would do almost as well.
Schemes based on diff or off the shelf compression make use of knowledge that not all permutations are equally likely. You may have knowledge of this based on the equipment, or you could see if this is case by looking at previous data. Fine if it works. In some cases with sensors and satellites you might want to worry about rare exceptions where you get worst case behaviour of your compression scheme and you suddenly have more data to transmit than you bargained for.
精彩评论