I need an encryption scheme where the plaintext and ciphertext are composed entirely of decimal digits.
In addition, the plaintext and ciphertext must be the same length.
Also the underlying encryption algorithm should be an industry-standard. I don't mind if its symmetric (e.g AES) or asymmetric (e.g RSA) - but it must be a recognized algorithm for which I can get a FIPS-140 approved library. (Otherwise it won't get past the security review stage).
Using AES OFB is fine for preserving the length of hex-based input (i.e. where each byte has 256 possible values: 0x00 --> 0xFF). However, this will not work for my means as plaintext and ciphertext must be entirely decimal.
NB: "Entirely decimal" may be interpreted two ways - both of which are acceptable for my requirements:
- Input & output bytes are characters '0' --> '9' (i.e. byte values: 0x30 -> 0x39)
- Input & output bytes have the 100 (decimal) values: 0x00 --> 0x99 (i.e. BCD)
Some more info: The max plaintext & ciphertext length is likely to be 10 decimal digits. (I.e. 10 bytes if using '0'-->'9' or 5 bytes if using BCD)
Consider following sample to see why AES fails: Input string is 8 digit number. Max 8-digit number is: 99999999 In hex this is: 0x5f5e0ff
This could be treated as 4 bytes: <0x05><0xf5><0xe0><0xff>
If I use AES OFB, I will get 4 byte output.
Highest possible 4-byte ciphertext output is <0xFF><0xFF><0xFF><0xFF>
Converting this back to an integer gives: 4294967295 I.e. a 10-digit number.
==> Two 开发者_StackOverflow社区digits too long.
One last thing - there is no limit on the length any keys / IVs required.
Use AES/OFB, or any other stream cipher. It will generate a keystream of pseudorandom bits. Normally, you would XOR these bits with the plaintext. Instead:
For every decimal digit in the plaintext
Repeat
Take 4 bits from the keystream
Until the bits form a number less than 10
Add this number to the plaintext digit, modulo 10
To decrypt, do the same but subtract instead in the last step.
I believe this should be as secure as using the stream cipher normally. If a sequence of numbers 0-15 is indistinguishable from random, the subsequence of only those of the numbers that are smaller than 10 should still be random. Using add/subtract instead of XOR should still produce random output if one of the inputs are random.
One potential candidate is the FFX encryption mode, which has recently been submitted to NIST.
Stream ciphers require a nonce for security; the same key stream state must never be re-used for different messages. That nonce adds to the effective ciphertext length.
A block cipher used in a streaming mode has essentially the same issue: a unique initialization vector must be included with the cipher text.
Many stream ciphers are also vulnerable to ciphertext manipulation, where flipping a bit in the ciphertext undetectably flips the corresponding bit in the plaintext.
If the numbers are randomly chosen, and each number is encrypted only once, and the numbers are shorter than the block size, ECB offers good security. Under those conditions, I'd recommend AES in ECB mode as the solution that minimizes ciphertext length while providing strong privacy and integrity protection.
If there is some other information in the context of the ciphertext that could be used as an initialization vector (or nonce), then this could work. This could be something explicit, like a transaction identifier during a purchase, or something implicit like the sequence number of a message (which could be used as the counter in CTR mode). I guess that VeriShield is doing something like this.
I am not a cipher guru, but an obvious question comes to mind: would you be allowed to use One Time Pad encryption? Then you can just include a large block of truly random bits in your decoding system, and use the random data to transform your decimal digits in a reversible way.
If this would be acceptable, we just need to figure out how the decoder knows where in the block of randomness to look to get the key to decode any particular message. If you can send a plaintext timestamp with the ciphertext, then it's easy: convert the timestamp into a number, say the number of seconds since an epoch date, modulus that number by the length of the randomness block, and you have an offset within the block.
With a large enough block of randomness, this should be uncrackable. You could have the random bits be themselves encrypted with strong encryption, such that the user must type in a long password to unlock the decoder; in this way, even if the decryption software was captured, it would still not be easy to break the system.
If you have any interest in this and would like me to expand further, let me know. I don't want to spend a lot of time on an answer that doesn't meet your needs at all.
EDIT: Okay, with the tiny shred of encouragement ("you might be on to something") I'm expanding my answer.
The idea is that you get a block of randomness. One easy way to do this is to just pull data out of the Linux /dev/random
device. Now, I'm going to assume that we have some way to find an index into this block of randomness for each message.
Index into the block of randomness and pull out ten bytes of data. Each byte is a number from 0 to 255. Add each of these numbers to the respective digit from the plaintext, modulo by 10, and you have the digits of the ciphertext. You can easily reverse this as long as you have the block of random data and the index: you get the random bits and subtract them from the cipher digits, modulo 10.
You can think of this as arranging the digits from 0 to 9 in a ring. Adding is counting clockwise around the ring, and subtracting is counting counter-clockwise. You can add or subtract any number and it will work. (My original version of this answer suggested using only 3 bits per digit. Not enough, as pointed out below by @Baffe Boyois. Thank you for this correction.)
If the plain text digit is 6, and the random number is 117, then: 6 + 117 == 123, modulo 10 == 3. 3 - 117 == -114, modulo 10 == 6.
As I said, the problem of finding the index is easy if you can use external plaintext information such as a timestamp. Even if your opponent knows you are using the timestamp to help decode messages, it does no good without the block of randomness.
The problem of finding the index is also easy if the message is always delivered; you can have an agreed-upon system of generating a series of indices, and say "This is the fourth message I have received, so I use the fourth index in the series." As a trivial example, if this is the fourth message received, you could agree to use an index value of 16 (4 for fourth message, times 4 bytes per one-time pad). But you could also use numbers from an approved pseudorandom number generator, initialized with an agreed constant value as a seed, and then you would get a somewhat unpredictable series of indexes within the block of randomness.
Depending on your needs, you could have a truly large chunk of random data (hundreds of megabytes or even more). If you use 10 bytes as a one-time pad, and you never use overlapping pads or reuse pads, then 1 megabyte of random data would yield over 100,000 one-time pads.
You could use the octal format, which uses digits 0-7, and three digits make up a byte. This isn't the most space-efficient solution, but it's quick and easy.
Example:
Text: Hello world!
Hexadecimal: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21
Octal: 110 145 154 154 157 040 167 157 162 154 144 041
(spaces added for clarity to separate bytes)
I don't believe your requirement can be met (at all easily anyway), though it's possible to get pretty close close.
AES (like most encryption algorithms) is written to work with octets (i.e. 8-bit bytes), and it's going to produce 8-bit bytes. Once it's done its thing, converting the result to use only decimal digits or BCD values won't be possible. Therefore, your only choice is to convert the input from decimal or BCD digits to something that fills an octet as completely as possible. You can then encrypt that, and finally re-encode the output to use only decimal or BCD digits.
When you convert the ASCII digits to fill the octets, it'll "compress" the input somewhat. The encryption will then produce the same size of output as the input. You'll then encode that to use only decimal digits, which will expand it back to roughly the original size.
The problem is that neither 10 nor 100 is a number that you're going to easily fit exactly into a byte. Numbers from 1 to 100 can be encoded in 7 bits. So, you'll basically treat those as a bit-stream, putting them in 7 bits at a time, but taking them out 8 bits at a time to get bytes to encrypt.
That uses the space somewhat better, but it's still not perfect. 7 bits can encode values from 0 to 127, not just 0 to 99, so even though you'll use all 8 bits, you won't use every possible combination of those 8 bits. Likewise, in the result, one byte will turn into three decimal digits (0 to 255), which clearly wastes some space. As a result, your output will be slightly larger than your input.
To get closer than that, you could compress your input with something like Huffman or an LZ* compression (or both) before encrypting it. Then you'd do roughly the same thing: encrypt the bytes, and encode the bytes using values from 0 to 9 or 0 to 99. This will give better usage of the bits in the bytes you encrypt, so you'd waste very little space in that transformation, but does nothing to improve the encoding on the output side.
For those doubting FFX mode AES, please feel free to contact me for further details. Our implementation is a mode of AES that effectively sits on top of existing ciphers. the specification with proof/validation is up on the NIST modes website. FFSEM mode AES is included under FFX mode.
http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffx/ffx-spec.pdf
If its meaningful, you can also have a conversation with NIST directly about their status in respect to modes submission/AES modes acceptance to address your FIPS question. FFX has security proofs, independent cryptographic review and is not a "new cipher". It is however based on methods that go back 20+ years - proven techniques. In implementation we ability to encrypt data whilst preserving length, structure, integrity, type and format. For example specify an explicit format policy that the output will be NNN-NN-NNNN.
So, as a mode of AES we can for example on a mainframe environment for implementation we simple use the native AES processor on a z10. Similar on open systems with HSM devices- we can sit on top of an existing AES implementation.
Format Preserving Encryption (as its often referred to) in this way is already being used in industry and available in off-the-shelf products and rather quick to deploy - already used in POS devices etc, Payments systems, Enterprise deployments etc.
Mark Bower VP Product Management Voltage Security Drop a note to info@voltage.com for more info or take a look at our website for more info.
Something like a Feistel cipher should fit your requirements. Split your input number into two parts (say 8 digits each), pass one part through a not-necessarily-reversible-or-bijective function, and subtract the other part from the result of that function (modulo e.g. 100,000,000). Then rearrange the digits somehow and repeat a bunch of times. Ideally, one should slightly vary the function which is used each time. Decryption is similar to encryption except that one starts by undoing the last rearrangement step, then subtracts the second part of the message from the result of using the last function one used on the first part of the message (again, modulo 100,000,000), then undoes the previous rearrangement step, etc.
The biggest difficulties with a Feistel cipher are finding a function which achieve good encryption with a reasonable number of rounds, and figuring out how many rounds are required to achieve good encryption. If speed is not important, one could probably use something like AES to perform the scrambling function (since it doesn't have to be bijective, you could arbitrarily pad the data before each AES step, and interpret the result as a big binary number modulo 100,000,000). As for the number of rounds, 10 is probably too few, and 1000 is probably excessive. I don't know what value between there would be best.
Using only 10 digits as input/output is completely insecure. It is so insecure, that in very likely that will be cracked in real application, so consider using at least 39 digits (128 bits equivalent) If you are going to use only 10 digits there is no point in using AES in this case you have chance to invent your own (insecure) algorithm.
The only way you might get out of this is using STREAM cipher. Use 256 bit key "SecureKey" and initialisation vector IV which should be different at each beginning of starting season. Translate this number into 77 digit (decimal) number and use "addition whit overflow" over modulo 10 whit each digit.
For instance
AES(IV,KEY) = 4534670 //and lot more
secret_message = 01235
+ and mod 10
---------------------------------------------
ciphertext = 46571 // and you still have 70 for next message
when you run out for digits from stream cipher -> AES(IV,KEY) increase IV and repeat IV=IV+1
Keep in mind that you should absolutely never use same IV twice, so you should have some scheme over this to prevent this.
Another concern is also in generating Streams. If you generate number that is higher than 10^77 you should discard that number increase IV and try again whit new IV. Other way there is high probability that you will have biased numbers and vulnerability.
There also very likely that there is flaw in this scheme or there will be in your implementation
精彩评论