Sentence that uses every base64 character

开发者 https://www.devze.com 2023-01-06 10:21 Source: web
I am trying to construct a sentence or letter combination whose encoding contains every base64 character, for use in unit tests, but I am failing to find one.

The unit tests I have so far are failing to hit the lines that handle the + and / characters. While I can sling them at the encoder/decoder directly, it would be nice to have a human-readable source (the base64 equivalent of 'the quick brown fox').


Here is a Base64 encoded test string that includes all 64 possible Base64 symbols:

char base64_encoded_test[] =
"U28/PHA+VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
"ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
"LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
"LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg==";

char base64url_encoded_test[] =
"U28_PHA-VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
"ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
"LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
"LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg==";

It decodes to a string composed entirely of relatively human-readable text:

char test_string[] = "So?<p>"
    "This 4, 5, 6, 7, 8, 9, z, {, |, } tests Base64 encoder. "
    "Show me: @, A, B, C, D, E, F, G, H, I, J, K, L, M, "
    "N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \\, ], ^, _, `, "
    "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s.";

This decoded string contains only characters in the printable 7-bit ASCII range (space through '~'), i.e. those for which isprint() returns true.

Since I did it, I would argue that it is possible :-).
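
The coverage claim above can be checked mechanically. Below is a minimal Python sketch, assuming the standard library `base64` module as the encoder; the names `STANDARD`, `URLSAFE` and `symbols_used` are hypothetical helpers introduced for illustration, not part of the original answer. It encodes the test text and reports which alphabet symbols, if any, never appear.

```python
import base64
import string

# The decoded test text from the answer, written as a Python literal.
test_string = (
    "So?<p>"
    "This 4, 5, 6, 7, 8, 9, z, {, |, } tests Base64 encoder. "
    "Show me: @, A, B, C, D, E, F, G, H, I, J, K, L, M, "
    "N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \\, ], ^, _, `, "
    "a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s."
)

# The two 64-symbol alphabets (hypothetical constant names).
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URLSAFE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def symbols_used(encoded: str) -> set:
    """Return the alphabet symbols present in an encoded string, ignoring '=' padding."""
    return set(encoded) - {"="}


raw = test_string.encode("ascii")
encoded = base64.b64encode(raw).decode("ascii")
encoded_url = base64.urlsafe_b64encode(raw).decode("ascii")

# Empty sets mean every symbol of the alphabet occurs at least once.
missing = set(STANDARD) - symbols_used(encoded)
missing_url = set(URLSAFE) - symbols_used(encoded_url)
print("missing standard symbols:", sorted(missing))
print("missing URL-safe symbols:", sorted(missing_url))
```

A check like this is useful as a guard in the test suite itself: if someone edits the test text and shifts the alignment, the missing-symbol sets stop being empty.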


You probably can't do that.

/ in base64 encodes 111111 (6 '1' bits).

As all ASCII characters (which are the typeable and printable characters) are in the range 0-127 (i.e. between 00000000 and 01111111), the only ASCII character that could be encoded using '/' is the ASCII character with the code 127, which is the non-printable DEL character.

If you allow values higher than 127, you could have a printable but non-typeable string.
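
One caveat worth checking against the reasoning above: base64 splits each 3-byte group into four 6-bit values, so the 6 bits behind a symbol need not line up with a single input character. The following is a minimal Python sketch (the helper name `sextets` is hypothetical) that prints the 6-bit values for two printable 3-byte groups. In the first, the low 6 bits of '?' (00111111) form the fourth value, which is 63, the index of '/'.

```python
import base64


def sextets(data: bytes) -> list:
    """Split a byte string whose length is a multiple of 3 into 6-bit values."""
    if len(data) % 3 != 0:
        raise ValueError("this sketch only handles input without padding")
    bits = "".join(f"{byte:08b}" for byte in data)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]


print(sextets(b"So?"))  # [20, 54, 60, 63]; index 63 is '/'
print(sextets(b"<p>"))  # [15, 7, 0, 62]; index 62 is '+'
print(base64.b64encode(b"So?<p>").decode("ascii"))  # U28/PHA+
```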


When attempting to encode/decode, this is the one place where I break the rule of unit testing a single method at a time. You can have methods for encoding or decoding separately, but the only way to tell if you're doing it correctly is to use both encoding and decoding in a single assert. I would use the following pseudo code.

Generate a random string using Path.GetRandomFileName(); this string is cryptographically strong
Pass the string to the encode method
Pass the output of the encode method to the decode method
Assert.AreEqual(input from GetRandomFileName, output from Decode)

You can loop over this as many times as you need in order to say it's tested. You can also cover some specific cases; however, since the encoded output for a character depends on its position in the input, you're better off going with a random string and just calling encode/decode about 50 or so times.
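
The round trip described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the answerer's code: the standard library `base64` module stands in for the encoder and decoder under test, `secrets.token_bytes` replaces Path.GetRandomFileName() as the source of random input, and the function names `round_trip_ok` and `run_round_trips` are hypothetical.

```python
import base64
import secrets


def round_trip_ok(payload: bytes) -> bool:
    """Encode then decode a payload and report whether it survived unchanged."""
    encoded = base64.b64encode(payload)  # stand-in for the encoder under test
    return base64.b64decode(encoded) == payload  # stand-in for the decoder


def run_round_trips(iterations: int = 50, max_len: int = 64) -> int:
    """Run the round trip on random payloads and return the number of failures."""
    failures = 0
    for _ in range(iterations):
        # Vary the length so inputs with 0, 1 and 2 padding characters all occur.
        length = secrets.randbelow(max_len + 1)
        payload = secrets.token_bytes(length)
        if not round_trip_ok(payload):
            failures += 1
    return failures


print("failures:", run_round_trips())
```

Random bytes cover the whole 0-255 range, so the '+' and '/' branches are very likely to be exercised over 50 iterations, though a random run cannot guarantee it. A round trip also cannot catch an encoder and decoder that share the same mistake, so a few fixed expected outputs are a sensible complement.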

If you find that encoding/decoding fails in accepted scenarios, create unit tests for those and filter out the strings that contain those characters/character combinations. Also, document those failures in XMLDocs comments, code comments, and any documentation your app has.


What I came up with may prove useful. It needs to be entered exactly as is. I include a link to a screenshot showing all the usually invisible characters below, as well as the Base64 data string to which it converts, and a table of the relevant statistics for each of the 64 characters therein.


            <HTML><HEAD></HEAD><BODY><PRE>
            Did 

             THE    

              THE QUICK BROWN FOX   

               jump 

                over    

                 the    

                  lazy  

                   dogs 

                    or  

                     was    

                      he    

                       pushed   

                        ?   

            </PRE><B>hmm.</B></BODY><HTML>






            ÿß®Þ~c*¯/

This encodes to the Base64 string:

            PEhUTUw+PEhFQUQ+PC9IRUFEPjxCT0RZPjxQUkU+DQpEaWQJDQoNCiBUSEUJDQoNCiAgVEhFIFFVSUNLIEJST1dOIEZPWAkNCg0KICAganVtcAkNCg0KICAgIG92ZXIJDQoNCiAgICAgdGhlCQ0KDQogICAgICBsYXp5CQ0KDQogICAgICAgZG9ncwkNCg0KICAgICAgICBvcgkNCg0KICAgICAgICAgd2FzCQ0KDQogICAgICAgICAgaGUJDQoNCiAgICAgICAgICAgcHVzaGVkCQ0KDQogICAgICAgICAgICA/CQ0KDQo8L1BSRT48Qj5obW0uPC9CPjwvQk9EWT48SFRNTD4NCg0KDQoNCg0KDQoNCg//367efmMqry/==

which contains

            5--/'s
            4--+'s
            3--='s
            14--0's
            3--1's
            3--2's
            2--3's
            4--4's
            3--5's
            2--6's
            2--7's
            4--8's
            6--9's
            5--a's
            27--A's
            2--b's
            5--B's
            5--c's
            4--C's
            4--d's
            14--D's
            2--e's
            10--E's
            2--f's
            8--F's
            36--g's
            6--G's
            5--h's
            2--H's
            5--i's
            30--I's
            5--j's
            6--J's
            8--k's
            12--K's
            2--l's
            3--L's
            2--m's
            4--M's
            3--n's
            14--N's
            13--o's
            2--O's
            3--p's
            9--P's
            2--q's
            24--Q's
            2--r's
            5--R's
            2--s's
            6--S's
            2--t's
            7--T's
            2--u's
            1--U's
            3--v's
            6--V's
            4--w's
            5--W's
            3--x's
            6--X's
            2--y's
            4--Y's
            3--z's
            5--Z's
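
A tally like the one above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `tally` is hypothetical, and a short sample string is used here because the exact whitespace of the full text may not have survived copying.

```python
from collections import Counter


def tally(encoded: str) -> Counter:
    """Count how often each symbol occurs in an encoded string."""
    return Counter(encoded)


sample = "U28/PHA+"  # encoding of "So?<p>"
for symbol, count in sorted(tally(sample).items()):
    # Same "count--symbol's" layout as the table above.
    print(f"{count}--{symbol}'s")
```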