I get an input string with some data that's base64 encoded. Unfortunately, it gets random hexadecimal data (all lowercase) mixed in. It's fairly straightforward to sort out by hand because the hexadecimal data all seems to be in segments of 32 bytes. For example, I can format an example string like this:
6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 VGhpcyBpcyBzb 6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 21lIGJhc2UtNjQ bb4af7e61760735ba17c29e8f542a668 75da91e90863f1ddb7e149297fc59afc f5de951fb65d06d2927aab7b9b54830e 2d935616a54c381c2f38db3731d5a378 gZW5jb2RlZCB 6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 kYXRhIGhvb3JheSE=
Basically, I need to get the base64 stuff out and decode it (in PHP). The catch is that I get it all as one long string and it's not always immediately obvious where to put the linebreaks. For example, the first bit of base64 stuff ends in 'b', easily mistaken for some of the hex data. I'm at something of a loss for how to do this... Any ideas?
Thanks!
-mala
I think this is an unanswerable problem -- it is entirely possible to have 32 bytes worth of base64-encoded data that cannot be differentiated from 32 bytes of random hex. Without more information about the stream, it would be impossible to decide which bucket such data should go in.
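To make the ambiguity concrete, here is a small check using one of the hex segments from the question. It only illustrates the point and is not a solution: lowercase hex uses a subset of the base64 alphabet, and 32 is a multiple of 4, so a hex segment also passes strict base64 decoding.

```php
<?php
// One of the hex segments from the question.
$hex = '6dd11d15c419ac219901f14bdd999f38';

// Strict mode returns false if any character is outside the base64 alphabet.
$decoded = base64_decode($hex, true);

var_dump(preg_match('/^[a-f0-9]{32}$/', $hex) === 1); // bool(true): looks like hex
var_dump($decoded !== false);                         // bool(true): also valid base64
var_dump(strlen($decoded));                           // int(24): 32 base64 chars give 24 bytes
```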
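You could do it like this:
// $chunk holds the next 32 characters of the input
if (preg_match('/^[a-f0-9]{32}$/', $chunk)) {
    echo "this is a hex string";
} else {
    // drop a trailing hex-looking character, which may belong to the next hex segment
    $base64[] = preg_replace('/[a-f0-9]$/', '', $chunk);
}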
Of course, there's the issue of trailing a-f/0-9 characters, but it's a starting point. You could add some code that counts the characters from the end of your suspected base64 to the next [g-zA-Z] and checks whether that count is divisible by 32. If it is, then you have probably found all of your original base64. If not, you can't tell whether 'b' is the end of your base64 or the beginning of your hex, or whether 6 is the end of your hex or the beginning of your NEXT base64.
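A rough sketch of that counting idea, assuming the hex always arrives in whole 32-character segments as the question suggests. The function name classifyHexRuns and the three labels are made up for illustration. It finds every maximal run of hex-alphabet characters and labels it by its length modulo 32; "probably hex" is only a guess, since base64 can look like hex too.

```php
<?php
/**
 * Find maximal runs of hex-alphabet characters and classify each one by its
 * length modulo $segLen. Name and labels are illustrative, not from the thread.
 */
function classifyHexRuns(string $input, int $segLen = 32): array
{
    $runs = [];
    preg_match_all('/[a-f0-9]+/', $input, $m, PREG_OFFSET_CAPTURE);
    foreach ($m[0] as [$text, $offset]) {
        $len = strlen($text);
        if ($len < $segLen) {
            $kind = 'base64';        // too short to hold a whole hex segment
        } elseif ($len % $segLen === 0) {
            $kind = 'probably hex';  // a whole number of segments
        } else {
            $kind = 'ambiguous';     // leftover characters could sit at either end
        }
        $runs[] = ['offset' => $offset, 'length' => $len, 'kind' => $kind];
    }
    return $runs;
}

// Shortened sample built from the question's data.
$seg   = '6dd11d15c419ac219901f14bdd999f38';
$input = $seg . 'VGhpcyBpcyBzb' . $seg . '21lIGJhc2UtNjQ';
$runs  = classifyHexRuns($input);

echo implode(', ', array_column($runs, 'kind')), "\n";
// probably hex, base64, base64, ambiguous, base64
```

The ambiguous run here is 35 characters long: the trailing 'b' of the first base64 piece, one hex segment, and the leading '21' of the next base64 piece.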
In short, this is stupid and it makes me sad.
There is the possibility that base64 decoding up to each decision point (next 32 bytes base64 or hex) might carry the clue.
There's also the most minute chance that interpreting one of those hex strings as base64 always yields easily detected garbage for whatever is being decoded.
Otherwise you're out of luck.
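One way to try the decode-and-look-for-garbage idea is sketched below. It rests on assumptions the thread does not confirm: hex comes in whole 32-character segments, an ambiguous run holds exactly (length % 32) base64 characters split between its two ends, and the real payload is printable ASCII. The function names base64Candidates and plausibleDecodings are hypothetical.

```php
<?php
/**
 * Build every base64 candidate obtained by dropping 32-character hex segments.
 * Assumption: a long hex-alphabet run contains exactly (length % $segLen)
 * base64 characters, some at its head and the rest at its tail.
 */
function base64Candidates(string $input, int $segLen = 32): array
{
    // Alternating pieces: hex-alphabet runs and everything else.
    $pieces = preg_split('/([a-f0-9]+)/', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $candidates = [''];
    foreach ($pieces as $piece) {
        $len = strlen($piece);
        $isHexRun = preg_match('/^[a-f0-9]+$/', $piece) === 1;
        if (!$isHexRun || $len < $segLen) {
            $options = [$piece];              // cannot contain a hex segment
        } else {
            $extra = $len % $segLen;          // leftover base64 characters
            $options = [];
            for ($head = 0; $head <= $extra; $head++) {
                $tail = $extra - $head;
                $options[] = substr($piece, 0, $head)
                    . ($tail > 0 ? substr($piece, -$tail) : '');
            }
        }
        // Extend every candidate so far with every option for this piece.
        $next = [];
        foreach ($candidates as $prefix) {
            foreach ($options as $option) {
                $next[] = $prefix . $option;
            }
        }
        $candidates = $next;
    }
    return $candidates;
}

/** Keep only candidates that decode to printable ASCII (an assumed payload type). */
function plausibleDecodings(string $input): array
{
    $results = [];
    foreach (base64Candidates($input) as $candidate) {
        $decoded = base64_decode($candidate, true);
        if ($decoded !== false && preg_match('/^[\x20-\x7e]+\z/', $decoded) === 1) {
            $results[] = $decoded;
        }
    }
    return array_values(array_unique($results));
}

// Shortened sample built from the question's data.
$seg   = '6dd11d15c419ac219901f14bdd999f38';
$input = $seg . 'VGhpcyBpcyBzb' . $seg . $seg . '21lIGJhc2UtNjQ'
       . $seg . 'gZW5jb2RlZCB' . $seg . 'kYXRhIGhvb3JheSE=';

print_r(plausibleDecodings($input));
```

On this sample there are four candidates and only one decodes to readable text. The number of candidates grows multiplicatively with each ambiguous run, and binary payloads would need a different plausibility check, so this does not remove the underlying ambiguity.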