Methods for identifying encoding type using PHP

Developer https://www.devze.com 2023-03-25 09:26 Source: Internet
I have a PHP string type variable which may come encoded in Hexadecimal pattern or in Base64.

For example:

737461636b6f766572666c6f772e636f6d
c3RhY2tvdmVyZmxvdy5jb20=

Both lines decode to stackoverflow.com. The problem is that I do not know in advance whether the string will be hex or Base64, so I do not know which decoding method to apply.

Is it possible to determine the encoding method without knowing the encoded text? If so, how can it be done in PHP?


There is no way to know for sure whether the string is Base64 or hex just by looking at it. You will have to include an additional marker (a flag or prefix) with the string indicating which one it is, then read that marker in your code and decode accordingly.

If the string happens to contain any character outside the hex set (a letter after 'F', for example), you can be sure it is not hex, which with only these two options means it is Base64. A string made up only of hex characters may still be Base64, though, so there is no way to be sure without some kind of header before the string telling you what the encoding is.
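
One way to carry such a header is a short text prefix in front of the payload. The sketch below is only an illustration: the "hex:" and "b64:" prefixes and the function names encodeTagged() and decodeTagged() are assumptions made up for this example, not part of any standard or of the original question.

```php
<?php
// Hypothetical convention: prefix the payload with "hex:" or "b64:".

function encodeTagged(string $raw, string $method): string
{
    switch ($method) {
        case 'hex':
            return 'hex:' . bin2hex($raw);
        case 'b64':
            return 'b64:' . base64_encode($raw);
        default:
            throw new InvalidArgumentException("Unknown method: $method");
    }
}

function decodeTagged(string $tagged): string
{
    // Split on the first colon only; the payload itself never contains one.
    $parts = explode(':', $tagged, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException('Missing encoding prefix');
    }
    [$method, $payload] = $parts;

    if ($method === 'hex') {
        // Hex needs an even number of hex digits (two per byte).
        if (preg_match('/\A(?:[0-9a-fA-F]{2})*\z/', $payload) !== 1) {
            throw new InvalidArgumentException('Payload is not valid hex');
        }
        return hex2bin($payload);
    }

    if ($method === 'b64') {
        // Strict mode rejects characters outside the Base64 alphabet.
        $decoded = base64_decode($payload, true);
        if ($decoded === false) {
            throw new InvalidArgumentException('Payload is not valid Base64');
        }
        return $decoded;
    }

    throw new InvalidArgumentException("Unknown prefix: $method");
}
```

With this convention, decodeTagged('hex:737461636b6f766572666c6f772e636f6d') and decodeTagged('b64:c3RhY2tvdmVyZmxvdy5jb20=') both return stackoverflow.com, and the receiver never has to guess.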


If you can guarantee only those two encodings, the Base64 will often end with one or two = padding characters (whenever the original length is not a multiple of 3), and the hex will only include [a-fA-F0-9].


This should not be too difficult. The valid set of characters for hex is [0-9a-fA-F], while the valid set for Base64 is [a-zA-Z0-9+/], possibly followed by one or two trailing = characters for padding. You should be able to use a regex to discriminate between one and the other.

Of course, there may be some instances where a string appears to be valid in both encodings, so there is no sure-fire way to test based just upon the string itself. Generally speaking, however, it would be fairly rare for a non-trivial input string encoded in Base64 to result in an output string that includes only valid hexadecimal characters and no padding characters. Fairly rare, but not impossible.
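
A regex check along those lines might look like the sketch below. It is a heuristic only, and it assumes standard, padded Base64 and an even number of hex digits; guessEncoding() is a hypothetical helper name, not a built-in PHP function. When a string is valid under both patterns it reports 'ambiguous' rather than picking one.

```php
<?php
// Hypothetical helper: returns 'hex', 'base64', 'ambiguous' or 'unknown'.
function guessEncoding(string $s): string
{
    // Hex: one or more pairs of hex digits (two digits per byte).
    $looksHex = preg_match('/\A(?:[0-9a-fA-F]{2})+\z/', $s) === 1;

    // Padded Base64: groups of four alphabet characters, where the
    // final group may end in "==" or "=".
    $b64Pattern = '~\A(?:[A-Za-z0-9+/]{4})*'
        . '(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?\z~';
    $looksB64 = $s !== '' && preg_match($b64Pattern, $s) === 1;

    if ($looksHex && $looksB64) {
        return 'ambiguous'; // valid in both encodings, cannot decide
    }
    if ($looksHex) {
        return 'hex';
    }
    if ($looksB64) {
        return 'base64';
    }
    return 'unknown';
}

echo guessEncoding('737461636b6f766572666c6f772e636f6d'), "\n"; // hex
echo guessEncoding('c3RhY2tvdmVyZmxvdy5jb20='), "\n";           // base64
echo guessEncoding('deadbeef'), "\n";                           // ambiguous
```

The hex sample from the question is 34 characters long, which is not a multiple of 4, so it cannot be padded Base64. A string such as deadbeef passes both tests, which is exactly the case where only an external marker can settle the question.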
