
MD5 hash of an ALAssetRepresentation image does not match a duplicated ALAssetRepresentation image's hash

Developer https://www.devze.com 2023-04-06 04:30 Source: Internet

I am using the following method to create an NSData object from an ALAssetRepresentation, both to export an image file and to compute an MD5 hash:

    - (NSUInteger)getBytes:(uint8_t *)buffer fromOffset:(long long)offset length:(NSUInteger)length error:(NSError **)error;

When I re-add the exported file and perform the same operation, the file's MD5 hash is different.

When I create the NSData objects using UIImagePNGRepresentation() and perform the above operations, the MD5 hashes match.

I am trying to avoid using UIImagePNGRepresentation() since it is considerably more expensive for what I am doing than the getBytes method.

Any ideas would be appreciated!


The difference is that UIImagePNGRepresentation() returns only the image data and ignores the file headers.

The problem is probably that you are reading from offset 0. That includes the file headers, which can differ between two copies of the same image (for example, a different creation date) and therefore change the hash.

Instead, here's an example of reading 1K from the middle of the file. For an image, this will only read about 340 pixels so you might want to increase the comparison size to about 20K or more if you're comparing images for duplicates for example.

The code would be as such:

    #import <CommonCrypto/CommonCrypto.h>
    #define HASH_DATA_SIZE  1024  // Read 1K of data to be hashed

    ...

    ALAssetRepresentation *rep = [anAsset defaultRepresentation];
    Byte *buffer = (Byte *) malloc(HASH_DATA_SIZE); // only HASH_DATA_SIZE bytes are read, so there is no need to allocate the whole file
    long long offset = rep.size / 2; // begin from the middle of the file
    NSUInteger buffered = [rep getBytes:buffer fromOffset:offset length:HASH_DATA_SIZE error:nil];

    if (buffered > 0)
    {
        NSData *data = [NSData dataWithBytesNoCopy:buffer length:buffered freeWhenDone:YES];

        unsigned char result[CC_MD5_DIGEST_LENGTH];

        CC_MD5([data bytes], [data length], result);
        NSString *hash = [NSString stringWithFormat:
                        @"%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X",
                        result[0], result[1], result[2], result[3],
                        result[4], result[5], result[6], result[7],
                        result[8], result[9], result[10], result[11],
                        result[12], result[13], result[14], result[15]
                        ];

        NSLog(@"Hash for image is %@", hash);
    }
    else
    {
        free(buffer); // nothing was read, so NSData never took ownership of the buffer
    }
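
If you raise the sample size to 20K or more as suggested above, a read that starts at the exact midpoint can run past the end of small files. The following is only a sketch of one way to pick the read window, written in plain C so it can sit in an Objective-C source file. The names `SampleWindow` and `sample_window` are hypothetical and not part of any Apple API. Note that it centres the window on the middle of the file, which is slightly different from starting at the midpoint as the code above does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: where to start reading and how many bytes to read. */
typedef struct {
    long long offset;
    size_t length;
} SampleWindow;

/* Centre a window of `wanted` bytes inside a file of `fileSize` bytes.
   If the file is not larger than `wanted`, the window covers the whole file. */
static SampleWindow sample_window(long long fileSize, size_t wanted)
{
    assert(fileSize >= 0);

    SampleWindow w = { 0, 0 };
    if ((unsigned long long)fileSize <= wanted) {
        w.length = (size_t)fileSize;   /* small file: read all of it from offset 0 */
        return w;
    }
    w.offset = (fileSize - (long long)wanted) / 2;  /* equal margin on both sides */
    w.length = wanted;
    return w;
}
```

With this, the buffer would be allocated with `w.length` bytes, and `w.offset` and `w.length` would be passed as the `fromOffset:` and `length:` arguments of the getBytes call.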

I tried this on about 4000 photos. Hashing the full image via UIImagePNGRepresentation() took 0.008 seconds on average, whereas hashing just 1K read from the middle of each file took about 0.00008 seconds.
