Hex dump parsing in perl_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2023-01-10 12:37 出处：网络

I have a hex dump of a message in a file which i want to get it in an array so i can perform the decoding logic on it.

I was wondering if that was a easier way to parse a message which looks like this.

37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01开发者_如何学编程 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69

Note that the data can be max 16 bytes on any row. But any row can contain fewer bytes too (minimum :1 )

Is there a nice and elegant way rather than to read 2 chars at a time in perl ?

Perl has a hex operator that performs the decoding logic for you.

hex EXPR

hex

Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.
print hex '0xAf'; # prints '175'
print hex 'aF'; # same

Remember that the default behavior of split chops up a string at whitespace separators, so for example

$ perl -le '$_ = "a b c"; print for split'
a
b
c

For every line of the input, separate it into hex values, convert the values to numbers, and push them onto an array for later processing.

#! /usr/bin/perl

use warnings;
use strict;

my @values;
while (<>) {
  push @values => map hex($_), split;
}

# for example
my $sum = 0;
$sum += $_ for @values;
print $sum, "\n";

Sample run:

$ ./sumhex mtanish-input 
4196

I would read a line at a time, strip the whitespace, and use pack 'H*' to convert it. It's hard to be more specific without knowing what kind of "decoding logic" you're trying to apply. For example, here's a version that converts each byte to decimal:

while (<>) {
  s/\s+//g;
  my @bytes = unpack('C*', pack('H*', $_));
  print "@bytes\n";
}

Output from your sample file:

55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
116 101 108 99 111 114 100 105

I think reading in two characters at a time is the appropriate way to parse a stream whose logical tokens are two-character units.

Is there some reason you think that's ugly?

If you're trying to extract a particular sequence, you could do that with whitespace-insensitive regular expressions.