Does anyone know how to get grep, or a similar tool, to retrieve the offsets of hex strings in a file?
I have a bunch of hexdumps (from GDB) that I need to check for strings and then run again and check if the value has changed.
I have tried hexdump and dd, but because the output is a stream, I lose my offsets into the files.
Someone must have had this problem and a workaround. What can I do?
To clarify:
- I have a series of dumped memory regions from GDB (typically several hundred MB)
- I am trying to narrow down a number by searching for all the places the number is stored, then doing it again and checking if the new value is stored at the same memory location.
- I cannot get grep to do anything, because I am looking for hex values; every time I have tried (a bazillion times, roughly) it has not given me the correct output.
- The hex dumps are just complete binary files; the patterns are float values at the largest, so 8 bytes?
- The patterns are not line-wrapping, as far as I am aware. I know what the value changes to, so I can run the same process again and compare the two lists to see which offsets match.
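For illustration, the comparison described in the list above could be sketched in Python roughly like this. It is only a sketch under assumptions: the value is taken to be a little-endian 8-byte double (format "<d"; a 4-byte float would be "<f"), both dumps are taken to cover the same memory region, and the function names are made up for the example.

```python
import struct


def offsets_of(data, pattern):
    """Return the set of byte offsets at which `pattern` occurs in `data`."""
    found = set()
    pos = data.find(pattern)
    while pos != -1:
        found.add(pos)
        # Step by one byte so overlapping occurrences are not skipped.
        pos = data.find(pattern, pos + 1)
    return found


def changed_locations(before, after, old_value, new_value, fmt="<d"):
    """Offsets holding old_value in `before` and new_value in `after`.

    `fmt` is a struct format string; "<d" (little-endian double) is an
    assumption about how the number is stored.
    """
    old_bytes = struct.pack(fmt, old_value)
    new_bytes = struct.pack(fmt, new_value)
    return sorted(offsets_of(before, old_bytes) & offsets_of(after, new_bytes))


if __name__ == "__main__":
    # Two tiny fake "dumps": the double at offset 4 changes from 1.5 to 2.5,
    # the one at offset 15 stays at 1.5.
    before = b"\x00" * 4 + struct.pack("<d", 1.5) + b"\xff" * 3 + struct.pack("<d", 1.5)
    after = b"\x00" * 4 + struct.pack("<d", 2.5) + b"\xff" * 3 + struct.pack("<d", 1.5)
    print(changed_locations(before, after, 1.5, 2.5))  # prints [4]
```

For real dumps, `before` and `after` would be the contents of the two dump files read in binary mode.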
Perl COULD be an option, but at this point, I would assume my lack of knowledge of bash and its tools is the main culprit.
Desired output format
It's a little hard to explain the output I am getting since I really am not getting any output.
I am anticipating (and expecting) something along the lines of:
<offset>:<searched value>
Which is pretty much the standard output I would normally get with grep -URbFo <searchterm> . > <output>
What I tried:
A. When I try to search for hex values, grep simply does not search for them as hex. If I search for 00 I should get something like a million hits, because that is always the blank space, but instead it searches for 00 as text, which in hex is 3030. Any ideas?
B. I CAN force it through hexdump or something of the like, but because it is a stream it will not give me the offsets and the filename in which it found a match.
C. Using grep's -b option does not seem to work either. I tried all the flags that seemed useful for my situation, and nothing worked.
D. Using xxd -u /usr/bin/xxd as an example, I get output that would be useful, but I cannot use it for searching:
0004760: 73CC 6446 161E 266A 3140 5E79 4D37 FDC6 s.dF..&j1@^yM7..
0004770: BF04 0E34 A44E 5BE7 229F 9EEF 5F4F DFFA ...4.N[."..._O..
0004780: FADE 0C01 0000 000C 0000 0000 0000 0000 ................
Nice output, just what I want to see, but it just doesn't work for me in this situation..
E. Here are some of the things I've tried since posting this:
xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....
root# grep -ibH "df" /usr/bin/xxd
Binary file /usr/bin/xxd matches
xxd -u /usr/bin/xxd | grep -H 'DF'
(standard input):00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....
This seems to work for me:
LANG=C grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>
short form:
LANG=C grep -obUaP "<\x-hex pattern>" <file>
Example:
LANG=C grep -obUaP "\x01\x02" /bin/grep
Output (cygwin binary):
153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>
So you can grep this again to extract offsets. But don't forget to use binary mode again.
Note: LANG=C is needed to avoid UTF-8 encoding issues.
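If grep -P is not available on your platform, the same offset listing can be sketched in a few lines of Python. This is only a minimal sketch, not what grep does internally: find_offsets is a name made up for the example, and unlike grep -o it also reports overlapping matches.

```python
import mmap
import os
import tempfile


def find_offsets(path, pattern):
    """Yield every byte offset at which `pattern` (bytes) occurs in the file."""
    with open(path, "rb") as f:
        # mmap avoids loading a multi-hundred-MB dump into memory in one go.
        # Note: mmap raises ValueError for an empty file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos != -1:
                yield pos
                # Advance one byte so overlapping matches are reported too.
                pos = mm.find(pattern, pos + 1)


if __name__ == "__main__":
    pattern = b"\x01\x02"
    # Small throwaway file standing in for a real dump.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00\x01\x02\xff\x01\x02\x00")
    try:
        for offset in find_offsets(tmp.name, pattern):
            print(f"{offset}:{pattern.hex()}")
        # prints:
        # 1:0102
        # 4:0102
    finally:
        os.remove(tmp.name)
```

The output follows the <offset>:<searched value> shape asked for in the question, with the matched bytes shown as hex so they stay printable.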
There's also a pretty handy tool called binwalk, written in Python, which provides binary pattern matching (and quite a lot more besides). Here's how you would search for a binary string; the output gives the offset in decimal and hex (from the docs):
$ binwalk -R "\x00\x01\x02\x03\x04" firmware.bin
DECIMAL HEX DESCRIPTION
--------------------------------------------------------------------------
377654 0x5C336 Raw string signature
We tried several things before arriving at an acceptable solution:
xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....
root# grep -ibH "df" /usr/bin/xxd
Binary file /usr/bin/xxd matches
xxd -u /usr/bin/xxd | grep -H 'DF'
(standard input):00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....
Then we found we could get usable results with
xxd -u /usr/bin/xxd > /tmp/xxd.hex ; grep -H 'DF' /tmp/xxd.hex
Note that a simple search target like 'DF' will also match hex digits that span a byte boundary, e.g.
xxd -u /usr/bin/xxd | grep 'DF'
00017b0: 4010 8D05 0DFF FF0A 0300 53E3 0610 A003 @.........S.....
--------------------^^
So we use an ORed regexp to search for ' DF' OR 'DF ' (the searchTarget preceded or followed by a space char).
The final result seems to be
xxd -u -ps -c 10000000000 DumpFile > DumpFile.hex
egrep ' DF|DF ' DumpFile.hex
0001020: 0089 0424 8D95 D8F5 FFFF 89F0 E8DF F6FF ...$............
-----------------------------------------^^
0001220: 0C24 E871 0B00 0083 F8FF 89C3 0F84 DF03 .$.q............
--------------------------------------------^^
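The byte-boundary problem can also be handled by checking alignment instead of relying on the spaces in the xxd output. Here is a rough Python sketch of that idea; hex_matches is a made-up helper, and it works on the raw bytes rather than on an xxd text dump. A hit in the hex text only counts if it starts at an even position, i.e. on a whole byte.

```python
def hex_matches(data, hex_target):
    """Search the hex text of `data` for `hex_target`.

    Returns (aligned, spanning): byte offsets of real matches, and byte
    offsets where a false match starts in the middle of a byte.
    """
    text = data.hex().upper()
    target = hex_target.upper()
    aligned, spanning = [], []
    pos = text.find(target)
    while pos != -1:
        if pos % 2 == 0:
            aligned.append(pos // 2)   # starts on a byte boundary
        else:
            spanning.append(pos // 2)  # starts on the low nibble of a byte
        pos = text.find(target, pos + 1)
    return aligned, spanning


if __name__ == "__main__":
    # 0D FF contains "DF" across a byte boundary; E8 DF contains a real DF byte.
    sample = bytes.fromhex("40108D050DFFFF0AE8DFF6FF")
    aligned, spanning = hex_matches(sample, "DF")
    print("real matches at byte offsets:", aligned)    # [9]
    print("false matches inside bytes:", spanning)     # [4]
```

For multi-hundred-MB dumps the hex text is twice the size of the file, so searching the raw bytes directly is cheaper; the sketch is only meant to show why the plain 'DF' search over-matches.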
grep has a -P switch that enables Perl regexp syntax, and Perl regexes can match individual bytes using the \x.. syntax.
So you can look for a given hex string in a file with: grep -aP "\xdf"
But the output won't be very useful; it is better to run a regexp over the hexdump output.
grep -P can still be useful, however, just to find files matching a given binary pattern, or to run a binary query for a pattern that actually occurs in text (see for example How to regexp CJK ideographs (in utf-8)).
I just used this:
grep -c $'\x0c' filename
to search for and count the form feed (page break) control character in the file.
So to include an offset in the output:
grep -b -o $'\x0c' filename | less
I am just piping the result to less because the character I am grepping for does not print well, and less displays the results cleanly. Output example:
21:^L
23:^L
2005:^L
If you want to search for printable strings, you can use:
strings -ao filename | grep string
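strings will output all printable strings from a binary along with their offsets, and grep will search within that output. With GNU strings, -o prints the offsets in octal; use -t d instead for decimal or -t x for hex.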
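If strings is not at hand, roughly the same listing can be sketched in Python. This is an approximation, not a reimplementation of strings: printable_strings is a made-up name, "printable" here means ASCII 0x20 to 0x7e only, and the minimum length of 4 is an assumption mirroring the usual default.

```python
import re


def printable_strings(data, min_len=4):
    """Return (offset, text) for each run of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]


if __name__ == "__main__":
    blob = b"\x00\x01hello\x00ab\x00world!\xff"
    # Decimal offsets, then a grep-like filter on the extracted text.
    for offset, text in printable_strings(blob):
        if "wor" in text:
            print(f"{offset} {text}")  # prints: 11 world!
```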
If you want to search for any binary string, here is your friend:
- https://github.com/tmbinc/bgrep