
Using awk printf to urldecode text

Developer (https://www.devze.com) 2023-01-16 05:19 Source: web

I'm using awk to urldecode some text.

If I code the string into the printf statement like printf "%s", "\x3D" it correctly outputs =. The same if I have the whole escaped string as a variable.

However, if I only have the 3D, how can I prepend the \x so that printf prints = and not \x3D?

I'm using busybox awk 1.4.2 and the ash shell.
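One way to sidestep the escape problem entirely is to compute the character code yourself and hand a number to printf "%c". The sketch below assumes only POSIX awk features (toupper, index, substr, printf), so it should also run in busybox awk; the names hexchar and hexval are made up for this example, and the input is assumed to be valid hex.

```shell
#!/bin/sh
# Hypothetical helper: print the character whose two-digit hex code is $1,
# e.g. "hexchar 3D" prints "=". No "\x" escape or "0x" parsing is involved.
hexchar() {
    awk -v h="$1" '
        # Convert hex digits to a number, one digit at a time
        # (assumes valid hex input; there is no error checking).
        function hexval(s,    i, n) {
            s = toupper(s)
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
            return n
        }
        BEGIN { printf "%c", hexval(h) }'
}

hexchar 3D; echo    # prints =
```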


I don't know how you'd do this in awk, but it's trivial in Perl:

echo "http://example.com/?q=foo%3Dbar" | 
    perl -pe 's/\+/ /g; s/%([0-9a-f]{2})/chr(hex($1))/eig'
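For systems without Perl, here is a rough awk counterpart of that one-liner, sketched under the same rules as the Perl version: every + becomes a space first, then each %XX (either case) is replaced by the character it encodes. It converts the two hex digits with index() rather than relying on the awk to parse "0x.." strings; the function names urldecode and hexval are hypothetical.

```shell
#!/bin/sh
# Hypothetical filter mirroring the Perl one-liner: reads lines on stdin,
# turns "+" into " ", then decodes %XX sequences left to right.
urldecode() {
    awk '
        function hexval(s,    i, n) {
            s = toupper(s)
            n = 0
            for (i = 1; i <= length(s); i++)
                n = n * 16 + index("0123456789ABCDEF", substr(s, i, 1)) - 1
            return n
        }
        {
            gsub(/[+]/, " ")                  # like s/\+/ /g
            out = ""
            rest = $0
            # like s/%([0-9a-f]{2})/chr(hex($1))/eig
            while (match(rest, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
                out = out substr(rest, 1, RSTART - 1)
                out = out sprintf("%c", hexval(substr(rest, RSTART + 1, 2)))
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }'
}

echo "http://example.com/?q=foo%3Dbar+baz" | urldecode
```

Because decoded text is never rescanned, %253D correctly comes out as %3D.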


Since you're using ash and Perl isn't available, I'm assuming that you may not have gawk.

For me, using gawk or busybox awk, your second example works the same as the first (I get "=" from both) unless I use the --posix option (in which case I get "x3D" for both).

If I use --non-decimal-data or --traditional with gawk I get "=".

What version of AWK are you using (awk, nawk, gawk, busybox - and version number)?

Edit:

You can coerce the variable's string value into a numeric one by adding zero (here busybox awk reads the "0x" prefix as hexadecimal):

~/busybox/awk 'BEGIN { string="3D"; pre="0x"; hex=pre string; printf "%c", hex+0}'
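Whether adding zero to a "0x.." string yields a hexadecimal value depends on the awk implementation: gawk, for example, returns 0 unless it is started with --non-decimal-data (-n). A quick probe, assuming only that some awk is on the PATH:

```shell
#!/bin/sh
# Prints 61 if this awk reads "0x3D" as hexadecimal when converting the
# string to a number, or 0 if it stops at the "x" (gawk's default).
awk 'BEGIN { s = "0x3D"; print s + 0 }'
```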


GNU awk

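#!/usr/bin/gawk -nf
# -n (--non-decimal-data) lets chr("0x3D") treat 0x3D as hexadecimal.
@include "ord"
BEGIN {
  # Each %XX sequence ends a record; gawk stores the matched text in RT.
  RS = "%[[:xdigit:]][[:xdigit:]]"
}
{
  # Fixed format, so a decoded "%" is not taken as a printf directive.
  printf "%s", (RT ? $0 chr("0x" substr(RT, 2)) : $0)
}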

Or

#!/bin/sh
gawk -n -i ord '{ printf "%s", (RT ? $0 chr("0x" substr(RT, 2)) : $0) }' RS='%[[:xdigit:]][[:xdigit:]]'
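To see what the RS/RT trick is doing: with a regular-expression RS, gawk puts the text before each match in $0 and the matched separator in RT, and the script decodes RT and prints it after $0. A small gawk-only demonstration using a hex-digit-only separator pattern (the guard simply skips it where gawk isn't installed):

```shell
#!/bin/sh
# Show each record and the %XX separator (RT) that ended it. gawk-specific.
if command -v gawk >/dev/null 2>&1; then
    printf 'a%%3Db%%41' |
        gawk 'BEGIN { RS = "%[[:xdigit:]][[:xdigit:]]" }
              { printf "[%s|%s]\n", $0, RT }'
fi
# [a|%3D]
# [b|%41]
```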

See also: Decoding URL encoding (percent encoding).


This relies on GNU awk extensions (the fourth argument of split(), which collects the separators, and strtonum()), but it works:

gawk '{ numElems = split($0, arr, /%../, seps);
        outStr = ""
        for (i = 1; i <= numElems - 1; i++) {
            outStr = outStr arr[i]
            outStr = outStr sprintf("%c", strtonum("0x" substr(seps[i],2)))
        }
        outStr = outStr arr[i]
        print outStr
      }'
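Since the four-argument split() is gawk-only, here is a sketch of a stand-in for busybox awk or mawk, assuming only POSIX match() and substr(): the hypothetical split_seps() fills arr and seps the way gawk's split does for this pattern, the decoding loop from the answer is kept as-is, and strtonum is replaced by a hypothetical hex-digit lookup, hexval().

```shell
#!/bin/sh
# POSIX-awk version of the split/seps answer above.
urldecode_split() {
    awk '
        # Hypothetical: split s around %XX matches, pieces in arr[1..n],
        # matched separators in seps[1..n-1]; returns n.
        function split_seps(s, arr, seps,    n) {
            n = 0
            while (match(s, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
                arr[++n] = substr(s, 1, RSTART - 1)
                seps[n] = substr(s, RSTART, RLENGTH)
                s = substr(s, RSTART + RLENGTH)
            }
            arr[++n] = s
            return n
        }
        # Hypothetical stand-in for strtonum("0x" h).
        function hexval(h,    i, v) {
            h = toupper(h)
            v = 0
            for (i = 1; i <= length(h); i++)
                v = v * 16 + index("0123456789ABCDEF", substr(h, i, 1)) - 1
            return v
        }
        {
            split("", arr); split("", seps)   # portable way to clear arrays
            numElems = split_seps($0, arr, seps)
            outStr = ""
            for (i = 1; i <= numElems - 1; i++) {
                outStr = outStr arr[i]
                outStr = outStr sprintf("%c", hexval(substr(seps[i], 2)))
            }
            outStr = outStr arr[i]
            print outStr
        }'
}

echo 'Hello%2C%20world' | urldecode_split
```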


I'm aware this is an old question, but none of the answers worked for me, since I'm restricted to busybox awk.

Two options. To parse stdin:

awk '{for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y));gsub(/%25/, "%");print}'

To take a command line parameter:

awk 'BEGIN {for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y), ARGV[1]);gsub(/%25/, "%", ARGV[1]);print ARGV[1]}' parameter

%25 has to be decoded last; otherwise a string like %253D would be decoded twice (to = instead of %3D), which shouldn't happen.

The inline y==38 check is there because gsub treats & in the replacement as the matched text unless you backslash it.
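A quick way to check both points, assuming the stdin version above is wrapped in a shell function (the name bb_urldecode is only for this example): %253D should come out as %3D rather than =, and %26 should survive as a literal &.

```shell
#!/bin/sh
# The stdin version from the answer above, wrapped in a function.
bb_urldecode() {
    awk '{for (y=0;y<127;y++) if (y!=37) gsub(sprintf("%%%02x|%%%02X",y,y), y==38 ? "\\&" : sprintf("%c", y));gsub(/%25/, "%");print}'
}

printf '%s\n' '%253D' 'a%20b%26c' | bb_urldecode
# %3D
# a b&c
```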


This one is the fastest of them all by a large margin and it doesn't need gawk:

#!/usr/bin/mawk -f

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
    while (match(tmp, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        mid = substr(tmp, RSTART + 1, RLENGTH - 1)
        rep = sprintf("%c", ("0x" mid) + 0)
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}

{
    print decode_url($0)
}

Save it as decode_url.awk, make it executable (chmod +x decode_url.awk), and use it like any other filter, e.g.:

$ printf '%s\n' 'Hello%2C%20world%20%21' | ./decode_url.awk
Hello, world !

But if you want an even faster version:

#!/usr/bin/mawk -f

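function gen_url_decode_array(      i) {
    delete decodeArray
    # One upper-case "%XX" key per byte value 1..255 (gawk in a UTF-8
    # locale turns values >= 128 into multibyte characters; use LC_ALL=C).
    for (i = 1; i < 256; ++i)
        decodeArray[sprintf("%%%02X", i)] = sprintf("%c", i)
}

function decode_url(url,            dec, tmp, pre, mid, rep) {
    tmp = url
    while (match(tmp, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
        pre = substr(tmp, 1, RSTART - 1)
        # Upper-case the sequence so %3d and %3D share one table entry.
        mid = toupper(substr(tmp, RSTART, RLENGTH))
        # %00 has no entry; keep it as-is instead of silently dropping it.
        rep = (mid in decodeArray) ? decodeArray[mid] : mid
        dec = dec pre rep
        tmp = substr(tmp, RSTART + RLENGTH)
    }
    return dec tmp
}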

BEGIN {
    gen_url_decode_array()
}

{
    print decode_url($0)
}

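Both scripts should also work in awks other than mawk, with one caveat for the first one: it relies on ("0x" mid) + 0 being read as hexadecimal, which gawk does only when run with --non-decimal-data (-n).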
