
How do I unescape C-style escape sequences in Ruby?

Developer https://www.devze.com 2023-01-26 22:16 Source: Internet

In Ruby, how do I decode C-style escape sequences, e.g. '\n' to a newline and '\t' to a tab?


Okay, if you don't like the eval solution, I've hacked together a simple state machine in Ruby that parses "\n" and "\t" in strings correctly, including an escaped backslash itself. Here it is:

BACKSLASH = "\\"

def unescape_c_string(s)
    state = 0
    res = ''
    s.each_char { |c|
        case state
        when 0
            case c
            when BACKSLASH then state = 1
            else res << c
            end
        when 1
            case c
            when 'n' then res << "\n"; state = 0
            when 't' then res << "\t"; state = 0
            when BACKSLASH then res << BACKSLASH; state = 0
            else res << BACKSLASH; res << c; state = 0
            end
        end
    }
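    # A lone backslash at the very end has nothing to escape; keep it literally.
    res << BACKSLASH if state == 1
    return res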
end

This can easily be extended to support more escapes, including multi-character ones such as \123. A unit test to show that it works:

require 'test/unit'

class TestEscapeCString < Test::Unit::TestCase
    def test_1
        assert_equal("abc\nasd", unescape_c_string('abc\nasd'))
    end
    def test_2
        assert_equal("abc\tasd", unescape_c_string('abc\tasd'))
    end
    def test_3
        assert_equal("abc\\asd", unescape_c_string('abc' + BACKSLASH * 2 + 'asd'))
    end
    def test_4
        assert_equal("abc\\nasd", unescape_c_string('abc' + BACKSLASH * 2 + 'nasd'))
    end
    def test_5
        assert_equal("abc\\\nasd", unescape_c_string('abc' + BACKSLASH * 3 + 'nasd'))
    end
    def test_6
        assert_equal("abc\\\\nasd", unescape_c_string('abc' + BACKSLASH * 4 + 'nasd'))
    end
end
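
The extension to multi-character escapes such as \123 mentioned above could look roughly like the following. This is only a sketch, not part of the original answer: the names BACKSLASH_CHAR and unescape_c_string_octal are made up for illustration, and the octal value is treated as a Unicode code point (an assumption; C treats it as a byte value).

```ruby
BACKSLASH_CHAR = "\\" # hypothetical name, chosen to avoid clashing with BACKSLASH

# Same state machine idea, plus an :octal state that collects up to three
# octal digits after a backslash (e.g. \123).
def unescape_c_string_octal(s)
  state = :text
  digits = +''
  res = +''
  s.each_char do |c|
    if state == :octal
      if c =~ /[0-7]/ && digits.length < 3
        digits << c
        next
      end
      # Octal run is over: emit the character, then handle c as normal text.
      # Assumption: the value is interpreted as a Unicode code point.
      res << digits.to_i(8).chr(Encoding::UTF_8)
      digits = +''
      state = :text
    end
    case state
    when :text
      if c == BACKSLASH_CHAR then state = :escape else res << c end
    when :escape
      case c
      when 'n' then res << "\n"; state = :text
      when 't' then res << "\t"; state = :text
      when BACKSLASH_CHAR then res << BACKSLASH_CHAR; state = :text
      when /[0-7]/ then digits << c; state = :octal
      else res << BACKSLASH_CHAR << c; state = :text
      end
    end
  end
  # Flush whatever is still pending at the end of the input.
  res << digits.to_i(8).chr(Encoding::UTF_8) if state == :octal
  res << BACKSLASH_CHAR if state == :escape
  res
end
```

With this sketch, unescape_c_string_octal('A\101\n') should give "AA\n", and a fourth digit is left alone, so '\1234' becomes "S4".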


Shorter, even hackier, and fairly dangerous because of eval:

eval "\"#{string}\""
A simple example:
> a = '1\t2\n3'
> puts a
1\t2\n3
> puts eval "\"#{a}\""
1       2
3
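
To make the "dangerous" part concrete, here is a small sketch (the variable names are made up for illustration) of what happens when the input contains Ruby interpolation syntax instead of a plain C escape. The arithmetic is harmless, but any Ruby expression could stand in its place.

```ruby
# The "input" contains Ruby interpolation syntax rather than a C escape.
# Single quotes mean nothing is evaluated at this point.
untrusted = '#{21 * 2}'

# Wrapping it in double quotes and eval-ing it runs the embedded expression.
result = eval "\"#{untrusted}\""
puts result  # prints 42, not the literal text
```

So this approach is probably only acceptable for strings you fully control.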


The following code will correctly decode all escape sequences defined by the ISO C standard. It is safe and reasonably performant:

ISO_C_ESCAPE_SEQUENCES = %r{
    # One letter escapes
    (?:\\[abfnrtv\\'"?])

    # Hex encoded character
    | (?:\\(x)([A-Fa-f0-9]{2,}))

    # Unicode code point: either 8 hex digits (any code point)
    # or 4 hex digits (code points up to U+FFFF)
    | (?:\\(u)((?:[A-Fa-f0-9]{8})|(?:[A-Fa-f0-9]{4})))

    # Octal encoded character
    | (?:\\([0-7]{1,3}))
}x


ISO_C_ONE_LETTER_ESCAPES = {
    "\\a"  => "\a",
    "\\b"  => "\b",
    "\\f"  => "\f",
    "\\n"  => "\n",
    "\\r"  => "\r",
    "\\t"  => "\t",
    "\\v"  => "\v",
    "\\\\" => "\\",
    "\\'"  => "'",
    "\\\"" => "\"",
    "\\?"  => "?"
}


def decodeCString( cString )
    return cString.gsub(ISO_C_ESCAPE_SEQUENCES) { |match|
        replacement = ISO_C_ONE_LETTER_ESCAPES[match]
        next replacement if replacement
        next $2.to_i(16).chr if $1 == "x"
        next $4.to_i(16).chr(Encoding::UTF_8) if $3 == "u"
        next $5.to_i(8).chr
    }
end

Here's a sample:

puts decodeCString("Line \\\\n Same Line!\\nNew line\\x0ANew line")
puts decodeCString("Smiley: \\u263A\tHorse head: \\u00010083")
puts decodeCString("Equal sign in quotes: \\\"\\75\\\"")

prints

Line \n Same Line!
New line
New line
Smiley: ☺   Horse head: 𐂃
Equal sign in quotes: "="
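
One caveat that may be worth checking: the hex branch accepts two or more digits, but Integer#chr without an encoding argument only handles values up to 255 (assuming Encoding.default_internal is not set). The sketch below, with made-up variable names, illustrates the difference; it is not part of the original answer.

```ruby
# Two hex digits fit in one byte, so plain chr works.
small = "0A".to_i(16).chr

# Four hex digits do not fit; plain chr is expected to raise RangeError
# (assumption: Encoding.default_internal is nil, which is the usual default).
begin
  "263A".to_i(16).chr
  wide_ok = true
rescue RangeError
  wide_ok = false
end

# Passing an explicit encoding, as the \u branch does, handles the larger value.
wide = "263A".to_i(16).chr(Encoding::UTF_8)
```

If longer \x sequences matter for your input, passing an encoding in the hex branch as well might be an option.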
