ISO/IEC 2022 defines the C0 and C1 control codes. The C0 set are the familiar codes between 0x00
and 0x1f
in ASCII, ISO-8859-1 and UTF-8 (eg. ESC, CR, LF).
Some VT100 terminal emulat开发者_运维技巧ors (eg. screen(1)
, PuTTY) support the C1 set, too. These are the values between 0x80
and 0x9f
(so, for example, 0x84
moves the cursor down a line).
I am displaying user-supplied input. I do not wish the user input to be able to alter the terminal state (eg. move the cursor). I am currently filtering out the character codes in the C0 set; however I would like to conditionally filter out the C1 set too, if terminal will interpret them as control codes.
Is there a way of getting this information from a database like termcap
?
The only way to do it that I can think of is using C1 requests and testing the return value:
$ echo `echo -en "\x9bc"`
^[[?1;2c
$ echo `echo -e "\x9b5n"`
^[[0n
$ echo `echo -e "\x9b6n"`
^[[39;1R
$ echo `echo -e "\x9b0x" `
^[[2;1;1;112;112;1;0x
The above ones are:
CSI c Primary DA; request Device Attributes
CSI 5 n DSR; Device Status Report
CSI 6 n CPR; Cursor Position Report
CSI 0 x DECREQTPARM; Request Terminal Parameters
The terminfo/termcap that ESR maintains (link) has a couple of these requests in user strings 7 and 9 (user7/u7, user9/u9):
# INTERPRETATION OF USER CAPABILITIES # # The System V Release 4 and XPG4 terminfo format defines ten string # capabilities for use by applications, .... In this file, we use # certain of these capabilities to describe functions which are not covered # by terminfo. The mapping is as follows: # # u9 terminal enquire string (equiv. to ANSI/ECMA-48 DA) # u8 terminal answerback description # u7 cursor position request (equiv. to VT100/ANSI/ECMA-48 DSR 6) # u6 cursor position report (equiv. to ANSI/ECMA-48 CPR) # # The terminal enquire string should elicit an answerback response # from the terminal. Common values for will be ^E (on older ASCII # terminals) or \E[c (on newer VT100/ANSI/ECMA-48-compatible terminals). # # The cursor position request () string should elicit a cursor position # report. A typical value (for VT100 terminals) is \E[6n. # # The terminal answerback description (u8) must consist of an expected # answerback string. The string may contain the following scanf(3)-like # escapes: # # %c Accept any character # %[...] Accept any number of characters in the given set # # The cursor position report () string must contain two scanf(3)-style # %d format elements. The first of these must correspond to the Y coordinate # and the second to the %d. If the string contains the sequence %i, it is # taken as an instruction to decrement each value after reading it (this is # the inverse sense from the cup string). The typical CPR value is # \E[%i%d;%dR (on VT100/ANSI/ECMA-48-compatible terminals). # # These capabilities are used by tac(1m), the terminfo action checker # (distributed with ncurses 5.0).
Example:
$ echo `tput u7`
^[[39;1R
$ echo `tput u9`
^[[?1;2c
Of course, if you only want to prevent display corruption, you can use less
approach, and let the user switch between displaying/not displaying control characters (-r and -R options in less
). Also, if you know your output charset, ISO-8859 charsets have the C1 range reserved for control codes (so they have no printable chars in that range).
Actually, PuTTY does not appear to support C1 controls.
The usual way of testing this feature is with vttest, which provides menu entries for changing the input- and output- separately to use 8-bit controls. PuTTY fails the sanity-check for each of those menu entries, and if the check is disabled, the result confirms that PuTTY does not honor those controls.
I don't think there's a straightforward way to query whether the terminal supports them. You can try nasty hacky workarounds (like print them and then query the cursor position) but I really don't recommend anything along these lines.
I think you could just filter out these C1 codes unconditionally. Unicode declares the U+0080.. U+009F range as control characters anyway, I don't think you should ever use them for anything different.
(Note: you used the example 0x84
for cursor down. It's in fact U+0084
encoded in whichever encoding the terminal uses, e.g. 0xC2 0x84
for UTF-8.)
Doing it 100% automatically is challenging at best. Many, if not most, Unix interfaces are smart (xterms and whatnot), but you don't actually know if connected to an ASR33 or a PC running MSDOS.
You could try some of the terminal interrogation escape sequences and timeout if there is no reply. But then you might have to fall back and maybe ask the user what kind of terminal they are using.
精彩评论