I'm trying to use the script command to record an interactive shell session so that I can use it to prepare documentation.
According to the man page:
Script places everything in the log file, including linefeeds and
backspaces. This is not what the naive user expects.
I am the naive user (don't usually get a shout out in man pages, this is rather exciting!), and I'd like to process the output so that backspaces, linefeeds and deleted characters and so on are removed.
For example, I run a script session:
stew:~> script -f scriptsession.log
Script started, file is scriptsession.log
stew:~> date
Mon Aug 22 15:00:37 EDT 2011
stew:~> #extra chars: that
stew:~> exit
exit
Script done, file is scriptsession.log
Then I use cat to read the session log:
stew:~> cat scriptsession.log
Script started on Mon 22 Aug 2011 03:00:35 PM EDT
stew:~> date
Mon Aug 22 15:00:37 EDT 2011
stew:~> #extra chars: that
stew:~> exit
exit
Script done on Mon 22 Aug 2011 03:01:01 PM EDT
But when I use less, I see evidence of the unwanted characters that are invisible with cat:
stew:~> less scriptsession.log
Script started on Mon 22 Aug 2011 03:00:35 PM EDT
stew:~> date
Mon Aug 22 15:00:37 EDT 2011
stew:~> #extra chars: thiESC[ESC[ESC[ESC[Kthat
stew:~> exit
exit
Script done on Mon 22 Aug 2011 03:01:01 PM EDT
scriptsession.log lines 1-8/8 (END)
I understand that cat doesn't remove the invisible characters; it just doesn't represent them visibly, as less does. So if I pipe the cat output to a file, the file still has the unwanted characters.
The output format I'd like is a copy of what cat displays. Thanks!
(Apologies if this is a duplicate; searching "unix script output format" returns lots of noise with respect to the question at hand!)
The col command will do some, but not all, of the filtering you're looking for. (It doesn't seem to recognize the control sequences for bold and underlining, for example.)
An approach I've used in the past is to (a) change my shell prompt so it doesn't do any highlighting (it normally does), and/or (b) set $TERM to "dumb" so various commands won't try to use certain control sequences.
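As a rough sketch of the two suggestions above: the recording line is shown as a comment because it is interactive, and clean_with_col is a hypothetical helper name, not something col or script provides. It assumes col is installed and relies on its -b option, which keeps only the last character written to each column position.

```shell
# Record with a terminal type that advertises no cursor-control features:
#   TERM=dumb script -f scriptsession.log

# Hypothetical helper: let col apply the backspaces in a recorded log.
# Escape sequences that col does not recognize may still leave residue.
clean_with_col() {
    col -b < "$1"
}
```

Usage would be something like clean_with_col scriptsession.log > scriptsession.clean.log.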
I solved the problem by running scriptreplay in a screen session and then dumping the scrollback buffer to a file.
The following expect script does this for you.
It has been tested on log files with up to 250,000 lines. In the working directory you need your script log, a file called "time" containing the line "1 10" repeated 10,000,000 times, and the expect script itself. It needs the name of your script log as a command-line argument, like ./name_of_script name_of_scriptlog.
#!/usr/bin/expect -f
set logfile [lindex $argv 0]
if {$logfile == ""} {puts "Usage: ./script_to_readable.exp \$logfile."; exit}
set timestamp [clock format [clock sec] -format %Y-%m-%d,%H:%M:%S]
set pwd [exec pwd]
if {! [file exists ${pwd}/time]} {puts "ERROR: time file not found.\nYou need a file named time with 10.000.000 times the line \"1 10\" in the working directory for this script to work. Please provide it."; exit}
set wc [exec cat ${pwd}/$logfile | wc -l]
set height [ expr "$wc" + "100" ]
system cp $logfile ${logfile}.tmp
system echo $timestamp >> ${logfile}.tmp
set timeout -1
spawn screen -h $height -S $timestamp
send "scriptreplay -t time -s ${logfile}.tmp 100000 2>/dev/null\r"
expect ${timestamp}
send "\x01:hardcopy -h readablelog.${timestamp}\r"
send "exit\r"
system sed '/^$/d' readablelog.$timestamp >> readablelog2.$timestamp
system head -n-2 readablelog2.$timestamp >> ${logfile}.readable.$timestamp
system rm -f readablelog.$timestamp readablelog2.$timestamp ${logfile}.tmp
The time file can be generated by
for i in $(seq 1 10000000); do echo "1 10" >> time; done
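The loop above reopens the file for every line, which can be slow for ten million iterations. A possible alternative, assuming any POSIX awk is available, is to have a single process write all the lines; make_time_file is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: write COUNT copies of the line "1 10" to FILE
# from a single awk process.
make_time_file() {
    awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) print "1 10" }' > "$2"
}

# Usage for the expect script above:
#   make_time_file 10000000 time
```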
As mentioned by Keith, col does part of the job (the control characters).
You can further use ansifilter to remove any ANSI escape sequences that you don't want: http://www.andre-simon.de/zip/download.html#ansifilter
Or you can use the "more" command, which will interpret those characters and display exactly what you typed, received as output, etc, as if you scrolled back in your buffer.
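If ansifilter isn't installed, a rough stand-in can be sketched with sed. This is not how ansifilter works internally, and it only handles the common "ESC [ parameters letter" form of ANSI sequences, so treat it as an approximation. strip_ansi is a hypothetical name.

```shell
# Hypothetical helper: delete ANSI CSI sequences of the form
# ESC [ <digits, ';' or '?'> <letter>, reading stdin or the named files.
strip_ansi() {
    esc=$(printf '\033')    # a literal ESC byte for the sed pattern
    sed "s/${esc}\[[0-9;?]*[a-zA-Z]//g" "$@"
}

# Example: remove bold on/off markers from a line.
printf 'plain \033[1mbold\033[0m plain\n' | strip_ansi
```

Backspaces are left untouched by this filter, so it would still need to be combined with col or something similar.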
# awk script
{
    gsub(/\033\[[CK]/, "")
    while (sub(/.\b/, "")) ;
    print
}
The script first removes any interleaved 'ESC [ C' and 'ESC [ K' substrings. It then repeatedly deletes 'c BS' substrings, where c stands for any character and BS is a backspace.
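One way to try the awk script is to wrap it in a shell function and feed it a sample line. The function name clean_log and the sample bytes are assumptions for illustration: the sample guesses that the raw log holds "thi", three backspaces, ESC [ K, and then "that".

```shell
# Hypothetical wrapper around the awk script above.
clean_log() {
    awk '{
        gsub(/\033\[[CK]/, "")      # drop ESC [ C and ESC [ K sequences
        while (sub(/.\b/, "")) ;    # delete each character-plus-backspace pair
        print
    }' "$@"
}

# Assumed raw form of the edited line from the question.
printf '#extra chars: thi\b\b\b\033[Kthat\n' | clean_log
```

On a real log it would be run as clean_log scriptsession.log > scriptsession.clean.log.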