This is a strange problem. I started working on it in this thread, and it has since morphed into something slightly different.
I am using Popen() to execute a Perl program, 'anubadok'. The Perl program writes its output to a file; here is the snippet of Perl code that does this. The fourth print statement (after open OUTFILE ...) writes non-English Unicode characters (Bengali). I added the three print lines before it to test whether Unicode characters are written correctly at all.
...
my $infile = shift ;
my $input = "STDIN" ;
if ( !$infile )
{
    if (!$silent)
    {
        print STDERR "Reading from STDIN; (try: anubadok --help for usage or\n" ;
        print STDERR "see manpage for details.)\n" ;
    }
}
elsif ( -e $infile )
{
    open ( FILE, "<:utf8", $infile)
        || die "Error! Couldn't open \"$infile\"! Exiting." ;
    $input = "FILE" ;
}
else {
    print STDERR "Error! Couldn't find \"$infile\"! Exiting.\n" ;
    exit (1);
}
Initialize::check_user_anubadok_dir();
open ( OUTFILE, ">:utf8", "anubadok_outfile" );
print OUTFILE "hello";
print OUTFILE "হেলেছি";
print OUTFILE "world";
print OUTFILE
    XMLPP::xml_post_processor(
        Translator::translate_in_bengali(
            PoSTagger::penn_treebank_tagger(
                XMLPP::xml_pre_processor(<$input>))));
close OUTFILE;
# print STDOUT
#     XMLPP::xml_post_processor(
#         Translator::translate_in_bengali(
#             PoSTagger::penn_treebank_tagger(
#                 XMLPP::xml_pre_processor(<$input>))));
...
Below is stripped-down PyGTK code, which works correctly, to show how I use Popen() to execute the subprocess. It runs the Perl program, and the correct output is written to the file. The actual program is longer: it has more widgets to display, reads larger files to populate the views, and so on. Beyond that I can think of no logical difference, and I call Popen() identically in the actual program. Yet there, the output file written by the Perl program contains only the string "helloহেলেছিworld", i.e. the output of the three test print statements; the output of the fourth print is lost. If I use STDOUT instead of OUTFILE in the Perl program and read the stdout pipe with communicate(), I find it is empty. In the code below, again, this works correctly.
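To see exactly what reached the file after each run, a small check like the following may help. It is a sketch, assuming the file is named anubadok_outfile (as in the Perl snippet) and that the three test prints are the first thing written; check_output_file is a hypothetical helper, not part of Anubadok. It decodes the file strictly as UTF-8, so badly encoded bytes raise UnicodeDecodeError instead of being silently mangled, and it returns whatever follows the test prefix.

```python
import io

# Text produced by the three test print statements in the Perl snippet.
TEST_PREFIX = "helloহেলেছিworld"


def check_output_file(path, prefix=TEST_PREFIX):
    """Return the text that follows the test prefix in the output file.

    An empty string means only the three test prints reached the file.
    Decoding is strict UTF-8, so invalid bytes raise UnicodeDecodeError.
    """
    with io.open(path, encoding="utf-8") as fh:
        text = fh.read()
    if not text.startswith(prefix):
        raise ValueError("test prefix missing; file starts with %r" % text[:40])
    return text[len(prefix):]
```

For example, check_output_file("anubadok_outfile") returning "" after a run from the full program would confirm that the translation pipeline wrote nothing, rather than writing something that was later lost.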
What could be causing this, and what am I missing?
If anyone wants to actually run this program, please get a copy of Anubadok and add the extra 'print OUTFILE' block to the anubadok-0.2.1/bin/anubadok Perl script.
#!/usr/bin/env python
import pygtk,sys,gtk,os,subprocess,pdb

class C:
    def main(self, argv=None):
        gtk.main()

    def __init__(self):
        # Main window
        self.window = gtk.Window(gtk.WINDOW_TOPLEVEL)
        self.window.set_border_width(2)
        self.window.set_position(gtk.WIN_POS_CENTER)
        self.window.connect("destroy", self._destroy_window)
        # TextView
        self.v = gtk.TextView()
        self.v.set_name("v")
        self.vsw = gtk.ScrolledWindow()
        self.vsw.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
        self.vsw.add(self.v)
        # TextView
        self.v1 = gtk.TextView()
        self.v1.set_name("v1")
        self.v1sw = gtk.ScrolledWindow()
        self.v1sw.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
        self.v1sw.add(self.v1)
        # TreeView
        self.model = gtk.ListStore(str, str)
        self.tv = gtk.TreeView(self.model)
        self.tv.connect("row-activated", self._f, self.v)
        self.tv.connect("row-activated", self._f, self.v1)
        self.c = gtk.CellRendererText()
        self.c1 = gtk.CellRendererText()
        self.col = gtk.TreeViewColumn("C", self.c, text=0)
        self.col1 = gtk.TreeViewColumn("C1", self.c1, text=1)
        self.tv.append_column(self.col)
        self.tv.append_column(self.col1)
        self.tvsw = gtk.ScrolledWindow()
        self.tvsw.set_policy(gtk.POLICY_AUTOMATIC, gtk.POLICY_AUTOMATIC)
        self.tvsw.add(self.tv)
        self.fill_model(self.model)
        # Layout
        self.rbox = gtk.VBox(False, 0)
        self.rbox.pack_start(self.vsw, False, False, 0)
        self.rbox.pack_start(self.v1sw, False, False, 0)
        self.box = gtk.HBox(False, 0)
        self.box.pack_start(self.tvsw, False, False, 0)
        self.box.pack_start(self.rbox, False, False, 0)
        self.window.add(self.box)
        self.window.show_all()

    def fill_model(self, model):
        self.dbg("fill_model()")
        model.clear()
        fd = open("file", "r"); rows = fd.readlines(); fd.close()
        for l in rows:
            a = l.split()
            # Use the first two whitespace-separated fields,
            # not the first two characters of the line.
            model.append([a[0], a[1]])
        return

    def _f(self, tview, path, column, textview):
        self.dbg("_f()")
        tsel = tview.get_selection()
        model, iter = tsel.get_selected()
        buf = textview.get_buffer()
        buf.set_text("")
        if(textview.get_name() == "v"):
            self.dbg("_f():v")
            buf.set_text("hello")
        elif(textview.get_name() == "v1"):
            self.dbg("_f():v1")
            # g() takes no argument besides self
            t = self.g()
            buf.set_text(t)
        return

    def run(self, cmd):
        """
        - Run command and return stdout as first argument of a
          tuple and stderr as the second argument of the tuple.
        - Returns None on error.
        """
        self.dbg("run()")
        try:
            p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            # communicate() drains both pipes and then waits for the child;
            # calling wait() first can deadlock once a pipe buffer fills up.
            out, err = p.communicate()
            if p.returncode:
                print "failed with code: %s" % str(p.returncode)
            return out, err
        except OSError:
            print "OSError"
            return None

    def g(self):
        # pdb.set_trace()
        self.dbg("g()")
        p = self.run(["/home/rup/ir/utils/anubadok-0.2.1/bin/anubadok", "file1"])
        # run() returns None if the command could not be started
        return p[0] if p else ""

    def _destroy_window(self, widget, data = None):
        self.dbg("_destroy_window()")
        gtk.main_quit()
        return

    def dbg(self, msg):
        sys.stderr.write("dbg: %s\n" % msg)

if __name__ == "__main__":
    ui = C()
    ui.main()
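The subprocess documentation warns that calling wait() on a process started with stdout=PIPE or stderr=PIPE can deadlock when the child writes more than the OS pipe buffer holds (typically 64 KiB on Linux), which becomes relevant once the real program processes larger files. The sketch below uses a stand-in child (a Python one-liner, not anubadok) that writes about 1 MB to stdout; run_capture is a hypothetical helper showing the communicate()-only pattern.

```python
import subprocess
import sys

# Stand-in child: writes ~1 MB to stdout and one line to stderr.
# This replaces anubadok purely for illustration.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write('x' * 1000000); sys.stderr.write('done\\n')"]


def run_capture(cmd):
    """Run cmd and return (returncode, stdout_bytes, stderr_bytes).

    communicate() reads both pipes until EOF and only then waits for the
    child, so large output cannot leave the child blocked on a full pipe.
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, out, err
```

Calling run_capture(CHILD) returns all 1,000,000 bytes of stdout along with the stderr line; the same call with the anubadok command list would behave the same way for large translations.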
Both Perl and Python are sensitive to environment variables such as LANG, and graphical launchers and terminal windows often pass different values to the processes they start.
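One way to rule out locale differences is to pin the locale variables explicitly when starting the child, so the GUI and a terminal hand it the same values. This is a sketch: run_with_locale is a hypothetical helper, and the default locale name "en_US.UTF-8" is an assumption (pick one that `locale -a` lists on your system).

```python
import os
import subprocess


def run_with_locale(cmd, locale_name="en_US.UTF-8"):
    """Run cmd with LANG and LC_ALL forced to locale_name.

    The rest of the parent's environment is copied unchanged.
    Returns (returncode, stdout_bytes, stderr_bytes).
    """
    env = dict(os.environ)
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name  # LC_ALL overrides every other LC_* variable
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, env=env)
    out, err = p.communicate()
    return p.returncode, out, err
```

If the full program behaves correctly once the locale is pinned this way, the difference lies in the environment it was launched with rather than in the Popen() call itself.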
I'd try testing with ASCII input and output, printing STDERR to check for warnings, and logging the environment of each process.
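A minimal logging wrapper for that kind of comparison might look like this. It is a sketch: run_and_log and the JSON-lines log format are illustrative choices, not anything Popen or Anubadok provides. Running it from both the stripped-down and the full program and diffing the records shows whether the environment, the working directory (which matters for relative names such as file1 and anubadok_outfile), or the child's stderr differ.

```python
import json
import os
import subprocess


def run_and_log(cmd, log_path, cwd=None):
    """Run cmd, capture both streams, and append one JSON record to log_path.

    The record holds the environment the child inherits (the parent's, since
    env is not overridden), the working directory, the exit code and stderr.
    Returns (stdout_bytes, stderr_bytes).
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, cwd=cwd)
    out, err = p.communicate()
    record = {
        "cmd": cmd,
        "cwd": cwd or os.getcwd(),
        "env": dict(os.environ),
        "returncode": p.returncode,
        "stderr": err.decode("utf-8", "replace"),
    }
    with open(log_path, "a") as fh:
        fh.write(json.dumps(record) + "\n")
    return out, err
```

Perl warnings (from `use warnings` or `-w`) go to STDERR, so they end up in the "stderr" field of each record.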