Python utf-8, how to align printout

I have an array containing Japanese characters as well as "normal" ones. How do I align the printout of these?

#!/usr/bin/python
# coding=utf-8

a1=['する', 'します', 'trazan', 'した', 'しました']
a2=['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

for i,j in zip(a1,a2):
    print i.ljust(12),':',j

print '-'*8

for i,j in zip(a1,a2):
    print i,len(i)
    print j,len(j)

Output:

する       : dipsy
します    : laa-laa
trazan       : banarne
した       : po
しました : tinky winky
--------
する 6
dipsy 5
します 9
laa-laa 7
trazan 6
banarne 7
した 6
po 2
しました 12
tinky winky 11

thanks, //Fredrik

Use the unicodedata.east_asian_width function to keep track of which characters are narrow and which are wide when computing the display width of the string.

#!/usr/bin/python
# coding=utf-8

import sys
import codecs
import unicodedata

out = codecs.getwriter('utf-8')(sys.stdout)

def width(string):
    return sum(1+(unicodedata.east_asian_width(c) in "WF")
        for c in string)

a1=[u'する', u'します', u'trazan', u'した', u'しました']
a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']

for i,j in zip(a1,a2):
    out.write('%s %s: %s\n' % (i, ' '*(12-width(i)), j))

Outputs:

する          : dipsy
します        : laa-laa
trazan        : banarne
した          : po
しました      : tinky winky

It doesn’t look right in some web browser fonts, but in a terminal window they line up properly.
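For anyone on Python 3, where every str is already Unicode and the codecs writer is normally not needed, the same idea can be sketched roughly as follows. The helper names display_width and ljust_display are made up for this example and are not part of any library; the sketch assumes every character is either one or two columns wide and ignores combining characters.

```python
import unicodedata

def display_width(text):
    """Count terminal columns: wide (W) and fullwidth (F) characters take two."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)

def ljust_display(text, columns):
    """Pad text with spaces up to `columns` terminal columns (hypothetical helper)."""
    return text + " " * max(0, columns - display_width(text))

a1 = ['する', 'します', 'trazan', 'した', 'しました']
a2 = ['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky']

for left, right in zip(a1, a2):
    print(ljust_display(left, 12), ':', right)
```

As with the version above, the columns only line up in a font where wide characters are exactly twice as wide as narrow ones, which is usually the case in a terminal.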

Use unicode objects instead of byte strings:

#!/usr/bin/python
# coding=utf-8

a1=[u'する', u'します', u'trazan', u'した', u'しました']
a2=[u'dipsy', u'laa-laa', u'banarne', u'po', u'tinky winky']

for i,j in zip(a1,a2):
    print i.ljust(12),':',j

print '-'*8

for i,j in zip(a1,a2):
    print i,len(i)
    print j,len(j)

Unicode objects deal with characters directly, so len() and ljust() count characters rather than UTF-8 bytes.
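To see the difference between counting bytes and counting characters, here is a small Python 3 sketch (in Python 3 a plain str is already a Unicode object). The variable names are only for illustration.

```python
text = 'します'
encoded = text.encode('utf-8')   # the byte string the question was measuring

print(len(text))      # number of characters
print(len(encoded))   # number of UTF-8 bytes

# ljust pads by character count, not by terminal columns
padded = text.ljust(12)
print(len(padded))
```

Note that padding by character count still does not account for wide characters taking two terminal columns each, so mixed Japanese and ASCII rows can remain a few columns off.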

You need to build the string manually and also compute the format length manually. There is no easy way to do this.

The three functions below do this (needs unicodedata):

shortenStringCJK: shorten a string so that it fits a given display width in the output (this is not a cut to a fixed number of characters)

import unicodedata

def shortenStringCJK(string, width, placeholder='..'):
    # get the display length, counting double-width characters as two
    string_len_cjk = stringLenCJK(str(string))
    # if the display width is too big
    if string_len_cjk > width:
        # set current length and output string
        cur_len = 0
        out_string = ''
        # loop through each character
        for char in str(string):
            # set the current length if we add the character
            cur_len += 2 if unicodedata.east_asian_width(char) in "WF" else 1
            # add the char only while the new length still leaves room for the placeholder
            if cur_len <= (width - len(placeholder)):
                out_string += char
        # return string with new width and placeholder
        return "{}{}".format(out_string, placeholder)
    else:
        return str(string)

stringLenCJK: get correct length (as in space taken on a terminal)

def stringLenCJK(string):
    # return string len including double count for double width characters
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)

formatLen: adjust the format length for the extra width of double-width characters. Without this, the padded columns will not line up.

def formatLen(string, length):
    # returns the length updated for a string with double-width characters:
    # take the display length (double-width characters count as two),
    # subtract the normal length, then subtract that difference from the original length
    return length - (stringLenCJK(string) - len(string))
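As a worked example of the arithmetic in formatLen, here is a small sketch that repeats the two functions above so it can run on its own. The names s, field and cell are only for illustration.

```python
import unicodedata

def stringLenCJK(string):
    # display length: double-width characters count as two
    return sum(1 + (unicodedata.east_asian_width(c) in "WF") for c in string)

def formatLen(string, length):
    # shrink the field by the extra columns the wide characters take
    return length - (stringLenCJK(string) - len(string))

s = 'します'                      # 3 characters, 6 terminal columns
field = formatLen(s, 12)          # 12 - (6 - 3) = 9
cell = "{:<{w}}".format(s, w=field)

print(field, len(cell), stringLenCJK(cell))
```

The padded cell is 9 characters long but occupies 12 terminal columns, which is the width that was asked for.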

To then output some string, predefine the format string:

format_str = "|{{:<{len}}}|"
format_len = 26
string_len = 26

and output as follows (where _string is the string to output):

print("Normal : {}".format(
    format_str.format(
        len=formatLen(shortenStringCJK(_string, width=string_len), format_len))
    ).format(
        shortenStringCJK(_string, width=string_len)
    )
)
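If the nested format calls feel hard to follow, one possible alternative is to truncate and pad in a single function. This is only a sketch: fit_cell and char_width are hypothetical names, and it assumes the placeholder consists of narrow characters and that the requested width is at least as large as the placeholder.

```python
import unicodedata

def char_width(char):
    # wide (W) and fullwidth (F) characters take two terminal columns
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1

def fit_cell(text, columns, placeholder=".."):
    """Truncate or pad text to exactly `columns` terminal columns (hypothetical helper)."""
    total = sum(char_width(c) for c in text)
    if total <= columns:
        # short enough: pad with spaces on the right
        return text + " " * (columns - total)
    # too long: keep characters while they fit next to the placeholder
    budget = columns - len(placeholder)
    kept, used = [], 0
    for c in text:
        w = char_width(c)
        if used + w > budget:
            break
        kept.append(c)
        used += w
    # a wide character that did not fit can leave one column unused, so pad it
    return "".join(kept) + placeholder + " " * (budget - used)

for row in ['trazan', 'しました', 'tinky winky']:
    print("|{}|".format(fit_cell(row, 8)))
```

The trade-off is that the padding is done with plain spaces instead of the format mini-language, so alignment options such as centering would have to be added by hand.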