How to force PyYAML to load strings as unicode objects?

开发者 (devze.com) https://www.devze.com 2022-12-31 22:14 Source: web

The PyYAML package loads unmarked strings as either unicode or str objects, depending on their content.

I would like to use unicode objects throughout my program (and, unfortunately, can't switch to Python 3 just yet).

Is there an easy way to force PyYAML to always load strings as unicode objects? I do not want to clutter my YAML with !!python/unicode tags.

# Encoding: UTF-8

import yaml

menu = u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
"""

print yaml.load(menu)

Output: ['spam', 'eggs', 'bacon', u'cr\xe8me br\xfbl\xe9e', 'spam']

I would like: [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam']


Here's a version that overrides PyYAML's string handling so that it always returns unicode objects. The result should be the same as with the other answer I posted, only with less code. As with that answer, if you use custom handlers you still need to make sure that strings in custom classes are converted to unicode, or pass unicode strings in yourself:

# -*- coding: utf-8 -*-
import yaml
from yaml import Loader, SafeLoader

def construct_yaml_str(self, node):
    # Override the default string handling function 
    # to always return unicode objects
    return self.construct_scalar(node)
Loader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)
SafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

print yaml.load(u"""---
- spam
- eggs
- bacon
- crème brûlée
- spam
""")

(The above gives [u'spam', u'eggs', u'bacon', u'cr\xe8me br\xfbl\xe9e', u'spam'])

I haven't tested this with LibYAML (the C-based parser), as I couldn't get it to compile, so I'll leave the other answer as it is.
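If changing the behaviour of Loader and SafeLoader for the whole process is a concern, the same override can probably be registered on a subclass instead. The following is only a sketch of that idea: the class name UnicodeSafeLoader is made up for illustration, and it relies on add_constructor giving the subclass its own copy of the constructor table, which is how the pure-Python loaders behave as far as I can tell.

```python
import yaml


class UnicodeSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that carries the string override."""


def construct_yaml_str(loader, node):
    # construct_scalar returns the scalar as text without the
    # ASCII-to-str downgrade that the default Python 2 constructor applies
    return loader.construct_scalar(node)


# Registering on the subclass leaves yaml.SafeLoader itself untouched
UnicodeSafeLoader.add_constructor(u'tag:yaml.org,2002:str', construct_yaml_str)

menu = u"---\n- spam\n- eggs\n- cr\u00e8me br\u00fbl\u00e9e\n"
data = yaml.load(menu, Loader=UnicodeSafeLoader)
```

Only calls that pass Loader=UnicodeSafeLoader are affected; plain yaml.safe_load keeps its default string handling.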


Here's a function you could use to replace str with unicode objects in the decoded output of PyYAML:

def make_str_unicode(obj):
    t = type(obj)

    if t in (list, tuple):
        if t == tuple:
            # Convert to a list if a tuple to 
            # allow assigning to when copying
            is_tuple = True
            obj = list(obj)
        else: 
            # Otherwise just do a quick slice copy
            obj = obj[:]
            is_tuple = False

        # Copy each item recursively
        for x in xrange(len(obj)):
            obj[x] = make_str_unicode(obj[x])

        if is_tuple: 
            # Convert back into a tuple again
            obj = tuple(obj)

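    elif t == dict:
        # Build a new dict: assigning obj[unicode(k)] would keep the
        # original str key object, because 'a' == u'a' and both hash alike
        obj = dict((make_str_unicode(k), make_str_unicode(v))
                   for k, v in obj.items())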

    elif t == str:
        # Convert strings to unicode objects
        obj = unicode(obj)
    return obj

print make_str_unicode({'blah': ['the', 'quick', u'brown', 124]})
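For comparison, here is a more compact sketch of the same idea that never modifies its input and runs unchanged on Python 2 and Python 3. The name to_text and the encoding parameter are my own choices for illustration, not part of PyYAML. It treats every bytes value as encoded text (on Python 2, str is bytes), and the default of ASCII assumes the input is PyYAML output, where byte strings should only ever be ASCII.

```python
def to_text(obj, encoding='ascii'):
    """Return a copy of obj with every byte string decoded to text."""
    if isinstance(obj, bytes):
        # On Python 2 this branch catches plain str objects
        return obj.decode(encoding)
    if isinstance(obj, dict):
        # A new dict is built, so the keys are really replaced as well
        return dict((to_text(k, encoding), to_text(v, encoding))
                    for k, v in obj.items())
    if isinstance(obj, list):
        return [to_text(item, encoding) for item in obj]
    if isinstance(obj, tuple):
        return tuple(to_text(item, encoding) for item in obj)
    # Numbers, None, text and anything else pass through unchanged
    return obj


converted = to_text({b'blah': [b'the', b'quick', u'brown', 124]})
```

Because it uses isinstance checks, subclasses of dict, list and tuple are converted too, but they come back as plain dict, list and tuple objects.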