UnicodeWarning when comparing unicode strings to unicode results from os.walk command

Developer https://www.devze.com 2023-03-11 07:52 Source: the web

Using Python 2.7, I'm running os.walk over these files (http://www.2shared.com/file/biSx7NI-/comer.html) and then comparing the result against an array. In the actual program this array won't be predefined. The code that I am trying to use is as follows:

# -*- coding: utf-8 -*-
import os.path
group = ['comer.txt', 'coma.txt', 'comamos.txt', 'coman.txt', 'comas.txt', 'come.txt', 'comed.txt', 'comemos.txt', 'comen.txt', 'comeremos.txt', 'comer\xc3\xa1.txt', 'comer\xc3\xa1n.txt', 'comer\xc3\xa1s.txt', 'comer\xc3\xa9.txt', 'comer\xc3\xa9is.txt', 'comer\xc3\xada.txt', 'comer\xc3\xadais.txt', 'comer\xc3\xadamos.txt', 'comer\xc3\xadan.txt', 'comer\xc3\xadas.txt', 'comes.txt', 'comido.txt', 'comiendo.txt', 'comiera.txt', 'comierais.txt', 'comieran.txt', 'comieras.txt', 'comiere.txt', 'comiereis.txt', 'comieren.txt', 'comieres.txt', 'comieron.txt', 'comimos.txt', 'comiste.txt', 'comisteis.txt', 'comi\xc3\xa9ramos.txt', 'comi\xc3\xa9remos.txt', 'comi\xc3\xb3.txt', 'como.txt', 'com\xc3\xa1is.txt', 'com\xc3\xa9is.txt', 'com\xc3\xad.txt', 'com\xc3\xada.txt', 'com\xc3\xadais.txt', 'com\xc3\xadamos.txt', 'com\xc3\xadan.txt', 'com\xc3\xadas.txt', 'comer\xc3\xa1.txt', 'comer\xc3\xa9.txt', 'comer\xc3\xada.txt', 'comer\xc3\xadais.txt']

print "********what we have*********"
i=0
for f in group:
    group[i] = os.path.basename(f)
    group[i] = unicode(group[i], "utf-8")        
    print group[i]
    i += 1

wantedResults = []
print "********what we want*********"
for(path, dirs, files) in os.walk("C:\corpus\zz-auto generated\spanish\comer"):
    wantedResults.append(files)
for f in wantedResults[0]:
    print f

print "********problems*********"
for resultWanted in wantedResults[0]:
    if resultWanted not in group:
        print "did not match our wanted results: " + resultWanted
for result in group:
    if result not in wantedResults[0]:
        print "extra file: " + result

I'm getting this error:

Warning (from warnings module):
  File "C:\Users***\Desktop\osWalkTest.py", line 26
    if result not in wantedResults[0]:
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

I could really use some help in getting the predefined array and the array from the os.walk to compare properly. I've looked this up on Google and have tried many combinations of encoding and decoding the two arrays, but nothing seems to work. Thanks.


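Have you tried passing a Unicode path to os.walk? Note the 'u' before the string, which makes it a Unicode literal. Given a Unicode path, os.walk returns the file names as Unicode strings too, so they compare cleanly against your decoded list: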

for(path, dirs, files) in os.walk(u"C:/corpus/zz-auto generated/spanish/comer"):

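(Note that bare backslashes in an ordinary string literal are not a good idea, Unicode or not: sequences such as \t or \n are read as escape characters. Forward slashes or doubled backslashes avoid the problem.)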
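
To make the idea concrete, here is a minimal sketch of one way the comparison could be done once both sides are Unicode. The helper names `to_text` and `compare_names` are hypothetical and not part of the original code, and the sketch assumes the predefined names are UTF-8 byte strings, as in the question. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import os
import sys


def to_text(name, encoding=None):
    """Return name as a Unicode string (hypothetical helper).

    Byte strings are decoded; Unicode strings pass through unchanged.
    """
    if isinstance(name, bytes):  # on Python 2, `bytes` is an alias of `str`
        # Assumption: fall back to the filesystem encoding, then to UTF-8.
        return name.decode(encoding or sys.getfilesystemencoding() or "utf-8")
    return name


def compare_names(group, directory):
    """Compare predefined names against the files in directory (hypothetical helper).

    Returns (not_in_group, not_on_disk), both sets of Unicode strings.
    """
    on_disk = set()
    # A Unicode path makes os.walk yield Unicode file names on Python 2.
    for _path, _dirs, files in os.walk(to_text(directory)):
        on_disk.update(to_text(f) for f in files)
        break  # top-level directory only, like wantedResults[0] in the question
    # Assumption: the predefined names are UTF-8 encoded byte strings.
    wanted = set(to_text(name, "utf-8") for name in group)
    return on_disk - wanted, wanted - on_disk
```

With the question's data this would be called as `compare_names(group, u"C:/corpus/zz-auto generated/spanish/comer")`, passing the original byte-string list. Using sets also replaces the two nested `not in` loops with two set differences.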
