Encoding error

Developer https://www.devze.com 2023-04-07 19:20 Source: web


Related topic: python

My code fails with an encoding error:

File "test.py", line 1

SyntaxError: encoding problem: with BOM

I've attached the code below. Is there a clear way to fix it? The input and output files contain Korean words, numbers, and English characters.

I tried running this code on both Mac and Windows, and it doesn't work on either OS. Please help me out!

# coding: uft-8

from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import codecs
import re
import subprocess, shlex


# Matches "word/(TAG)" pairs: a word, a slash, then a parenthesised tag.
REGEXP = re.compile(r'(\w+)/(\(.*?\))')


def main():
    words = {}  # category -> {word: count}

    # Tally every word/(TAG) pair in the cp949-encoded input file.
    with codecs.open('E:\\mach.txt', 'r', encoding='cp949') as fp:
        for line in fp:
            for item, category in REGEXP.findall(line):
                words.setdefault(category, {}).setdefault(item, 0)
                words[category][item] += 1

    # Write each category, then its "word count" lines, then a blank line.
    with codecs.open('result.txt', 'w', encoding='cp949') as fp:
        for category, words in sorted(words.items()):
            print(category, file=fp)
            for word, count in words.items():
                print(word, count, sep=' ', file=fp)
            print(file=fp)
    return 0

if __name__ == '__main__':
    raise SystemExit(main())
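For comparison, here is a sketch of the same tally in Python 3, where the built-in open() accepts an encoding argument directly, so codecs.open is not needed. The helper names count_tags and format_report are hypothetical; the regular expression, file names, and cp949 encoding are copied from the question. main() is defined but not called here, since it assumes E:\mach.txt exists.

```python
import io
import re
from collections import Counter, defaultdict

# Same pattern as in the question: a word, a slash, then a parenthesised tag.
REGEXP = re.compile(r'(\w+)/(\(.*?\))')


def count_tags(lines):
    """Return {category: Counter({word: count})} for every word/(TAG) match."""
    words = defaultdict(Counter)
    for line in lines:
        for item, category in REGEXP.findall(line):
            words[category][item] += 1
    return words


def format_report(words):
    """Render each category, its 'word count' lines, and a trailing blank line."""
    out = io.StringIO()
    for category, counts in sorted(words.items()):
        print(category, file=out)
        for word, count in counts.items():
            print(word, count, sep=' ', file=out)
        print(file=out)
    return out.getvalue()


def main():
    # Hypothetical Python 3 entry point; paths and cp949 are taken from the question.
    with open('E:\\mach.txt', encoding='cp949') as fp:
        words = count_tags(fp)
    with open('result.txt', 'w', encoding='cp949') as fp:
        fp.write(format_report(words))
    return 0
```

Splitting the counting from the file I/O lets the tally be checked on a few in-memory lines before pointing it at the real cp949 files.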

You've misspelled UTF-8 on the first line: it says "uft-8" instead of "utf-8". Since your code uses only ASCII characters, it doesn't even need a coding line.
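Both points can be checked with a small sketch that feeds byte strings to Python 3's built-in compile(). This is an assumption about the environment: the question's __future__ imports suggest it ran under Python 2, and the exact error text varies between Python versions. A UTF-8 BOM followed by the misspelled declaration is rejected, while the same bytes with the declaration corrected, or removed entirely, compile. The helper name syntax_error is hypothetical.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF that some editors prepend


def syntax_error(source):
    """Compile `source` (bytes); return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, 'test.py', 'exec')
    except SyntaxError as exc:
        return exc.msg
    return None


body = b'import os\nprint("hello")\n'
bad = BOM + b'# coding: uft-8\n' + body      # BOM + misspelled declaration, as in the question
fixed = BOM + b'# coding: utf-8\n' + body    # BOM + correctly spelled declaration
no_decl = BOM + body                         # BOM and no declaration at all

for name, src in [('bad', bad), ('fixed', fixed), ('no_decl', no_decl)]:
    print(name, syntax_error(src))
```

compile() only parses the source, so the sample body is never executed.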

The problem is not with the code but with the encoding of the script file itself: as the error message says, it was saved with a byte order mark (BOM). Try re-saving it with a different editor; Notepad on Windows works for me.
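If switching editors is inconvenient, the BOM can also be stripped programmatically. The sketch below uses a hypothetical function name, strip_utf8_bom, and assumes the script is otherwise plain ASCII or UTF-8; it removes only the three leading BOM bytes and leaves the rest of the file byte-for-byte unchanged.

```python
import codecs


def strip_utf8_bom(path):
    """Rewrite `path` without a leading UTF-8 BOM; return True if one was removed."""
    with open(path, 'rb') as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM: leave the file untouched
    with open(path, 'wb') as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

For example, strip_utf8_bom('test.py') returns True when it removes a BOM and False on any later call, because the file no longer starts with one.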