My code fails with an encoding error:
File "test.py", line 1
SyntaxError: encoding problem: with BOM
The code is attached below. Is there a clear way to fix it? The input and output files contain Korean words, numbers, and English characters.
I tried running this code on both Mac and Windows, and it fails on both. Please help me out!
# coding: uft-8
from __future__ import print_function
from __future__ import unicode_literals
import os, sys
import codecs
import re
import subprocess, shlex

REGEXP = re.compile(r'(\w+)/(\(.*?\))')

def main():
    words = {}
    with codecs.open('E:\\mach.txt', 'r', encoding='cp949') as fp:
        for line in fp:
            for item, category in REGEXP.findall(line):
                words.setdefault(category, {}).setdefault(item, 0)
                words[category][item] += 1
    with codecs.open('result.txt', 'w', encoding='cp949') as fp:
        for category, words in sorted(words.items()):
            print(category, file=fp)
            for word, count in words.items():
                print(word, count, sep=' ', file=fp)
            print(file=fp)
    return 0

if __name__ == '__main__':
    raise SystemExit(main())
You've misspelled utf-8 as uft-8 on the first line. Since you've only used ASCII characters in your code, the coding line isn't even required.
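To see why the spelling matters, here is a minimal sketch that asks Python whether a codec is registered under a given name. The helper name is_known_encoding is made up for illustration; codecs.lookup is the standard library call, and it raises LookupError for a name it does not recognize.

```python
import codecs

def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

print(is_known_encoding('uft-8'))  # the misspelled name from the question
print(is_known_encoding('utf-8'))  # the intended name
print(is_known_encoding('cp949'))  # the codec used for the data files
```

Running a check like this on the name in the coding line is a quick way to catch a typo before the interpreter does.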
The problem is not with the code, but with the encoding of the script file itself. Try saving it with a different editor. Notepad on Windows works for me.
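If re-saving in another editor is inconvenient, a rough sketch like the following can check the script for a leading UTF-8 byte order mark and rewrite it without one. This assumes the "with BOM" in the error message refers to a UTF-8 BOM at the start of the file; the function names strip_utf8_bom and rewrite_without_bom are hypothetical, and only codecs.BOM_UTF8 comes from the standard library.

```python
import codecs

def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def rewrite_without_bom(path):
    """Rewrite the file in place without a BOM; return True if one was removed."""
    with open(path, 'rb') as fp:
        data = fp.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # no BOM found, leave the file untouched
    with open(path, 'wb') as fp:
        fp.write(cleaned)
    return True
```

For example, rewrite_without_bom('test.py') would return True if the script started with a BOM and was rewritten. Keep a backup copy first, since the file is overwritten in place.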