How to remove this special character?

Developer https://www.devze.com 2023-03-21 23:51 Source: web
I was trying to unify the lines in my file when I observed the following:

word1 word2

word1 word2

I did not understand why these lines were not combined, so I opened the file in vim and used :set list to see if there were any special characters, and I found this:

 word1 <feff>word2
 word1 word2

I am not sure how to clean this word in Python. Any suggestions on what this character might be and how it can be removed?
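To find out which invisible character is involved without opening an editor, Python can report each non-printable character's position, code point and Unicode name. This is only an illustrative sketch; `describe_invisible` is a hypothetical helper name. Tabs and newlines are reported too, because `str.isprintable()` treats them as non-printable.

```python
import unicodedata

def describe_invisible(text):
    """Return (index, code point, Unicode name) for each non-printable
    character in text, e.g. zero-width or other format characters."""
    found = []
    for i, ch in enumerate(text):
        # isprintable() is False for Unicode "Other" and "Separator"
        # categories (except the ASCII space), which covers U+FEFF.
        if not ch.isprintable():
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found

print(describe_invisible("word1 \ufeffword2"))
# [(6, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')]
```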


U+FEFF is the Byte Order Mark (BOM) character, which should only occur at the start of a document. Anywhere else in a document, it should be treated as a ZERO WIDTH NO-BREAK SPACE. If it causes issues, you can remove it like any other character:

>>> s = u'word1 \ufeffword2'
>>> s = s.replace(u'\ufeff', '')
>>> s
u'word1 word2'

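(In Python 3.0 through 3.2, drop the u in front of the strings; Python 3.3 and later accept it again. In Python 3 the result is displayed as 'word1 word2'.)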
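To clean a whole file rather than a single string, a minimal sketch could look like the following. It assumes the file is UTF-8 encoded, and `clean_bytes` and `clean_file` are hypothetical helper names. The standard `utf-8-sig` codec strips only one BOM at the very start of the data, so stray copies in the middle of lines (which can appear, for example, when several files that each began with a BOM are concatenated) still need the same `replace` call shown above.

```python
from pathlib import Path

BOM = "\ufeff"  # U+FEFF: byte order mark / ZERO WIDTH NO-BREAK SPACE

def clean_bytes(raw: bytes) -> str:
    """Decode UTF-8 bytes and drop every U+FEFF character."""
    # 'utf-8-sig' removes a single BOM at the very start, if present...
    text = raw.decode("utf-8-sig")
    # ...but not BOMs further in, so remove those explicitly.
    return text.replace(BOM, "")

def clean_file(src: str, dst: str) -> None:
    """Write a copy of the UTF-8 file src to dst with all U+FEFF removed."""
    # Working on bytes leaves the original line endings untouched.
    Path(dst).write_bytes(clean_bytes(Path(src).read_bytes()).encode("utf-8"))
```

Usage would be something like `clean_file("input.txt", "output.txt")` (hypothetical file names); after that, the two `word1 word2` lines compare equal and can be merged.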


Have you tried mytext.split(string.whitespace)?
