
how do I get rid of "^M" in data I have scraped from a website?

Developer https://www.devze.com 2023-02-28 16:30 Source: Internet

I have data that looks like this:

"1964   iwanttoholdyourhand beatles



^M

oh yeah, i'll tell you something
i think you'll understand
when i'll say that something
i wanna hold your hand
i wanna hold your hand
i wanna hold your hand

oh please, say to me
you'll let me be your man
and please, say to me
you'll let me hold your hand
i'll let me hold your hand
i wanna hold your hand"

and I'm trying to get rid of the ^M, so I tried using re.sub, but that doesn't find it. I think it is some special character and not actually a "^" and an "M" next to each other. Any ideas on how to remove it? Thanks!


^M is used to represent the carriage return character in many editors. You would typically type Ctrl + M to generate that character in those editors.

Python represents this as '\r', as do most programming languages.
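As a rough sketch of what that means in practice, assuming the scraped text is already held in a Python string: the sample string `raw` and the helper name `strip_carriage_returns` below are made up for illustration and are not from the question.

```python
import re

def strip_carriage_returns(text):
    """Remove every carriage return (the character editors show as ^M)."""
    return text.replace("\r", "")

# Hypothetical sample resembling the scraped data, with one stray '\r'.
raw = "1964   iwanttoholdyourhand beatles\n\r\noh yeah, i'll tell you something"

# Looking for a literal "^" followed by "M" finds nothing, because the
# data holds a single control character rather than those two characters.
print(re.search(r"\^M", raw))

cleaned = strip_carriage_returns(raw)
print("\r" in cleaned)
```

A plain `str.replace` is enough here; a regular expression would only be needed if the carriage returns had to be matched together with surrounding context.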


It's probably a Windows vs. Unix line-endings issue. Unix uses \n (newline); Windows uses \r\n instead (carriage return + newline). You want to remove the \r (ASCII code point 13). You can do it in Python (without even using regexes, I think), or you can simply run the fromdos program on your file. Many Unix tools represent \r as ^M (M being the 13th letter of the alphabet).

This Wikipedia article is a nice starting point.
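If the goal is to convert the line endings rather than just drop the \r, one possible sketch in Python follows. The function names `normalize_newlines` and `caret_name` are hypothetical, and treating a lone \r as a line break is an assumption that may not suit every data set.

```python
def normalize_newlines(text):
    """Turn Windows (\\r\\n) and lone \\r line endings into Unix \\n."""
    # Replace the two-character sequence first so it does not become "\n\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")

def caret_name(ch):
    """Return the caret notation for a control character, e.g. '^M'."""
    # Control characters 1..26 map onto the letters A..Z (code + 64).
    return "^" + chr(ord(ch) + 64)

print(ord("\r"))
print(caret_name("\r"))
print(repr(normalize_newlines("line one\r\nline two\rline three\n")))
```

The `caret_name` helper is only there to show where the ^M display comes from: the carriage return has code 13, and M is the 13th letter.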
