Compile Syntax Error: non ASCII letters in a string

Developers https://www.devze.com 2023-03-09 21:23 Source: web
I have a Python file that contains a long string of HTML. When I compile and run this script I get this error:

SyntaxError: Non-ASCII character '\x92' in file C:\Users...\GlobalVars.py on line 2509, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

I followed the instructions and went to the suggested URL, but putting something like this at the top of my script still doesn't work:

#!/usr/bin/python
# -*- coding: latin-1 -*-

What can I do to stop this compiler error from occurring?


First, to prevent problems like the one in the question, you should never use any encoding other than UTF-8 for Python source code.

This is the correct header to use:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

Now you have to convert the file from whatever encoding it currently uses to UTF-8; your text editor can probably do that.
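If your editor cannot do the conversion, it can also be scripted. The following is only a minimal sketch in Python 3: the helper name convert_to_utf8 is made up for illustration, and it assumes the file is currently CP1252 (a guess based on the \x92 byte in the error message, not something the question confirms). Back up the file before trying anything like this.

```python
import os
import tempfile


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite the file at `path` as UTF-8, assuming it is currently `source_encoding`."""
    with open(path, "rb") as f:
        raw = f.read()
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding.
    text = raw.decode(source_encoding)
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))


# Demo on a throwaway file that contains the byte from the error message.
fd, demo_path = tempfile.mkstemp(suffix=".py")
os.close(fd)
with open(demo_path, "wb") as f:
    f.write(b"html = 'It\x92s a test'\n")

convert_to_utf8(demo_path)
with open(demo_path, "rb") as f:
    converted = f.read()
os.remove(demo_path)

# The single byte \x92 is now the three-byte UTF-8 sequence for U+2019.
print(converted)
```

After the conversion, the utf-8 declaration in the header matches what is actually on disk.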

If you wonder why I say this, remember that a text editor cannot safely guess a non-Unicode encoding, because non-Unicode encodings have no BOM. For this reason most decent editors default to UTF-8 when no encoding is specified. And by the way, the encoding specified in the Python file header is for Python only; most editors ignore what you wrote there.

Also, as you can see, Python is trying to decode a byte above 127 using ASCII (not latin-1), which is bound to fail. I am not sure why this happens, but I don't care too much because there is a much better way to solve the problem.


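The encoding declaration must be on the first or second line of the file that contains the non-ASCII text, and it must match the actual encoding of the file. The byte \x92 is a printable character (the right single quotation mark) in CP1252 but not in Latin-1, so the file is most likely CP1252.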
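To see the difference for yourself, here is a small Python 3 sketch that decodes the single byte \x92 under the three encodings discussed on this page. The variable names are only illustrative.

```python
import unicodedata

raw = b"\x92"

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

# In CP1252 the byte is a printable punctuation mark.
print(unicodedata.name(as_cp1252))      # RIGHT SINGLE QUOTATION MARK

# In Latin-1 it maps to U+0092, a C1 control character (category "Cc").
print(unicodedata.category(as_latin1))  # Cc

# On its own it is not valid UTF-8 at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)                          # False
```

So a latin-1 declaration lets the file load, but the character you get is not the apostrophe you expected.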


If you just want to get rid of this error without going into the details (which the other answers on this page cover), you can do the following:

1) Copy your code and paste it in Notepad++

2) Select Encoding -> Encode in UTF-8

3) Select View -> Show Symbol -> Show All Characters

You will now be able to see which symbol is causing the issue (x92 will be visible). Replace or remove it to solve the problem.
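If you don't have Notepad++ at hand, the same hunt can be sketched in a few lines of Python. This is only an illustration: find_non_ascii is a made-up helper, and the sample bytes stand in for the contents of your real file.

```python
def find_non_ascii(data):
    """Return (line_number, column, byte_value) for every byte >= 0x80 in `data`."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        # bytearray yields integers on both Python 2 and Python 3.
        for col, byte in enumerate(bytearray(line), start=1):
            if byte >= 0x80:
                hits.append((line_no, col, byte))
    return hits


# Stand-in for the real file contents.
sample = b"a = 1\nhtml = 'It\x92s'\n"
for line_no, col, byte in find_non_ascii(sample):
    print("line %d, column %d: byte 0x%02x" % (line_no, col, byte))
# line 2, column 11: byte 0x92
```

For a real file, read it in binary mode, for example open(path, "rb").read(), and pass the result to the function; the line numbers should then match the one in the SyntaxError.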


Found this and hope it's helpful to the next person: http://www.sitepoint.com/forums/showthread.php?567734-Anyone-know-what-this-error-means

Code point 0x92 (146 decimal) is the right single quotation mark, or apostrophe (’) in Windows-1252. It's an invalid character in ISO 8859 and in UTF-8, since the 0x80-0x9F range is reserved for C1 control characters.

Not sure if I'm busting copyright. If so please remove the blockquote.


The encoding declaration indicates that you think the file is in latin-1 encoding, but the Python interpreter is finding a char at or very near line 2509 in GlobalVars.py that is not what you think it is.

You should first confirm the encoding of GlobalVars.py. Is it really latin-1?

Next, you should check the characters near line 2509. Are they also latin-1, or were they cut and pasted from a web page or somewhere else (maybe there are UTF-8 chars mixed up in there)?

If you have chars in your source file that aren't what you think they are, then you may need to clean up the file before going any further.
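One rough way to check is to try decoding the raw bytes with a few candidate encodings. The sketch below is Python 3, and the helper name and the candidate list are my own choices. Keep in mind that latin-1 accepts every possible byte, so "it decodes" proves nothing for latin-1; you still have to look at the resulting characters.

```python
def encodings_that_decode(data, candidates=("ascii", "utf-8", "cp1252", "latin-1")):
    """Return the candidate encodings under which `data` decodes without error."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# A CP1252 apostrophe: rejected by ascii and utf-8.
print(encodings_that_decode(b"It\x92s"))
# ['cp1252', 'latin-1']

# The same apostrophe stored as UTF-8: every candidate except ascii accepts it.
print(encodings_that_decode(b"It\xe2\x80\x99s"))
# ['utf-8', 'cp1252', 'latin-1']
```

If utf-8 is in the result for the whole file, the file is almost certainly UTF-8; if only cp1252 and latin-1 are, text pasted from a Windows application or a web page is the likely culprit.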


Add these lines at the top of your code:

#! /usr/bin/env python
# -*- coding: utf-8 -*-


An easy workaround, if your file really is in latin-1, is to replace the character in the HTML string with its escaped representation (for example an HTML entity).

Afaik:

\x92 => 146 in decimal => Æ => &AElig;

If your character is not Æ, then your file is not encoded in latin-1 ;-) (and you might want to check whether utf-8 or cp1252 works better as a quick win).

EDIT: Of course, you want to check your ACTUAL file encoding before trying. I might be wrong: I'm not 100% sure \x92 is Æ in ISO 8859-1; according to this page, it doesn't seem to be defined.
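Since the string is HTML, one way to apply this idea without looking up each character by hand is to let Python generate numeric character references. This is a sketch in Python 3 and it assumes the bytes are CP1252 (as the other answers suggest) rather than latin-1; the sample string is made up.

```python
raw = b"<p>It\x92s here</p>"

# Decode with the encoding the bytes are really in (assumed CP1252 here).
text = raw.decode("cp1252")

# "xmlcharrefreplace" turns every non-ASCII character into &#NNNN;
ascii_html = text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(ascii_html)  # <p>It&#8217;s here</p>
```

The result is pure ASCII, so it can be pasted into the source file without any encoding declaration, and a browser will still render the apostrophe.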
