Any python/django function to check whether a string only contains characters included in my database collation?


Developer https://www.devze.com 2022-12-21 04:38 Source: web


As expected, I get an error when entering some characters not included in my database collation:

(1267, "Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='")

Is there any function I could use to make sure a string only contains characters existing in my database collation?

Thanks.

You can use a regular expression to allow only certain characters. The following allows only letters, digits and _ (underscore), but you can change the character class to include whatever you want:

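import re

my_string = "hello_world_123"  # example input
# Letters, digits and underscore only; edit the class to allow more characters.
# \Z (not $) so that a trailing newline cannot slip through.
exp = r'^[A-Za-z0-9_]+\Z'
re.match(exp, my_string)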

If a match object is returned, the string is valid; if None is returned, the string contains a character outside the allowed set.
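The same regex idea can be pointed at the question's actual constraint instead of a hand-picked whitelist. A minimal sketch, assuming Python 3 and assuming the column's character set can be approximated by ISO-8859-1 (code points U+0000 to U+00FF); the names LATIN1_ONLY and is_latin1 are hypothetical. MySQL's latin1 is really closer to Windows cp1252, so this check is stricter than the database and rejects a few storable characters such as "€".

```python
import re

# Hypothetical pattern: any number of code points in U+0000..U+00FF (ISO-8859-1).
# Assumption: this approximates MySQL latin1; cp1252 extras like "€" are rejected.
LATIN1_ONLY = re.compile(r"[\x00-\xff]*")


def is_latin1(text: str) -> bool:
    """Return True if every character of text lies in U+0000..U+00FF."""
    return LATIN1_ONLY.fullmatch(text) is not None


print(is_latin1("café"))  # é is U+00E9, inside the range
print(is_latin1("漢字"))   # CJK characters are far outside the range
```

fullmatch() anchors at both ends, so no ^ or \Z is needed here.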

I'd look at Python's unicode.translate() and codecs.encode() functions. Both let you delete or replace characters the target character set cannot represent, instead of only rejecting the whole string. If I recall correctly, translate() has also been shown to be faster than a regular expression for similar use cases (benchmarks comparing the two should be easy to find).
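A minimal sketch of the encode-based check, assuming Python 3 and assuming the column uses MySQL's latin1 character set, which the MySQL manual describes as essentially Windows cp1252; Python's "cp1252" codec is therefore used as an approximation. The function names fits_charset and sanitize are hypothetical. Encoding raises UnicodeEncodeError as soon as a character has no representation, which answers the question directly; errors="replace" instead substitutes "?".

```python
def fits_charset(text: str, encoding: str = "cp1252") -> bool:
    """Return True if every character of text can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def sanitize(text: str, encoding: str = "cp1252") -> str:
    """Replace characters the encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(fits_charset("café"))  # é exists in cp1252
print(fits_charset("漢字"))   # not representable in cp1252
print(sanitize("café☃"))     # the snowman becomes '?'
```

Validating with fits_charset() before saving (for example inside a Django form's clean method) avoids the collation error; sanitize() is the lossy alternative if rejecting input is not acceptable.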

From Python's docs:

"For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted. Note, a more flexible approach is to create a custom character mapping codec using the codecs module (see encodings.cp1251 for an example)."

http://docs.python.org/library/stdtypes.html

http://docs.python.org/library/codecs.html
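The translate() behaviour quoted above (None deletes a character, an unmapped character is left untouched) can be used to strip unsupported characters. A minimal sketch, assuming Python 3, where str plays the role of the old unicode type and str.translate() accepts any object with __getitem__; raising LookupError from that method keeps the character. The class name DropUnencodable is hypothetical, and cp1252 is again assumed as the stand-in for MySQL latin1.

```python
class DropUnencodable:
    """Translation table for str.translate() that deletes characters
    the given encoding cannot represent."""

    def __init__(self, encoding: str = "cp1252") -> None:
        self.encoding = encoding

    def __getitem__(self, ordinal: int):
        try:
            chr(ordinal).encode(self.encoding)
        except UnicodeEncodeError:
            return None  # None tells str.translate() to delete the character
        # LookupError tells str.translate() to leave the character unchanged
        raise LookupError(ordinal)


table = DropUnencodable()
print(repr("café ☃ 漢".translate(table)))  # snowman and CJK character removed
```

A plain dict would need every unwanted character listed in advance; computing the answer per character avoids enumerating all of Unicode.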