
Regular expression for validating Cyrillic

开发者 (Developer), https://www.devze.com, 2023-02-21 21:53. Source: web

I have a PHP function to validate "City":

function validate_city($field) {
    if ($field == "") return "Enter city.<br />";
    else if (preg_match("/[^а-Яa-zA-z-]/", $field))
        return "City name must contain only letters and -.<br />";
    return "";
}

Every time I enter a Cyrillic city name (for example, "Минск"), it returns: City name must contain only letters and -. The variable $_POST['city'] looks like: Ð�инÑ�к

In JavaScript the same check works correctly, so I think the problem has something to do with the encoding.
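The question's pattern has no /u modifier, so PCRE reads the input one byte at a time. A minimal sketch of what it actually sees, assuming the PHP file (and therefore the literal "Минск") is saved as UTF-8: every Cyrillic letter takes two bytes, and the second byte of "М" (0x9C) is outside the class, so the negated class matches and the city is rejected. Separately, а-Я in that class runs from U+0430 (small а) down to U+042F (capital Я), which is out of order; a range such as А-Яа-яЁё is what covers the basic Russian alphabet.

```php
<?php
// Assumes this file is saved as UTF-8.
$city = "Минск";

// strlen() counts bytes, not characters.
$bytes = strlen($city);

// With /u, "." matches one UTF-8 character, so this counts characters.
$chars = preg_match_all('/./u', $city);

// Hex dump of the underlying bytes: each letter starts with 0xD0 or 0xD1.
$hex = bin2hex($city);

// The question's pattern, compiled without /u, works in byte mode: the class
// holds bytes, not Cyrillic letters, and 0x9C (second byte of "М") is not among
// them, so the negated class matches.
$rejected = preg_match("/[^а-Яa-zA-z-]/", $city);

printf("%d bytes, %d characters, hex %s, rejected=%d\n", $bytes, $chars, $hex, $rejected);
// prints: 10 bytes, 5 characters, hex d09cd0b8d0bdd181d0ba, rejected=1
```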


You can use the following pattern to validate non-Latin characters:

preg_match ('/^[a-zA-Z\p{Cyrillic}\d\s\-]+$/u', $str);

See this post for the full explanation
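A sketch of the question's validate_city() built on the pattern above, assuming the form data arrives as UTF-8. Besides letters it also accepts digits and whitespace, because the pattern includes \d and \s. The comparison with !== 1 is a design choice: preg_match() returns false rather than 0 for malformed UTF-8, and that case should be rejected too.

```php
<?php
function validate_city(string $field): string
{
    if ($field === "") {
        return "Enter city.<br />";
    }
    // /u: treat pattern and subject as UTF-8; \p{Cyrillic} matches any Cyrillic letter.
    // preg_match() returns false (not 0) for malformed UTF-8, so require exactly 1.
    if (preg_match('/^[a-zA-Z\p{Cyrillic}\d\s\-]+$/u', $field) !== 1) {
        return "City name may contain only letters, digits, spaces and -.<br />";
    }
    return "";
}

// Sample inputs (made up for illustration).
foreach (["Минск", "Санкт-Петербург", "Minsk", "Минск!"] as $city) {
    echo $city, ": ", validate_city($city) === "" ? "ok" : "rejected", "\n";
}
```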


A better solution to match Cyrillic and Common characters would be:

preg_match ('/^[\p{Cyrillic}\p{Common}]+$/u', $str);
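To see what \p{Common} adds, here is a small check (a sketch; the sample strings are made up). Common is the Unicode script of characters shared across scripts, such as digits, spaces and punctuation, so this pattern accepts "Минск 2023!" but rejects Latin letters, which belong to \p{Latin}. Whether that is more permissive than you want for city names is a judgment call.

```php
<?php
// Cyrillic letters plus "Common" script characters (digits, spaces,
// punctuation such as - and !). Latin letters are \p{Latin}, not Common.
$pattern = '/^[\p{Cyrillic}\p{Common}]+$/u';

$samples = ["Минск", "Санкт-Петербург", "Минск 2023!", "Minsk"];
foreach ($samples as $s) {
    echo $s, ": ", preg_match($pattern, $s) === 1 ? "match" : "no match", "\n";
}
```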


This looks like UTF-8. If it is, this tip from cebelab on php.net might be helpful:

I noticed that in order to deal with UTF-8 texts, without having to recompile php with the PCRE UTF-8 flag enabled, you can just add the following sequence at the start of your pattern: (*UTF8)

For instance, '#(*UTF8)[[:alnum:]]#' will return TRUE for 'é', whereas '#[[:alnum:]]#' will return FALSE.

Use the built-in POSIX character class [:alnum:] for this. You will need to invert the logic of your match:

function validate_city($field) {
    if ($field == "") return "Enter city.<br />";
    // /u: UTF-8 mode; (*UCP): make [[:alnum:]] match non-ASCII letters such as Cyrillic
    else if (preg_match("/(*UCP)^[[:alnum:]-]+$/u", $field))
        return "";
    return "City name must contain only letters and -.<br />";
}

Edit: ah, I forgot to enable UTF-8 in the regex ;)
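A quick way to check the [[:alnum:]] approach, assuming a UTF-8 source file and the default C locale: in byte mode the class is tested against single bytes like 0xD0, which are not alphanumeric. By default, PCRE's POSIX classes do not match characters above 127 even in UTF-8 mode; (*UCP) makes them use Unicode properties instead, so this sketch enables it explicitly alongside /u.

```php
<?php
// Assumes this file is saved as UTF-8.
$city = "Минск";

// Byte mode: [[:alnum:]] is tested against single bytes such as 0xD0,
// which are not alphanumeric, so the whole-string match fails.
$byteMode = preg_match('/^[[:alnum:]]+$/', $city);

// /u enables UTF-8 mode; (*UCP) makes POSIX classes use Unicode properties.
$unicodeMode = preg_match('/(*UCP)^[[:alnum:]]+$/u', $city);

echo "byte mode: $byteMode, unicode mode: $unicodeMode\n";
// prints: byte mode: 0, unicode mode: 1
```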


You have to make sure all your files use the same encoding, or encode/decode the data in the appropriate places. If you're working with UTF-8, check:
- that your page is being displayed in the right encoding (Browser -> View -> Encoding)
- that your files are saved in the right encoding

Your database, if you have one, should also use the same encoding as everything else.
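If you are not sure which encoding the data arrives in, one quick check is whether the submitted value is valid UTF-8 at all. A minimal sketch; is_valid_utf8() is a hypothetical helper name. It relies on preg_match() returning false when a /u pattern is given malformed UTF-8 (mb_check_encoding($s, 'UTF-8') does the same if the mbstring extension is installed). The second sample is "Минск" encoded as windows-1251, which is not valid UTF-8.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With /u, preg_match() returns false for malformed UTF-8 input,
// so an empty pattern doubles as a cheap validity check.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$utf8   = "Минск";                // UTF-8 bytes (file saved as UTF-8)
$cp1251 = "\xCC\xE8\xED\xF1\xEA"; // the same word encoded as windows-1251

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($cp1251)); // bool(false)
```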


Yes, it's an encoding problem.
Put this in your page:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Or this:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
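The same charset can also be declared from PHP with an HTTP header, and the form can ask the browser to submit its data as UTF-8 via accept-charset. A minimal sketch: render_city_form() is a hypothetical helper, and the field name city matches the question's $_POST['city']. When both are present, browsers give the Content-Type header precedence over the <meta> tag.

```php
<?php
// Hypothetical helper: builds a page whose form posts its data as UTF-8.
function render_city_form(): string
{
    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<form method="post" accept-charset="UTF-8">
  <input type="text" name="city">
  <input type="submit" value="OK">
</form>
</body>
</html>
HTML;
}

// Declare the charset in the HTTP header as well. Skipped on the command
// line, where no HTTP headers are sent.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_city_form();
```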


Check the encoding in the response headers (Firebug is a great tool for this). You may have an incorrect value in the web server configuration (for example, AddDefaultCharset in an .htaccess file).

PS: Use Unicode regexes instead of character ranges: preg_match("/[^\pL-]/u", $field)
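A sketch of validate_city() built on the \pL pattern from the PS, assuming UTF-8 input. \pL matches a letter in any script, and - is literal because it is the last character in the class. preg_match() returns false for malformed UTF-8, so testing with !== 0 rejects that case instead of silently accepting it, as a plain truthiness check would.

```php
<?php
function validate_city(string $field): string
{
    if ($field === "") {
        return "Enter city.<br />";
    }
    // 1: a character other than a letter or "-" was found;
    // false: the input is not valid UTF-8. Both are rejected.
    $found = preg_match('/[^\pL-]/u', $field);
    if ($found !== 0) {
        return "City name may contain only letters and -.<br />";
    }
    return "";
}
```

This rejects spaces, so a name like "Нижний Новгород" would fail; add \s to the class if that should be allowed.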


The variable $_POST['city'] looks like: �ин�к

It's not UTF-8... Maybe the problem is in $_POST?
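The leading Ð and Ñ in the garbled value are what the UTF-8 lead bytes 0xD0 and 0xD1 (used by most Cyrillic letters) look like when the bytes are read as ISO-8859-1, which suggests the data in $_POST may be UTF-8 after all, only displayed with the wrong charset. A sketch to reproduce that view; as_latin1() is a hypothetical helper and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: reinterpret each byte of $bytes as an ISO-8859-1
// character and return the result as UTF-8, i.e. what a page declared
// as Latin-1 would show for those bytes.
function as_latin1(string $bytes): string
{
    $out = "";
    foreach (str_split($bytes) as $ch) {
        $b = ord($ch);
        // Bytes 0x00-0x7F map to themselves; 0x80-0xFF become U+0080-U+00FF.
        $out .= $b < 0x80 ? $ch : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

$view = as_latin1("Минск"); // assumes this file is saved as UTF-8
echo $view, "\n";            // starts with "Ð"; some bytes are non-printable controls
```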


I tried ^[a-zA-Z\p{Cyrillic}\d\s\-]+$ on https://regex101.com/ and everything works.
