I am making a Swedish website, and the Swedish letters are å, ä, and ö.
I need to make a string entered by a user URL-safe with PHP.
Basically, I need to convert all characters to underscores, all EXCEPT these:
A-Z, a-z, 1-9
and all Swedish letters should be converted like this:
'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).
The rest should become underscores as I said.
I'm not good at regular expressions, so I would appreciate the help, guys!
Thanks
NOTE: NOT URLENCODE. I need to store it in a database, etc., so urlencode won't work for me.
This function should be useful; it handles almost all cases:
function Unaccent($string)
{
return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));
}
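A minimal sketch of how Unaccent() could be combined with the underscore rule from the question. The helper names unaccent() and to_url_safe() are hypothetical, and the html_entity_decode() step is an added assumption (not part of the answer above): it turns leftover entities such as &amp; back into single characters so they become one underscore instead of "_amp_". The sketch keeps 0 as a valid digit; narrow the class to 1-9 if 0 really must go.

```php
<?php
// Hypothetical helper: strip accents via HTML entity names (e.g. &auml; -> a).
function unaccent(string $string): string
{
    $encoded = htmlentities($string, ENT_COMPAT, 'UTF-8');
    $stripped = preg_replace(
        '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i',
        '$1',
        $encoded
    );
    // Added step (assumption): decode entities the regex left alone, e.g. &amp; -> &.
    return html_entity_decode($stripped, ENT_COMPAT, 'UTF-8');
}

// Hypothetical helper: every character outside A-Z, a-z, 0-9 becomes "_".
// The /u flag makes a multi-byte character count as one character, not several bytes.
function to_url_safe(string $string): string
{
    return preg_replace('/[^A-Za-z0-9]/u', '_', unaccent($string));
}

echo to_url_safe('Räksmörgås & köttbullar'), PHP_EOL; // Raksmorgas___kottbullar
```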
Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:
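setlocale(LC_CTYPE, 'en_US.UTF-8'); // ASCII//TRANSLIT depends on LC_CTYPE; this locale must be installed
$input = 'räksmörgås och köttbullar'; // UTF-8 encoded
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;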
Result:
raksmorgas_och_kottbullar
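// decompose accented letters into base letter + combining mark (NFD) using PHP's *intl* extension
$data = normalizer_normalize($data, Normalizer::FORM_D);
// remove the combining marks (Unicode category Mn), e.g. the dots in "ä"
$data = preg_replace('/\p{Mn}/u', '', $data);
// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#", "_", $data);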
and all swedish should be converted like this:
'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).
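Use normalizer_normalize() with Normalizer::FORM_D to split each accented letter into its base letter and a combining diacritical mark, then remove the marks with preg_replace('/\p{Mn}/u', '', ...).
The rest should become underscores as I said.
Use preg_replace() with the pattern \W (in other words: any character that is not a letter, digit, or underscore) to replace them with underscores.
The final result should look like this:
$data = preg_replace('/\p{Mn}/u', '', normalizer_normalize($data, Normalizer::FORM_D));
$data = preg_replace('/\W/', '_', $data);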
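Why the order of these steps and the u modifier matter: without /u, PCRE treats a UTF-8 string as raw bytes, so a letter such as ö (two bytes) turns into two underscores instead of one. A small demonstration; the sample word is arbitrary:

```php
<?php
$word = 'bröd'; // UTF-8: "ö" is stored as two bytes (0xC3 0xB6)

// Byte mode: each of the two bytes of "ö" is replaced separately.
$bytes = preg_replace('/[^A-Za-z0-9]/', '_', $word);

// UTF-8 mode (/u): "ö" is one character, so it becomes one underscore.
$chars = preg_replace('/[^A-Za-z0-9]/u', '_', $word);

echo $bytes, PHP_EOL; // br__d
echo $chars, PHP_EOL; // br_d
```

This is also why the diacritics should be removed before the underscore replacement runs.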
If the intl PHP extension is enabled, you can use Transliterator like this:
protected function removeDiacritics($string)
{
$transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC;');
return $transliterator->transliterate($string);
}
To also convert special characters that are not simply letters with diacritics, such as 'æ':
protected function removeDiacritics($string)
{
$transliterator = \Transliterator::createFromRules(
':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
\Transliterator::FORWARD
);
return $transliterator->transliterate($string);
}
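A sketch of how the Transliterator approach could be wired into the question's underscore rule while still working on servers without intl. The helper url_safe_name() is hypothetical; it uses Transliterator::create() with the compound ID 'Any-Latin; Latin-ASCII' (a shorter variant of the rules above), and the fallback table covers only the Swedish letters from the question, which is an assumption you would extend as needed.

```php
<?php
// Hypothetical helper: transliterate to ASCII with intl when it is loaded,
// otherwise fall back to a small hand-written table (Swedish letters only),
// then replace everything outside A-Z, a-z, 0-9 with "_".
function url_safe_name(string $string): string
{
    $ascii = null;
    if (class_exists(\Transliterator::class)) {
        $t = \Transliterator::create('Any-Latin; Latin-ASCII');
        if ($t !== null) {
            $result = $t->transliterate($string);
            $ascii = ($result === false) ? null : $result;
        }
    }
    if ($ascii === null) {
        // Fallback (assumption): explicit mapping for å, ä, ö in both cases.
        $ascii = strtr($string, [
            'å' => 'a', 'ä' => 'a', 'ö' => 'o',
            'Å' => 'A', 'Ä' => 'A', 'Ö' => 'O',
        ]);
    }
    // /u so any character left over counts as one underscore, not one per byte.
    return preg_replace('/[^A-Za-z0-9]/u', '_', $ascii);
}

echo url_safe_name('Räksmörgås och köttbullar'), PHP_EOL; // Raksmorgas_och_kottbullar
```

Checking class_exists() first keeps the function usable on hosts where intl is missing, at the cost of covering fewer characters there.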
If you're just interested in making things URL-safe, then you want urlencode.
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
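For comparison, this is what the two encoders produce for a Swedish string. The escapes are reversible but hard to read, which is presumably why the question rules urlencode out for values stored in a database.

```php
<?php
$name = 'räksmörgås och';

// urlencode(): form encoding, the space becomes "+"
$form = urlencode($name);
// rawurlencode(): the space becomes "%20"
$raw = rawurlencode($name);

echo $form, PHP_EOL; // r%C3%A4ksm%C3%B6rg%C3%A5s+och
echo $raw, PHP_EOL;  // r%C3%A4ksm%C3%B6rg%C3%A5s%20och
```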
If you really want to strip all characters except A-Z, a-z, 1-9 (what's wrong with 0, by the way?), then you want:
$mynewstring = preg_replace('/[^A-Za-z1-9]/', '', $str);
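A quick illustration of the 0 question: with the class 1-9, zeros are removed along with everything else. The sample string is arbitrary.

```php
<?php
$str = 'Rum 101';

// Class from the question: 0 is not allowed, so it is stripped.
$without0 = preg_replace('/[^A-Za-z1-9]/', '', $str);
// Class including 0: the digit survives.
$with0 = preg_replace('/[^A-Za-z0-9]/', '', $str);

echo $without0, PHP_EOL; // Rum11
echo $with0, PHP_EOL;    // Rum101
```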
It's as simple as:
$str = str_replace(array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö'), array('a', 'a', 'o', 'A', 'A', 'O'), $str);
$str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));
assuming you use the same encoding for your data and your code.
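The same idea wrapped in a reusable function, as a sketch. The name swedish_slug() is hypothetical, and trimming leading and trailing underscores is an added design choice, not part of the answer above. Because the + in the pattern collapses runs, a sequence like ", " becomes a single underscore.

```php
<?php
// Hypothetical helper: map the Swedish letters (both cases), lowercase,
// collapse every run of other characters into one "_", then trim "_" at the ends.
function swedish_slug(string $str): string
{
    $str = str_replace(
        ['å', 'ä', 'ö', 'Å', 'Ä', 'Ö'],
        ['a', 'a', 'o', 'A', 'A', 'O'],
        $str
    );
    $str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));
    return trim($str, '_');
}

echo swedish_slug('  Östra Ängen, Åre! '), PHP_EOL; // ostra_angen_are
```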
One simple solution is to use the str_replace function with arrays of letters to search for and to replace them with.
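You don't need fancy regexps to filter the Swedish characters; just use the strtr function to "translate" them. With UTF-8 text, pass an array of pairs, because the three-argument form of strtr replaces single bytes:
$your_URL = "www.mäåö.com";
$good_URL = strtr($your_URL, array("ä" => "a", "å" => "a", "ö" => "o", "ë" => "e")); // etc.
echo $good_URL;
// output: www.maao.com :)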
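Why the three-argument form of strtr() is unreliable for UTF-8: it maps byte to byte and, when the two strings differ in length, ignores the extra bytes of the longer one. Since "ä" is two bytes and "a" is one, only the first byte of "ä" gets replaced. A small demonstration:

```php
<?php
// strlen() counts bytes: in UTF-8 "ä" is two bytes, "a" is one.
$len = strlen('ä');

// Three-argument strtr(): only the first byte (0xC3) is mapped to "a",
// the second byte (0xA4) is left behind as a broken fragment.
$broken = strtr('ä', 'ä', 'a');

// Array form: whole (multi-byte) strings are matched and replaced.
$fixed = strtr('ä', ['ä' => 'a']);

var_dump($len);             // int(2)
var_dump(bin2hex($broken)); // string(4) "61a4"
var_dump($fixed);           // string(1) "a"
```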