Is it ok to use £ as delimiter in preg_replace?

Developer https://www.devze.com 2023-02-15 05:30 Source: Internet
I am converting an eregi_replace function I found to preg_replace, but the eregi string has just about every character on the keyboard in it. So I tried using £ as the delimiter, and it is currently working, but I wonder whether it might cause problems because it is a non-standard character?

Here is the eregi:

function makeLinks($text) {
    $text = eregi_replace('(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
    '<a href="\\1">\\1</a>', $text);
    $text = eregi_replace('([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
    '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}

and the preg:

function makeLinks($text) {
    $text = preg_replace('£(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)£i',
    '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('£([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)£i',
    '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}


£ is problematic because it isn't an ASCII character. It comes from the Latin-1 charset and will only work if your PHP script file is also saved in that 8-bit encoding. Should your file be encoded as UTF-8, then £ will be represented as two bytes, and PCRE in PHP will trip over that. (At least my version does.)
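
To make the encoding point concrete, here is a minimal sketch that spells out the £ bytes explicitly, so the result does not depend on how the script file itself is saved. The variable names are only illustrative, and the failing case assumes that PHP takes just the first byte of the pattern as the delimiter; the exact warning may differ between PHP versions.

```php
<?php
// £ as one Latin-1 byte versus two UTF-8 bytes.
$latin1Pound = "\xA3";
$utf8Pound   = "\xC2\xA3";

var_dump(strlen($latin1Pound)); // int(1)
var_dump(strlen($utf8Pound));   // int(2)

// Single-byte delimiter: the pattern is "abc" with the i modifier.
$latin1Result = preg_match($latin1Pound . 'abc' . $latin1Pound . 'i', 'xABCx');

// Two-byte delimiter: only the first byte (0xC2) is treated as the delimiter,
// so the trailing 0xA3 ends up where a modifier is expected and the pattern
// is expected to be rejected. The @ hides the warning so that the return
// value can be inspected instead.
$utf8Result = @preg_match($utf8Pound . 'abc' . $utf8Pound . 'i', 'xABCx');

var_dump($latin1Result);
var_dump($utf8Result);
```

If the second call fails on your setup as well, that is the "tripping over" described above.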


You can use parentheses to delimit a regex rather than a single character, for example:

preg_replace('(abc/def#ghi)i', ...);

That would probably be nicer than trying to find an obscure character that's not (yet) part of your expression.
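
A small sketch of how that behaves, with made-up subject strings: the outer parentheses only delimit the pattern and are not a capture group, so you still need inner parentheses if you want to refer to \\1 or $1.

```php
<?php
// Parentheses act only as delimiters here; the pattern itself has no
// capture groups.
$result = preg_replace('(abc/def#ghi)i', '[match]', 'see ABC/DEF#GHI here');
echo $result, "\n"; // see [match] here

// Add an inner pair of parentheses to capture; that pair becomes group 1.
$wrapped = preg_replace('((abc/def#ghi))i', '<$1>', 'see ABC/DEF#GHI here');
echo $wrapped, "\n"; // see <ABC/DEF#GHI> here
```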


You can use the Unicode character, just to be sure.

\u00A3

Watch out for the ereg functions and Unicode support.

http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/characters.html

Long live the Queen.


As @Chris pointed out, you can use paired bracket characters as delimiters, but they have to be properly balanced throughout the regex. For example, '<<>' won't work, but '<<>>' will. You can use any of (), [], {} or <>, but I recommend the braces or the square brackets; parentheses are too common in regexes, and angle brackets are used in group constructs like (?>...) (atomic group) and (?<=...) (lookbehind).
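
A quick sketch of the balancing rule, using throwaway patterns and subjects that are not from the question:

```php
<?php
// '<<>' has an unmatched '<' inside, so no closing delimiter is found.
// The @ hides the resulting warning; the call is expected to return false.
$unbalanced = @preg_match('<<>', 'a<>b');

// '<<>>' is balanced: the outer pair delimits, the pattern itself is '<>'.
$balanced = preg_match('<<>>', 'a<>b');

// Braces as delimiters: the quantifier {2} pairs up, so nothing needs escaping.
$braces = preg_replace('{a{2}/b}i', 'X', 'AA/B and aa/b');

// A lone bracket inside the pattern has to be escaped with a backslash.
$escaped = preg_match('{a\{b}', 'a{b');

var_dump($unbalanced, $balanced, $braces, $escaped);
```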

But I'm with @Brad on this one: why not just escape the delimiter character with a backslash whenever it appears in the regex?
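
For instance, here is one possible version of the question's function with # as the delimiter and the single # inside each character class escaped. The name makeLinksEscaped is made up for this sketch, and the sample inputs are only illustrative.

```php
<?php
// Same patterns as in the question, but delimited with # and with the
// literal # inside each character class written as \#.
function makeLinksEscaped($text) {
    $text = preg_replace('#(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~\#?&//=]+)#i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('#([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~\#?&//=]+)#i',
        '\\1<a href="http://\\2">\\2</a>', $text);

    return $text;
}

echo makeLinksEscaped('see http://example.com/a?b=1 now'), "\n";
echo makeLinksEscaped('go to www.example.com'), "\n";
```

Only one character per pattern needs a backslash this way, and the file stays plain ASCII regardless of how it is saved.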


You would know the data being parsed better than we would. As far as regex is concerned, it's no different than any other ASCII value.

Though I have to ask: what's wrong with using a traditional delimiter and then just escaping it? Or using a class with a character range?
