In the section covering setlocale, the ANSI C standard states in a footnote that the only ctype.h functions whose behaviour is not affected by the current locale are isdigit and isxdigit.
The Microsoft implementation of isdigit is locale dependent because, for example, in locales using code page 1250 isdigit only returns non-zero for characters in the range 0x30 ('0') - 0x39 ('9'), whereas in locales using code page 1252 isdigit also returns non-zero for the superscript digits 0xB2 ('²'), 0xB3 ('³') and 0xB9 ('¹').
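To reproduce this, a small helper along the following lines can list the characters isdigit accepts under a given locale. This is only a sketch: collect_isdigit_chars is a name I made up for illustration, and the locale names that select code page 1250 or 1252 are platform specific (on Windows, strings of the form ".1252" are what I would try first, but treat that as an assumption to check against your CRT).

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: stores in out[] each value in 0..UCHAR_MAX for which
   isdigit() returns non-zero under the named locale, and returns how many
   values were stored. Returns -1 if setlocale() rejects the locale name.
   Note: LC_CTYPE is reset to "C" on exit, not to its previous value. */
int collect_isdigit_chars(const char *locale_name,
                          unsigned char *out, int out_size)
{
    int c;
    int n = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (c = 0; c <= UCHAR_MAX; ++c) {
        if (isdigit(c) && n < out_size)
            out[n++] = (unsigned char)c;
    }

    setlocale(LC_CTYPE, "C");
    return n;
}
```

Calling it once per locale name and comparing the two result arrays should show whether the superscript digits are being accepted by your runtime.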
Is Microsoft in violation of the C standard by making isdigit locale dependent?
In this question I am primarily interested in C90, which Microsoft claims to conform to, rather than C99.
Additional background:
Microsoft's own documentation of setlocale incorrectly states that isdigit is unaffected by the LC_CTYPE part of the locale.
The section of the C standard that covers the ctype.h functions contains some wording that I consider ambiguous:
The behavior of these functions is affected by the current locale. Those functions that have locale-specific aspects only when not in the "C" locale are noted below.
I consider this ambiguous because it is unclear what it is trying to say about functions such as isdigit for which there are no notes about locale-specific aspects. It might be trying to say that such functions must be assumed to be locale dependent, in which case Microsoft's implementation of isdigit would be OK. (Except that the footnote I mentioned earlier seems to contradict this interpretation.)
- Microsoft is always right.
- If Microsoft is not right see Item 1
Microsoft always has its own interpretation of the spec. And usually the sentence “but Microsoft is wrong” does not carry any weight with your CEO, so you have to code around MS bugs/interpretations.
The amount of code to support incorrect behavior of IE and Outlook is staggering.
In many cases, the only solution is to roll your own version of the function that does the right thing and do something like this:
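#include <ctype.h>

/* Wrapper for platforms whose isdigit is locale dependent. */
int my_isdigit( int c )
{
#ifdef _WIN32
    /* Accept only '0'-'9'; the C standard guarantees these are contiguous. */
    return c >= '0' && c <= '9';
#else
    return isdigit( c );
#endif
}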
The required character set is defined in section 2.2.1. Section 2.2.1.2 then goes on to describe the behavior of extension characters:
- The single-byte characters defined in §2.2.1 shall be present.
- The presence, meaning, and representation of any additional members is locale-specific.
The answer is the same for all versions of the C standard, but here I will be using N3054, a draft for C23.
The description of isdigit, in 7.4.1.5, is very simple:
The isdigit function tests for any decimal-digit character (as defined in 5.2.1).
So we need to look at 5.2.1 to see what a decimal-digit character is. The exact phrase "decimal-digit character" does not appear there, but we do get a description of the characters required to be in the basic character sets, which includes "the 10 decimal digits" followed by an explicit listing of the digits from 0 to 9. This is surely the definition we seek, since there is no other candidate available.
This unambiguously indicates that the isdigit function tests for precisely those 10 characters, and none others. In particular, it cannot be locale-specific.
Incidentally, by similar reasoning, the isxdigit function is also not locale-specific.
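If you want to check a particular runtime against this reading, something like the following could be used. It is a rough sketch under my own assumptions: count_digit_mismatches is a hypothetical name, and the two string constants simply spell out the ten decimal digits plus the letters a-f and A-F, which is how I read the standard's definitions. It tests whatever locale is in effect when it is called, so call setlocale first for the locale you care about.

```c
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Hypothetical check: counts the values in 1..UCHAR_MAX for which isdigit()
   or isxdigit() disagrees with the explicit character lists below, under the
   locale currently in effect. The value 0 is skipped because strchr() would
   match the string terminator. */
int count_digit_mismatches(void)
{
    static const char dec[] = "0123456789";
    static const char hex[] = "0123456789abcdefABCDEF";
    int c;
    int mismatches = 0;

    for (c = 1; c <= UCHAR_MAX; ++c) {
        int want_dec = strchr(dec, c) != NULL;
        int want_hex = strchr(hex, c) != NULL;

        if ((isdigit(c) != 0) != want_dec)
            ++mismatches;
        if ((isxdigit(c) != 0) != want_hex)
            ++mismatches;
    }
    return mismatches;
}
```

A non-zero result under some locale would mean the implementation treats at least one character outside those lists as a digit, or rejects one inside them.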