Default int type: Signed or Unsigned?

When programming in a C-like language, should one's "default" integer type be int or uint/unsigned int? By "default", I mean the case where you don't need negative numbers, but either type is easily big enough for the data you're holding. I can think of good arguments for both:

signed: Better-behaved mathematically, less possibility of weird behavior if you try to go below zero in some boundary case you didn't think of, generally avoids odd corner cases better.

unsigned: Provides a little extra assurance against overflow, just in case your assumptions about the values are wrong. Serves as documentation that the value represented by the variable should never be negative.

The Google C++ Style Guide has an interesting opinion on unsigned integers:

(quote follows:)

On Unsigned Integers

Some people, including some textbook authors, recommend using unsigned types to represent numbers that are never negative. This is intended as a form of self-documentation. However, in C, the advantages of such documentation are outweighed by the real bugs it can introduce. Consider:

for (unsigned int i = foo.Length()-1; i >= 0; --i) ...

This code will never terminate! Sometimes gcc will notice this bug and warn you, but often it will not. Equally bad bugs can occur when comparing signed and unsigned variables. Basically, C's type-promotion scheme causes unsigned types to behave differently than one might expect.

So, document that a variable is non-negative using assertions. Don't use an unsigned type.

(end quote)
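To make the loop problem in the quote concrete, here is a minimal C++ sketch of two backwards traversals that do terminate. The function names and the use of std::vector are my own illustration, not something from the style guide. The first uses a signed index, so i >= 0 can actually become false; the second keeps an unsigned index but tests the value before decrementing it.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: sums a vector back to front with a signed index.
// Because i is signed, it can reach -1 and the condition i >= 0 fails.
int sum_reversed_signed(const std::vector<int>& v) {
    // Document the assumption that the size fits in an int.
    assert(v.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    int total = 0;
    for (int i = static_cast<int>(v.size()) - 1; i >= 0; --i) {
        total += v[static_cast<std::size_t>(i)];
    }
    return total;
}

// Same traversal with an unsigned index. The test i-- > 0 compares the
// value before the decrement, so the loop ends when i was 0 and the
// body never sees a wrapped-around index.
int sum_reversed_unsigned(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = v.size(); i-- > 0;) {
        total += v[i];
    }
    return total;
}
```

Both versions handle an empty vector, which is exactly the case where the quoted loop starts from a wrapped-around index.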

Certainly signed. If overflow worries you, underflow should worry you more, because accidentally going "below zero" is easier than going above INT_MAX.

"unsigned" should be a conscious choice that makes the developer think about potential risks, used only where you are absolutely sure that you can never go negative (not even accidentally), and that you need the additional value space.

As a rough rule of thumb, I use unsigned ints for counting things, and signed ints for measuring things.

If you find yourself decrementing or subtracting from an unsigned int, then you should be in a context where you already expect to be taking great care not to underflow (for instance, because you're in some low-level code stepping back from the end of a string, so of course you have first ensured that the string is long enough to support this). If you aren't in a context like that, where it's absolutely critical that you don't go below zero, then you should have used a signed value.
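As a sketch of the "stepping back from the end of a string" case, assuming C++ and a made-up helper name (trimmed_length), the guard comes before every read so the unsigned index is never decremented past zero:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of s with trailing spaces ignored.
std::size_t trimmed_length(const std::string& s) {
    std::size_t n = s.size();
    // Check n > 0 before touching s[n - 1]. With n == 0, the
    // expression n - 1 would wrap around to the largest size_t value.
    while (n > 0 && s[n - 1] == ' ') {
        --n;
    }
    return n;
}
```

The order of the two conditions matters: swapping them would index the string before confirming there is anything to step back over.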

In my usage, unsigned ints are for values which absolutely cannot go negative (or for that one in a million situation where you actually want modulo 2^N arithmetic), not for values which just so happen not to be negative, in the current implementation, probably.
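One place where modulo 2^N arithmetic is actually wanted is a free-running tick counter. The sketch below is an assumed example (the 32-bit counter and the name elapsed_ticks are hypothetical); it relies on unsigned subtraction wrapping, so the difference stays correct even after the counter rolls over.

```cpp
#include <cstdint>

// Hypothetical helper: ticks elapsed between two readings of a
// free-running 32-bit counter. The result is reduced modulo 2^32,
// so it is right as long as fewer than 2^32 ticks have passed,
// even when the counter wrapped between the two readings.
std::uint32_t elapsed_ticks(std::uint32_t start, std::uint32_t now) {
    return static_cast<std::uint32_t>(now - start);
}
```

For instance, with start = 0xFFFFFFF0 and now = 0x00000010 the counter has wrapped once, and the function still returns 0x20 (32 ticks).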

I tend to go with signed unless I know I need unsigned: int is typically signed, unsigned int takes more effort to type, and uint may cause another programmer a slight pause to think about what the values can be.

So, I don't see any benefit to just defaulting to unsigned, since the normal int is signed.

You don't get much "assurance against overflow" with unsigned. You're just as likely to get different and stranger behaviour than with signed, only slightly later. Better to get those assumptions right beforehand, maybe?
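A small C++ sketch of what "stranger behaviour, slightly later" can look like. The function names and the capacity/used scenario are invented for illustration: when the assumption used <= capacity is wrong, the unsigned version quietly returns a huge, plausible-looking count, while the signed version returns a negative number that is easy to check for.

```cpp
#include <cstdint>

// Hypothetical: remaining slots with unsigned arithmetic.
// If used > capacity, the result wraps modulo 2^32 to a very large value.
std::uint32_t free_slots_unsigned(std::uint32_t capacity, std::uint32_t used) {
    return static_cast<std::uint32_t>(capacity - used);
}

// Hypothetical: the same computation with a signed type.
// If used > capacity, the result is simply negative.
std::int64_t free_slots_signed(std::int64_t capacity, std::int64_t used) {
    return capacity - used;
}
```

With capacity 10 and used 11, the unsigned version yields 4294967295 and the signed version yields -1. Neither is a valid answer, but only the second one looks wrong at a glance.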

Giving a more specific type (like unsigned int) conveys more information about how the variable is used, and can help the compiler catch cases where you're assigning an "incorrect" value. For instance, if you're using a variable to track the database ID of an object/element, there (likely) should never be a time when the ID is less than zero (or one); in this sort of case, rather than asserting that state, using an unsigned integer value conveys that statement to other developers as well as the compiler.
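How much the compiler actually helps here depends on the language and on warning settings. In C++ specifically, a plain conversion from int to unsigned int compiles and silently reinterprets a negative value, so the type alone may not catch the mistake. A minimal sketch, with RecordId and both function names being hypothetical:

```cpp
#include <cassert>
#include <limits>

// Hypothetical ID type for illustration.
using RecordId = unsigned int;

// Implicit conversion: this compiles, and a negative input such as -1
// becomes the largest unsigned int value rather than an error.
RecordId unchecked_record_id(int raw) {
    RecordId id = raw;
    return id;
}

// Checked variant: the assertion states the non-negative requirement
// explicitly before converting.
RecordId to_record_id(int raw) {
    assert(raw >= 0 && "record IDs must be non-negative");
    return static_cast<RecordId>(raw);
}
```

So the unsigned type documents the intent for readers, but in C++ it seems safer to pair it with a check (or with conversion warnings enabled) at the point where signed values enter.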

I doubt there is a really good language-agnostic answer to this. There are enough differences between languages and how they handle mixed types that no one answer is going to make sense for all (or even most).

In the languages I use most often, I use signed unless I have a specific reason to do otherwise. That's mostly C and C++ though. In another language, I might well give a different answer.