开发者

SQL: Using NULL values vs. default values

开发者 https://www.devze.com 2022-12-17 14:38 出处:网络
What are the 开发者_JS百科pros and cons of using NULL values in SQL as opposed to default values?

What are the 开发者_JS百科pros and cons of using NULL values in SQL as opposed to default values?

PS. Many similar questions has been asked on here but none answer my question.


I don't know why you're even trying to compare these to cases. null means that some column is empty/has no value, while default value gives a column some value when we don't set it directly in query.

Maybe some example will be better explanation. Let's say we've member table. Each member has an ID and username. Optional he might has an e-mail address (but he doesn't have to). Also each member has a postCount column (which is increased every time user write a post). So e-mail column can have a null value (because e-mail is optional), while postCount column is NOT NULL but has default value 0 (because when we create a new member he doesn't have any posts).


Null values are not ... values!

Null means 'has no value' ... beside the database aspect, one important dimension of non valued variables or fields is that it is not possible to use '=' (or '>', '<'), when comparing variables.

Writting something like (VB):

if myFirstValue = mySecondValue

will not return either True or False if one or both of the variables are non-valued. You will have to use a 'turnaround' such as:

if (isnull(myFirstValue) and isNull(mySecondValue)) or myFirstValue = mySecondValue

The 'usual' code used in such circumstances is

if Nz(myFirstValue) = Nz(mySecondValue, defaultValue)

Is not strictly correct, as non-valued variables will be considered as 'equal' to the 'defaultValue' value (usually Zero-length string).

In spite of this unpleasant behaviour, never never never turn on your default values to zero-length string (or '0's) without a valuable reason, and easing value comparison in code is not a valuable reason.


NULL values are meant to indicate that the attribute is either not applicable or unknown. There are religious wars fought over whether they're a good thing or a bad thing but I fall in the "good thing" camp.

They are often necessary to distinguish known values from unknown values in many situations and they make a sentinel value unnecessary for those attributes that don't have a suitable default value.

For example, whilst the default value for a bank balance may be zero, what is the default value for a mobile phone number. You may need to distinguish between "customer has no mobile phone" and "customer's mobile number is not (yet) known" in which case a blank column won't do (and having an extra column to decide whether that column is one or the other is not a good idea).

Default values are simply what the DBMS will put in a column if you don't explicitly specify it.


It depends on the situation, but it's really ultimately simple. Which one is closer to the truth?

A lot of people deal with data as though it's just data, and truth doesn't matter. However, whenever you talk to the stakeholders in the data, you find that truth always matters. sometimes more, sometimes less, but it always matters.

A default value is useful when you may presume that if the user (or other data source) had provided a value, the value would have been the default. If this presumption does more harm then good, then NULL is better, even though dealing with NULL is a pain in SQL.

Note that there are three different ways default values can be implemented. First, in the application, before inserting new data. The database never sees the difference between a default value provided by the user or one provided by the app!

Second, by declaring a default value for the column, and leaving the data missing in an insert.

Third, by substituting the default value at retrieval time, whenever a NULL is detected. Only a few DBMS products permit this third mode to be declared in the database.

In an ideal world, data is never missing. If you are developing for the real world, required data will eventually be missing. Your applications can either do something that makes sense or something that doesn't make sense when that happens.


As with many things, there are good and bad points to each.

Good points about default values: they give you the ability to set a column to a known value if no other value is given. For example, when creating BOOLEAN columns I commonly give the column a default value (TRUE or FALSE, whatever is appropriate) and make the column NOT NULL. In this way I can be confident that the column will have a value, and it'll be set appropriate.

Bad points about default values: not everything has a default value.

Good things about NULLs: not everything has a known value at all times. For example, when creating a new row representing a person I may not have values for all the columns - let's say I know their name but not their birth date. It's not appropriate to put in a default value for the birth date - people don't like getting birthday cards on January 1st (if that's the default) if their birthday is actually July 22nd.

Bad things about NULLs: NULLs require careful handling. In most databases built on the relational model as commonly implemented NULLs are poison - the presence of a NULL in a calculation causes the result of the calculation to be NULL. NULLs used in comparisons can also cause unexpected results because any comparison with NULL returns UNKNOWN (which is neither TRUE nor FALSE). For example, consider the following PL/SQL script:

declare 
  nValue NUMBER;
begin
  IF nValue > 0 THEN
    dbms_output.put_line('nValue > 0');
  ELSE
    dbms_output.put_line('nValue <= 0');
  END IF;

  IF nValue <= 0 THEN
    dbms_output.put_line('nValue <= 0');
  ELSE
    dbms_output.put_line('nValue > 0');
  END IF;
end;

The output of the above is:

nValue <= 0
nValue > 0

This may be a little surprising. You have a NUMBER (nValue) which is both less than or equal to zero and greater than zero, at least according to this code. The reason this happens is that nValue is actually NULL, and all comparisons with NULL result in UNKNOWN instead of TRUE or FALSE. This can result in subtle bugs which are hard to figure out.

Share and enjoy.


To me, they are somewhat orthogonal.

Default values allow you to gracefully evolve your database schema (think adding columns) without having to modify client code. Plus, they save some typing, but relying on default values for this is IMO bad.

Nulls are just that: nulls. Missing value and a huge PITA when dealing with Three-Valued Logic.


In a Data Warehouse, you would always want to have default values rather than NULLs.

Instead you would have value such as "unknown","not ready","missing"

This allows INNER JOINs to be performed efficiently on the Fact and Dimension tables as 'everything always has a value'


Nulls and default values are different things used for different purposes. If you are trying to avoid using nulls by giving everything a default value, that is a poor practice as I will explain.

Null means we do not know what the value is or will be. For instance suppose you have an enddate field. You don't know when the process being recorded will end, so null is the only appropriate value; using a default value of some fake date way out in the future will cause as much trouble to program around as handling the nulls and is more likely in my experience to create a problem with incorrect results being returned.

Now there are times when we might know what the value should be if the person inserting the record does not. For instance, if you have a date inserted field, it is appropriate to have a default value of the current date and not expect the user to fill this in. You are likely to actually have better information that way for this field.

Sometimes, it's a judgement call and depends on the business rules you have to apply. Suppose you have a speaker honoraria field (Which is the amount a speaker would get paid). A default value of 0 could be dangerous as it it might mean that speakers are hired and we intend to pay them nothing. It is also possible that there may occasionally be speakers who are donating their time for a particular project (or who are employees of the company and thus not paid extra to speak) where zero is a correct value, so you can't use zero as the value to determine that you don't know how much this speaker is to be paid. In this case Null is the only appropriate value and the code should trigger an issue if someone tries to add the speaker to a conference. In a different situation, you may know already that the minimum any speaker will be paid is 3000 and that only speakers who have negotiated a different rate will have data entered in the honoraria field. In this case, it is appropriate to put in a default value of 3000. In another cases, different clients may have different minimums, so the default should be handled differently (usually through a lookup table that automatically populates the minimum honoraria value for that client on the data entry form.

So I feel the best rule is leave the value as null if you truly cannot know at the time the data is entered what the value of the field should be. Use a default value only it is has meaning all the time for that particular situation and use some other technique to fill in the value if it could be different under different circumstances.


I so appreciate all of this discussion. I am in the midst of building a data warehouse and am using the Kimball model rather strictly. There is one very vocal user, however, who hates surrogate keys and wants NULLs all over the place. I told him that it is OK to have NULLable columns for attributes of dimensions and for any dates or numbers that are used in calculations because default values there imply incorrect data. There are, I agree, advantages to allowing NULL in certain columns but it makes cubing a lot better and more reliable if there is a surrogate key for every foreign key to a dimension, even if that surrogate is -1 or 0 for a dummy record. SQL likes integers for joins and if there is a missing dimension value and a dummy is provided as a surrogate key, then you will get the same number of records using one dimension as you would cubing on another dimension. However, calculations have to be done correctly and you have to accommodate for NULL values in those. Birthday should be NULL so that age is not calculated, for example. I believe in good data governance and making these decisions with the users forces them to think about their data in more ways than ever.


As one responder already said, NULL is not a value.

Be very ware of anything proclaimed by anyone who speaks of "the NULL value" as if it were a value.

NULL is not equal to itself. x=y yields false if both x and y are NULL. x=y yields true if both x and y are the default value.

There are almost endless consequences to this seemingly very simple difference. And most of those consequences are booby traps that bite you real bad.


Two very good Access-oriented articles about Nulls by Allen Browne:

  • Nulls: Do I need them?
  • Common Errors with Null

Aspects of working with Nulls in VBA code:

  • Nothing? Empty? Missing? Null?

The articles are Access-oriented, but could be valuable to those using any database, particularly relative novices because of the conversational style of the writing.


Nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.

DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.

REF: http://www.craigsmullins.com/bp7.htm

So, the best modeling practice for DB2 Z/OS is to use "NOT NULL WITH DEFAULT" as a standard for all columns. It's the same followed in some major shops I knew. Makes the life of programmers more easier not having to handle the Null Indicator and actually saves on storage by eliminating the need to use the extra byte for the NULL INDICATOR.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号