Here is my problem:
Where I work, I need to run reports on a very large database. I'm using basic SQL queries in PHP to filter for exactly what I want. Every so often I'll get groups in my reports that do not meet the criteria of my search query.
I noticed that all of the "stragglers" that appear in my reports seemed to have NULL values in certain fields, consistently.
Looking at the database structure, I find that these NULL values in the database have a NOT NULL flag set on them.
It turns out that these are actually not NULL values and instead empty s开发者_高级运维tring values or $value = ''
In my past experience, when I was starting out I would make this mistake a lot, setting something = '' instead of making it NULL.
I've asked the DBA if he can think of any legitimate reason why these are set this way ( there are 300,000 some odd records that are like this ) and he had no clue.
I'm thinking it could be another programmers mistake or someone attempting to avoid not being able to insert the record due to the "NOT NULL" flag set on that specific field.
So I'm scratching my head over here trying to find a legitimate reason for these to exist, other than my own suspicions that the database was not designed for these 2 types of records: records with this value and records without this value.
What are your thoughts?
I've seen this pretty often, especially in cases where:
- The database is touched by a variety of clients (a primary application, ad-hoc queries, automated import interfaces...)
- Data has been converted or migrated from one system to another
There may not be a legitimate reason for it now, but it may have seemed like a good idea to someone else at the time. When you don't know the history of a database, it's hard to know the constraints or philosophies behind structural decisions.
I tend to be very careful around empty strings and nulls because I've been bitten by the same issues. There are often compatibility reasons that prevent changing the DB structure, so you're left with workarounds like using ISNULL(c1,'')
or COALESCE(c1,'')
to capture both nulls and empty strings.
What is the benefit of an empty string value vs. using a null value
There are no real problems with using one or the other, exclusively. Issues arise when a single column can contain either, mainly with sorting and searching:
Sorting
SELECT '' as c1
UNION
SELECT NULL as c1
UNION
SELECT 'a' as c1
ORDER BY c1
You'll see that NULL
s will be listed first, then string values (empty first). It's something to keep in mind if you're doing multi-column ORDER BY
s and you're wondering why some columns are always on top.
Searching
When a column is NULL
, you need to search by IS NULL
or IS NOT NULL
rather than = ''
. Also, when you do LENGTH( c1 ) = 0
, these NULL
records won't be included while empty strings will.
Finally, when you're comparing the column to itself in a self join, you won't be able to do a simple =
.
All in all, it would be best to conform the column values to one or the other and make your SQL consistent.
The use of NULL as a value in database has its critics, I'd like to refer to the Wikipedia article on NULL for an explanation of some of the oddities NULL may introduce in your queries.
So probably the database architect wasn't a fan of NULLs, maybe after reading "the Third Manifesto". This PDF (by one of the authors of the Third Manifesto) can help you understand the problem and show you how to manage the special case of missing values can be handled.
In my opinion, the empty string represents the absence of value, while NULL
represents an unknown value.
So, in my opinion, ""
is more specific, than NULL
. For example, if a person's birth place is not known, it is ok to store it as NULL
, but not ok to store it as ""
. If a person does not have any middle name, and it is known that the person does not have it, then it is better to store ""
than NULL
.
Reason : NULL
is the only way to indicate that a value is not known that works for all data types. The empty string ""
is more like 0
for integers.
精彩评论