开发者

Is it a good idea to store attributes in an integer column and perform bitwise operations to retrieve them?

开发者 https://www.devze.com 2023-01-10 18:58 出处:网络
In a recent CODE Magazine article, John Petersen shows how to use bitwise operators in TSQL in order to store a list of attributes in one column of a db table.

In a recent CODE Magazine article, John Petersen shows how to use bitwise operators in TSQL in order to store a list of attributes in one column of a db table.

Article here.

In his example he's using one integer column to hold how a customer wants to be contacted (email,phone,fax,mail). The query for pulling out customers that want to be contacted by email would look like this:

SELECT C.*   
FROM   dbo.Customers C  
       ,(SELECT 1   AS donotcontact   
               ,2  AS email   
               ,4  AS phone   
               ,8  AS fax   
               ,16 AS mail) AS contacttypes   
WHERE  ( C.contactmethods & contacttypes.email <> 0 )
AND    ( C.contactmethods & contacttypes.donotcontact = 0 )   

Afterwards he shows how to encapsulate this in to a table function.

My questions are these:

1. Is this a good idea? Any drawbacks? What problems might I run in to using this approach of storing attributes versus storing them in two extra tables (Customer_ContactType, ContactType) and doing a join with the Customer table? I guess one problem might be if my attribute list gets too long. If the column is an integer then my attribute list could only be at most 32.

2. What is the performance of doing these bitwise operations in queries as you move in to the tens of thousands of reco开发者_如何学编程rds? I'm guessing that it would not be any more expensive than any other comparison operation.


If you wish to filter your query based on the value of any of those bit values, then yes this is a very bad idea, and is likely to cause performance problems.

Besides, there simply isn't any need - just use the bit data type.

The reason why using bitwise operators in this way is a bad idea is that SQL server maintains statistics on various columns in order to improve query performance - for example if you have an email column, SQL server can tell you roughly what percentage of values that email column are true and select an appropriate execution plan based on that knowledge.

If however you have flags column, SQL server will have absolutely no idea how many records in a table match flags & 2 (email) - it doesn't maintain these sorts of indexes. Without this sort of information available to it SQL server is far more likely to choose a poor execution plan.


And don't forget the maintenance problems using this technique would cause. As it is not standard, all new devs will probably be confused by the code and not know how to adjust it properly. Errors will abound and be hard to find. It is also hard to do reporting type queries from. This sort of trick stuff is almost never a good idea from a maintenance perspective. It might look cool and elegant, but all it really is - is clunky and hard to work with over time.


One major performance implication is that there will not be a lookup operator for indexes that works in this way. If you said WHERE contact_email=1 there might be an index on that column and the query would use it; if you said WHERE (contact_flags & 1)=1 then it wouldn't.

** One column stores one piece of information only - it's the database way. **

(Didnt see - Kragen's answer also states this point, way before mine)


In opposite order: The best way to know what your performance is going to be is to profile.

This is, most definately, an "It Depends" question. I personally would never store such things as integers. For one thing, as you mention, there's the conversion factor. For another, at some point you or some other DBA, or someone is going to have to type:

Select CustomerName, CustomerAddress, ContactMethods, [etc]
From Customer
Where CustomerId = xxxxx

because some data has become corrupt, or because someone entered the wrong data, or something. Having to do a join and/or a function call just to get at that basic information is way more trouble than it's worth, IMO.

Others, however, will probably point to the diversity of your options, or the ability to store multiple value types (email, vs phone, vs fax, whatever) all in the same column, or some other advantage to this approach. So you would really need to look at the problem you're attempting to solve and determine which approach is the best fit.

0

精彩评论

暂无评论...
验证码 换一张
取 消