Is it smart to keep arrays in table columns? More precisely I am thinking of the following schema which to my understanding violates normalization:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
Permissions varchar(max) not null default(''),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
and this:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
PermissionID int not null default(-1),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
UPD3: I envision Permissions as a comma-delimited string since MSSQL is our primary deployment target.
UPD: Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" won't be performed, instead all the lookups will be made by GroupID and CategoryID only
UPD2: I envision the typical usage scenario as following:
int category_id=42;
int[] array_of_g开发者_StackOverflow中文版roups=new int[]{40,2,42};
if(!Permissions.Check(category_id, array_of_groups, Permission.EatAndDrink)) {
throw new StarveToDeathException();
}
Thoughts?
Thanks in advance!
I'd suggest to take the normalized road for the following reasons:
- By having a table containing all possible permissions, you have self-documenting data. You may add a description to each permission. This definitely beats concatenated id values without any meaning.
- You get all the advantages of referential integrity and can be sure that there are no bogus permission ids in your data.
- Inserting and deleting permissions will be easier - you add or delete records. With the concatenated string you will be updating a column, and delete the record only when you remove the last permission.
- Your design is future-proof - you say you only want to query by CategoryID and GroupID, you can do this already with normalized tables. On top of that, you will also for example be able to add other properties to your permissions, query by permission, etc.
- Performance-wise, I think it will actually be faster to get a resultset of id's than having to parse a string to integers. To be measured with actual data and implementation...
Your second example should probably be:
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID,PermissionID)
Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends how you use the data.
Is it smart
Occasionally, it depends. I'd say it depends how narrowly you define the things being normalised.
If you can see no way in which a table with one row for each item would ever be useful then I'd suggest that the encapsulate-in-a-string might be considered.
In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.
In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.
If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.
If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).
The problem with the first implementation is that it doesn't actually use an array but a concatenated string.
This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.
If you were using a database that natively supported arrays as an atomic value such PostgreSQL then the argument would be different.
Based upon the second requirement of the proposed query I'd have to suggest the second one is best as you can simply query SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2
(assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.
精彩评论