naming the id field of a table properly_问答_开发者

I am currently reading the book "SQL Prog开发者_Python百科ramming Style" wrote by Joe Celko.

In the first chapter, at the paragraph "Develop Standardized Postfixes" he states for the id column :

"_id" = identifier. It is unique in the schema and refers to one entity anywhere it appears in the schema. Never user ">table_name<_id"

Few pages later he states

Do not use an underscore as the first or last letter in a name. It looks like the name is missing another component.

He deprecated "id" as column name.

So I would like to know how you guys name the id column ?

I know that most people might think what the point of this question, but I am looking on standardizing my data model, following industry standards and ISO standards as much as I can.

I also deprecate the use of "Id" as a column name, even though it has become very widespread. "EmployeeId" is longer than "Id", but it is more descriptive. It also allows a foreign key to generally have the same name as the column to which it refers. This is enormously helpful when control over the database passes from one person to the next.

There is an exception to the above. It's possible to have two foreign keys in the same table that both refer to the same key. It's also possible to have a reflexive foreign key that refers to the key in a different row of the same table where it appears.

Let me give an example of a reflexive key. You have a table of employees, with key EmployeeId. You have another column, called SupervisorId, that records the relationship between a supervisor and several subordintes. The name of the foreign key in this case names the role, and not the entity.

As an alternative, it's possible to use user defined domains to document the fact that two columns refer to the same thing. Again, this is most useful when the fundamental meaning of the data has to be communicated to someone new.

The use of underscore as an internal visual separator inside a symbol is a completely separable issue. Camelcasing has become more widespread than underscore, and there are even systems where underscore is not allowed as a symbol constituent.

Above all, keep it consistent. If you use arbitrary, capricious, and contradictory naming conventions, you'll eventually confuse even your self.

I think it's a good question. Do what looks good to you, and always do that, every time. Then you'll be fine.

I use the tablename + 'id' model: UserId, PersonId etc.

Rather than share my opinions on naming standards, I'll attempt to answer your question ;)

I think the point Celko is making is that student_ID in a table of students is a code smell i.e. it could be that the designer's style is to always adds an ID column, probably an auto-increment column, to every table they create in the physical model (even when there is no such column in the logical model) with the intention of using these ID columns for foreign keys. In other words, Celko does not want you to always use surrogate key, rather he wants you to use natural keys where appropriate.

If you read on to section 1.2.5 (p14-15) and follow his rules for table names, you'll discover why table name + _ID an unlikely occurrence:

if I cannot find an industry standard (name), I would look for a collective or class name... Exception: use a singular name if the table actually has one and only one row in it.

So, for example, if you had a table containing student data it may be called Students rather than Student but more likely to be Enrolment (or similar). And a table containing one and only one row is unlikely to need an _ID column.

I suppose there are nouns for whom the plural is the same as the singular so maybe Sheep_ID is acceptable (but only in absence of an industry standard ovine identifier, of course!)

Also consider the rule 1.3.2. (p19) Avoid Names That Change From Place to Place e.g. the same domain referred to in the Students table as ID and in other tables as student_ID. It is unlikely that there will only be one element named _ID in the entire schema!

For Table IDs I always use tablename + ID.

The reason for this is to avoid ambiguous column names in Queries when it is a 1 to 1 mapping

Sometimes I quickly write up sql to test like this

Select
  * 
FROM table1
Inner join table2 on table1ID = table2ID

If I didnt use tablename in the ID column then this would throw an error (forcing me to use aliases on the tables)

Select
  * 
FROM table1
Inner join table2 on ID = ID

Also another good reason to use the table name, in general testing queries to see what data exists use the "*" to select columns. If you do a join and Select *, sometimes it is difficult to understand what ID came from what table, especially if you are returning a large number of columns from more than 2 tables

I always advocate for globally unique TABLENAME_ID. On that note, I strongly encourage table names which fully describe their context, so there is never any ambiguity as to their application when foreign references are made.

ID as a column name is hard to maintain and in my opinion can more easily lead to mistakes in joins.

Suppose for instance you always used ID as a column name in every table.

Now suppose you need to join to six of those tables. And being a typical person, you copy the first joins and change the table names. If you miss one, and you use id you will get a query that runs and gives the wrong anaswer. If you use tablenameId you will get a syntax error. see the following code for an example:

create table #test1 (id int identity, test varchar(10))
create table #test2 (id int identity, test varchar(10))
create table #test3 (id int identity, test varchar(10))

insert #test1
values ('hi')
insert #test1
values ('hello')
insert #test2
values ('hi there')
insert #test3
values ('hello')
insert #test3
values ('hi')
select * 
from #test1 t1
join #test2 t2
    on t1.id = t2.id
join #test3  t3
    on t1.id = t2.id    
select * 
from #test1 t1
join #test2 t2
    on t1.id = t2.id
join #test3  t3
    on t1.id = t3.id        

Drop table #test1
drop table #test2
drop table #test3   
Go

create table #test1 (t1id int identity, test varchar(10))
create table #test2 (t2id int identity, test varchar(10))
create table #test3 (t3id int identity, test varchar(10))   


    insert #test1
    values ('hi')
    insert #test1
    values ('hello')
    insert #test2
    values ('hi there')
    insert #test3
    values ('hello')
    insert #test3
    values ('hi')

select * 
from #test1 t1
join #test2 t2
    on t1.t1id = t2.t2id
join #test3 t3
    on t1.t1id = t3.t3id    

select * 
from #test1 t1
join #test2 t2
    on t1.t1id = t2.t2id
join #test3 t3
    on t1.t1id = t2.t3id    

    Drop table #test1
    drop table #test2
    drop table #test3

Another thing about using tablenameId is that when you want the actual id from several tables in a complex reporting query, you don't have to create aliases in order to see which id came from where (and to make the reporting application happy as most of them inist on unique fieldnames for a report).

Wow, I was going to write "I always use TablenameID but everyone else in the world disagrees with me". However, it looks like everyone here agrees with me.

That is, of course, when I use a surrogate integer ID in the table. If there's a natural primary key I use that instead.

In my database:

For a foreign key ID, I use the singular version of the foreign table name + "Id". I use the capital I, lower d as it is a standard ingrained in me by FX cop.

For auto incrementing identities I often use "SequenceId"

In my data layer:

I use the name of the object + "Id", following best practice standards for "Id"