Is normalization better, or a composite primary key?

I have a table in an Oracle DB, say a Student table. StudentID is the primary key of the table. I have another column for interested subjects, say the column name is Interested_SUB. A student can have more than one interested subject. In this case, I have the following 2 options:

1) Having the StudentID and Interested_SUB columns as the composite primary key. In this case, for example, if a student is interested in 3 subjects, then I will have 3 rows in the table with (S1,SUB1), (S1,SUB2) and (S1,SUB3) as the column values, and all other columns will have the same values for these three rows.

2) Have a separate table with columns StudentId and Interested_SUB, and an additional column in the first table to indicate whether the student is interested in more than one subject. In this case, I will have one row for each student in the student table with StudentId and SUB as (S1,SUB1) and the new indicator column as "Y". The second table will hold (S1,SUB2) and (S1,SUB3).

Please suggest which of the above options gives better database performance.

Thanks in Advance

A student table is likely to contain a lot of values about the student. What would that look like with option 1? E.g. would you like to see the name, age or semester repeated in every row? Probably not.

Normally you keep both the student table and the subject table on their own. A third table contains the information that connects the two. There you can have multiple rows which belong to a single student, but to different subjects:

students:  
1, Mister X  
2, Mister Y

subjects:  
1, Computer science  
2, Mathematics

students_subjects:  
1, 1  // Mister X likes computer science  
1, 2  // Mister X likes mathematics, too  
2, 2  // Mister Y likes mathematics only
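
The three tables above can be sketched as runnable code. This is only a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for Oracle. The column names (id, name, student_id, subject_id) and the helper functions are made up for the example, and the DDL would need adjusting for Oracle's types and syntax.

```python
import sqlite3


def build_demo():
    """Create the three example tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE subjects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Link table: one row per (student, subject) pair.
        CREATE TABLE students_subjects (
            student_id INTEGER NOT NULL REFERENCES students(id),
            subject_id INTEGER NOT NULL REFERENCES subjects(id),
            PRIMARY KEY (student_id, subject_id)
        );
    """)
    conn.executemany("INSERT INTO students VALUES (?, ?)",
                     [(1, "Mister X"), (2, "Mister Y")])
    conn.executemany("INSERT INTO subjects VALUES (?, ?)",
                     [(1, "Computer science"), (2, "Mathematics")])
    conn.executemany("INSERT INTO students_subjects VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 2)])
    conn.commit()
    return conn


def subjects_of(conn, student_name):
    """Return the subject names a student is interested in, ordered by subject id."""
    rows = conn.execute("""
        SELECT sub.name
        FROM students st
        JOIN students_subjects ss ON ss.student_id = st.id
        JOIN subjects sub ON sub.id = ss.subject_id
        WHERE st.name = ?
        ORDER BY sub.id
    """, (student_name,))
    return [name for (name,) in rows]
```

Calling subjects_of(build_demo(), "Mister X") joins through the link table, so the student's name is stored only once no matter how many subjects are attached.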

This will probably not be as performant as writing everything into one table. But you shouldn't think about performance too early and without reason.

"Performance" is pretty difficult to judge without having some metrics regarding what the production scenario is (e.g.: how many students? how many subjects, what is the expected percentage of students having more than one subject as interests?)

On the other hand your second solution is pretty bad in terms of design (it is counter-intuitive, relies on logic which is not immediately obvious by looking at the DB schema, it gets complicated in case someone wants to drop one of his interests...) and even in the fairly improbable case that it is more "efficient" the actual gains will be vastly overshadowed by the increase in complexity.

So, in a nutshell: forget solution #2.

In real databases, for large tables, the simpler the key the better. It makes scans and joins much faster and consumes less RAM. An artificial numeric key may be faster and more scalable than a non-numeric and/or composite one.

In your case, definitely go for normalization. Not only will it be faster (fewer rows), but it will also represent the domain better and be less fragile (no need to worry about keeping multiple rows for one student in sync).
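
The "keeping rows in sync" point can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module instead of Oracle, and the table names student_flat, student and student_interest are hypothetical. It shows bookkeeping only, not a performance measurement: renaming a student touches every repeated row in the option 1 layout, but a single row in the normalized one.

```python
import sqlite3


def build_layouts():
    """Create both layouts in memory and load one student with three interests."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Option 1: student attributes repeated on every (student, subject) row.
        CREATE TABLE student_flat (
            student_id     TEXT NOT NULL,
            interested_sub TEXT NOT NULL,
            name           TEXT NOT NULL,
            PRIMARY KEY (student_id, interested_sub)
        );
        -- Normalized: attributes stored once, interests in their own table.
        CREATE TABLE student (
            student_id TEXT PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE student_interest (
            student_id     TEXT NOT NULL REFERENCES student(student_id),
            interested_sub TEXT NOT NULL,
            PRIMARY KEY (student_id, interested_sub)
        );
    """)
    subjects = ["SUB1", "SUB2", "SUB3"]
    conn.executemany("INSERT INTO student_flat VALUES ('S1', ?, 'Mister X')",
                     [(s,) for s in subjects])
    conn.execute("INSERT INTO student VALUES ('S1', 'Mister X')")
    conn.executemany("INSERT INTO student_interest VALUES ('S1', ?)",
                     [(s,) for s in subjects])
    conn.commit()
    return conn


def rows_touched_by_rename(conn, student_id, new_name):
    """Rename a student in both layouts; return (flat_rows, normalized_rows) updated."""
    flat = conn.execute(
        "UPDATE student_flat SET name = ? WHERE student_id = ?",
        (new_name, student_id)).rowcount
    normalized = conn.execute(
        "UPDATE student SET name = ? WHERE student_id = ?",
        (new_name, student_id)).rowcount
    conn.commit()
    return flat, normalized
```

With the data loaded by build_layouts(), renaming S1 updates three rows in the flat table and one in the normalized table. In the flat layout, an update that misses one of those rows leaves the student with two different names.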

Database performance-related questions can't really be answered without knowing a lot more about the situation:

How big is the table going to be?
Up to how many subjects can a student have? ("More than one" can mean five, or a hundred)
How many columns will get repeated?
What kinds of queries will you be running?
What indexes do you have on the tables?

And even that's just scratching the surface; you'd still need to test to be able to say anything definitively.

In general, the normalized design is the "cleaner" option and makes things simpler and easier all around, but denormalizing can often speed things up. I'd go with normalized unless you absolutely need the extra performance.

What you are describing is an intersection table (AKA junction or link table). This is a common construct for representing many-to-many relationships. You have a STUDENTS table with general information about Students (Name, Date of Birth, etc.) and a SUBJECTS table with general information about the Subjects (Name, Teacher, etc.). You need a STUDENT_SUBJECTS table to show which Students are interested in which Subjects.

As for keys, there are no hard and fast rules. Theory favours the composite natural key (STUDENT_ID, SUBJECT_ID). This would be my choice if there were no other columns or data associated with the table. However, it is not unreasonable to imagine that other data might depend on STUDENT_SUBJECTS, such as ASSIGNMENTS, TESTS, etc. In that case a synthetic primary key (STUDENT_SUBJECT_ID) is a lot more manageable when propagated as a foreign key. However, it is crucial to continue to enforce the natural key through a unique constraint.
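
A minimal sketch of the synthetic-key variant, again assuming Python's built-in sqlite3 module in place of Oracle. The assignments table, its columns, and the add_interest helper are hypothetical. The foreign keys back to STUDENTS and SUBJECTS are left out to keep the example short. The part that matters is the UNIQUE constraint that keeps enforcing the natural key alongside the synthetic one.

```python
import sqlite3


def make_schema():
    """Link table with a synthetic key plus a unique constraint on the natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE student_subjects (
            student_subject_id INTEGER PRIMARY KEY,   -- synthetic key
            student_id         INTEGER NOT NULL,
            subject_id         INTEGER NOT NULL,
            UNIQUE (student_id, subject_id)           -- natural key still enforced
        );
        -- Dependent data carries one column instead of the two-column natural key.
        CREATE TABLE assignments (
            assignment_id      INTEGER PRIMARY KEY,
            student_subject_id INTEGER NOT NULL
                REFERENCES student_subjects(student_subject_id),
            title              TEXT NOT NULL
        );
    """)
    return conn


def add_interest(conn, student_id, subject_id):
    """Insert a (student, subject) pair; return its synthetic id, or None if duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO student_subjects (student_id, subject_id) VALUES (?, ?)",
                (student_id, subject_id))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

Without the UNIQUE constraint, a second add_interest(conn, 1, 1) would succeed and get a fresh synthetic id, storing the same interest twice. With the constraint in place, the duplicate is rejected.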