Many-to-many relations in database design_问答_开发者

I'm building a database that makes use of lots of many-to-many relations. For example, I have part items and machine items where each part goes into many machines and each machine has many parts. The purpose of this database is to be able to input a list of parts and get back a list of machines composed of only parts in that list, or vice versa. I'm new to database design, so I've currently been modeling this relation with translation tables, like such:

create table machine ( machine_id number, machine_name char(30) )

create table part ( part_id number, part_name char(30) )

create table relations ( part_id number, machine_id number )

开发者_StackOverflow社区

This seems like a very ugly and naive way to go about modeling this sort of relation. Are there better ways of doing this, like somehow storing and updating lists in single columns under their respective tables or something better I haven't thought of yet? I'm trying to find a way to do this that minimizes query time and post-processing intensity.

Thanks.

Welcome to relational databases. No, there isn't a better way. I'd recommend naming your tables slightly better; Machine, Part, Machine_Part or something of the sort, because you're probably going to end up with a bunch of these type of tables.

Post processing isn't really an issue, it's fairly easy to deal with these relationships using simple INNER JOINs in SQL or using an ORM. Databases are designed to cope with this kind of stuff.

Edited to Improve Clarity

Actually there are better ways to design that in a true Relational Database (it has been done thousands of times).

The structure is called a Bill of Materials structure, and existed long before the Relational Model. Of course, I will give you the Relational version. This is high performance and has no processing restraints, eg. you can produce a tree structure (BoM) report using a simple stored proc which is called recursively.

A couple of things that needs to be understood before we go for the chase.

From your desc, the Machines are actually Assemblies (of Parts). These Assemblies are used in (higher level) Assemblies, and so on. So let's rename "relations" as Assembly.
Your Part and Machine tables have not been normalised. If you inspect them closely you will find many identical columns(PartId::MachineId, PartName::MachineName are shown). In fact except for the lowest Part-of-a-Machine and the highest Machine-containing-Parts, all intervening Machines (containing Parts) are actually also Parts in (higher level) Machines. So you have monstrous data duplication. That is a large Normaalisation error. That will kill performance. The resulting Update Anomalies will cause data integrity problems. both those issues will come to the fore long before your other concerns do.
.
So that must be corrected by the ordinary process of Normalisation: Part and Machine become one table: Part. The fact that a Part may be an Assembly (of Components) is based on the context of usage; the fact that a Part may be a Component (in Assemblies) is the likewise based on context. As a unit sitting on the shelf, that you must keep an inventory of, Parts and Machines are each just Parts. The inventory control columns are located in Part; the context columns are located in Assembly.
Many-to-many relations (at the logical they are relations, not tables) are implemented at the physical level as Associative tables, there is no way around that. You already have that. What you are about to see may look different, but it is not.

Anything you get from me will be 5NF; zero data duplication; zero Update Anomalies. This runs in production at more than a few manufacturing sites. It so happens that I use the required structure as part of the tutorial in my Advanced classes, so you can just look it up. Feel free to ask any questions, related to understanding or otherwise. It is in the tutorial precisely because many developers do not understand the structure or how to navigate it. Note the very tight (tried and tested) naming conventions.

Click on Part or Assembly (you can ignore the rest of the Model, but I am happy to answers questions about it as well). Both the methodology, and the diagram notation, is IDEF1X (it is a "strict" rendition of the RM).

Part-Assembly-Component Example

And yes, Data Integrity is maintained, and Assembly⇢Component searches are served 100% (allowing for covered queries), from the Assembly Primary Key index (that is to say, they are already highly optimised, I do not need to make them some vendors this or that to get more speed out of it). A second Unique Index serves Component⇢Assembly Searches 100%.

Enjoy.

This is not naive, this is the proper way of an ER model. Separating entities with relations, the classic design pattern. Worry not about query/join overhead, RDBMSs are optimized for this and can fly through those join queries.

You can also make the relation table have (part_id,machine_id) as a compound primary key. Better yet, create them as indexed organized tables and avoid any (negligible) overhead of the table data.

A clear way to think about it: logical vs. physical

A logical many-to-many relationship between two entities (call them A and B) has a physical implementation of three tables (call them A, B, and C). This implementation involves a physical one-to-many relationship from A to C. Another from B to C. Table C is known as a "join table" or "junction table"

Short answer: You're headed in the right direction. This is the textbook way to create many-to-many relationships.

However, a "machine" is probably just a special case of a "part", namely, a part that in some sense you consider "a complete thing" or a deliverable item.

Most people who do this create a hierarchical structure. You have:

part(part_id, description, ... maybe other data like size and weight, etc ...)
assembly(parent_part_id, child_part_id)

Then a top-level part can have many component parts, and each of these components can have other components, etc. The advantage of this is that when an assembly is used on more than one final product, you only have to describe the break-out once. For example, I used to work for a company that made dishwashers. In out parts table, we'd have a record for, say, Motor model 29B. That motor would have many component parts -- a casing, a rotor, some stators, electrical harness, etc. But the same motor might be used in several different models of dishwasher. You don't want to have to give the whole parts list for the motor all over again for every dishwasher that it is used on. Not only is that a pain for someone to type in, but if you ever make a change, you have to be sure to change it everywhere its used. So you just give the breakout for the motor once, and then reference the motor as a unit in all higher-level assemblies.

Yes, some parts are "final", that is, they don't go on a higher-level assembly. But you don't need a different table for this. That just means that for this particular part, there is no Assembly record where this is the child.

Parts is parts. Put them all in one table.

Some database engines, e.g. Oracle, have commands to chase a hierarchy like this. If yours doesn't, you'd have to do it in code. But it's not that tough. People do it all the time.

That's normalized data - the most scalable means of storing any combination of a part record to a machine record.