Advantages to Vertical Partitioning of Table_问答_开发者

(Note that this situation isn't exactly how it is, but I made this as an example)

I have an entity in a table with data that is updated every 5 seconds (Kinematic Data: Speed, Heading, Lat, Long, and PositionTime), and other data that is updated hardly at all, if ever (Color, Make, OriginTime).

alt text http://www.freeimagehosting.net/uploads/a67205e99e.jpg

Now my boss wants me to partition this data into separate tables in our database (With a One to One Relationship), as so:

alt text http://www.freeimagehosting.net/uploads/1c699bc3c5.jpg

He makes it sound "obvious" that it should be this way, but is there really any advantages to having this data separated as so for inserting and up开发者_StackOverflowdating (For instance if I put an index on Color or Make)?

It might make sense to do vertical partitioning like this. Or it might not.

When you use a MVCC based engine, each time you update a row, it generally* copies the entire row and creates a new one with the modifications. This is so that other transactions which do not yet see the update can continue to read the original row if they need to.

This means that updating a few small columns frequently in a very wide row causes the database to do a lot more writes than it needs to.

But not that many, because generally the engine will only sync its transaction log, which will be the same size regardless of the size of non-updated columns, and also because the data rows are usually stored in blocks where a whole block needs to be written anyway, regardless of how much of it changed.

So it sounds like a potentially pointless optimisation, which like any other, should be considered on the grounds of a) IS there really a performance problem (i.e. is ANY optimisation needed) and b) Is this particular optimisation the best way of fixing it?

I think the chance of a) is unlikely, and b) is also unlikely, so the chances of this being required is approximately unlikely-squared.

* Some engines make an exception for very large columns such as big BLOBs or text columns, which are held elsewhere and not copied if other columns in the row are updated.

If the point of this design is to maintain a history of the kinematic data then the design makes sense. Although there doesn't seem to be a key in the CAR_KINEMATIC table which fits that usage. If on the other there is a one-to-one relationship between these two tables the division is meritless.

I'm not sure that the question is fully clear. If you want to main a history of the kinematics then the appropriate structure would be to normalise the data into the car data and the heading data. The car data can be updated independently and would probably be much smaller overall than the kinematics data.

If you want to maintain a flat record with the current state of the car, rather than maintaining a history, then leaving the data as it is is likely to be faster. The reason for this is that writing the whole record will probably entail just a single write operation in the majority of cases. Splitting it out into two tables guarantees that there will be at least two write operations.

In the first case you are just normalising the data; in the second case the current data structure is probably the most efficient.

Vertical partitioning is actually not that commonly used (except when it is, see below). Some scenarios where you might want to use vertical partitioning are:

The table is very wide and only some of it is used frequently. For example, if you have a table with 250 columns with 5 getting frequent state changes updated and a small subset of columns being used frequently by the application.
For security reasons, you may have a mix of confidential and not-so-sensitive data that lives in a 1:1 relationship. You can move the confidential data into another table with a different set of permissions. Historically, not all DBMS platforms allowed you to set permissions at column level.
A combination of the previous two, where changes to certain fields must be logged to an audit table, but other fields are updated very frequently without the requirement for logging. In order to avoid generating lots of spurious audit logging data the auditable fields can live in their own table with audit logging triggers.

Finally, you do get vertical partitioning behind the scenes in certain circumstances (i.e. it is not explicit in the schema but the physical storage works in this way). For example, many DBMS platforms store LOBs separately from normal table data, which results in a sort of implicit vertical partitioning of the table.

In fact, this particular situation makes tables with LOB columns quite expensive to do operations on, so moving the LOB column off into a separate table might well be a good application for vertical partitioning.

There are not many uses for vertical partitioning, and it always adds the overhead of additional I/O. You need to be avoiding a large overhead or have specific reasons such as security concerns for there to be much point in using it.

Your boss is right. And this has nothing to do with "partitioning", its called normalization.

Read this article.

EDIT: Ok, "vertical partitioning" is a well-known term, and normalization is one method of vertical partitioning. But in this case, normalization seems to be the right answer, which explains the question (Quote: "... is there really any advantages to having this data separated as so for inserting and updating"). Advantages and disadvantages of normalization are very well-known. The wikipedia article is a good starting point.

And BTW, to keep the flames of "Erwin Smout" burning: "vertical decomposition" does not seem to be a commonly used term here. Right?