What is the preferred way to store different versions of data?

When you're writing an application that needs to read and work with two versions of data in the same way, what is the best way to structure your classes to represent that data? I have come up with three approaches:

Common Base/Specific Children
Data Union
Distinct Structures

Version 1 Car Example

byte DoorCount
int Color
byte HasMoonroof
byte HasSpoiler
float EngineSize
byte CylinderCount

Version 2 Car

byte DoorCount
int Color
enum:int MoonRoofType
enum:int TrunkAccessories
enum:int EngineType

Common Base/Specific Children

With this method, there is a base class of common fields between the two versions of data and a child class for each version of the data.

class Car {
    byte DoorCount;
    int Color;
}

class CarVersion1 : Car {
    byte HasMoonroof;
    byte HasSpoiler;
    float EngineSize;
    byte CylinderCount;
}

class CarVersion2 : Car {
    int MoonRoofType;
    int TrunkAccessories;
    int EngineType;
}

Strengths

OOP Paradigm

Weaknesses

Existing child classes will have to change if a new version is released that removes a common field
Data for one conceptual unit is split across two definitions, and the split reflects the versioning rather than any division meaningful to the concept itself.

Data Union

Here, a Car is defined as the union of the fields of Car across all versions of the data.

class Car {
    CarVersion version;
    byte DoorCount;
    int Color;
    int MoonRoofType;     //HasMoonroof (boolean) if Version 1
    int TrunkAccessories; //HasSpoiler (boolean) if Version 1
    int EngineType;       //CylinderCount if Version 1
    float EngineSize;     //Not used if Version 2
}

Strengths

Um... Everything is in one place.

Weaknesses

Forces case-driven code (every use of a reused field has to check the version first).
Difficult to maintain when another version is released or legacy is removed.
Difficult to conceptualize. The meanings of the fields change based on the version.
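To make the "forces case-driven code" weakness concrete, here is a rough Java sketch of the union class (Java is used because it reads much like the C#-style pseudocode above). `UnionCar`, the `CarVersion` enum values and `describeEngine()` are hypothetical names; the only point is that any method touching a reused field has to switch on the version first.

```java
// Hypothetical version tag; the question uses a CarVersion type without defining it.
enum CarVersion { V1, V2 }

// "Data Union" sketch: one class whose field meanings depend on the version.
class UnionCar {
    CarVersion version;
    byte doorCount;
    int color;
    int moonRoofType;     // V1: HasMoonroof as 0/1; V2: a MoonRoofType value
    int trunkAccessories; // V1: HasSpoiler as 0/1; V2: a TrunkAccessories value
    int engineType;       // V1: CylinderCount; V2: an EngineType value
    float engineSize;     // V1 only; meaningless in V2

    // Any logic touching a reused field must branch on the version first.
    String describeEngine() {
        switch (version) {
            case V1:
                // engineType actually holds the cylinder count here
                return engineSize + "L, " + engineType + " cylinders";
            case V2:
                // engineSize is unused; engineType is an enum code
                return "engine type #" + engineType;
            default:
                throw new IllegalStateException("unknown version: " + version);
        }
    }
}
```

Every new version adds another `case` to every method like this, which is where the maintenance cost comes from.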

Distinct Structures

Here the structures have no OOP relationship to each other. However, interfaces may be implemented by both classes if/when the code expects to treat them in the same fashion.

class CarVersion1 {
    byte DoorCount;
    int Color;
    byte HasMoonroof;
    byte HasSpoiler;
    float EngineSize;
    byte CylinderCount;
}

class CarVersion2 {
    byte DoorCount;
    int Color;
    int MoonRoofType;
    int TrunkAccessories;
    int EngineType;
}
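A minimal Java sketch of the interface idea mentioned above, assuming a hypothetical `CarInfo` interface that exposes only what the application needs. The `MoonRoofType` values (`NONE`, `MANUAL`, `POWER`) and the `CarReport` helper are invented for illustration; the question only says MoonRoofType is an int-backed enum.

```java
import java.util.List;

// Hypothetical interface the application codes against.
interface CarInfo {
    int doorCount();
    int color();
    boolean hasMoonroof();
}

class CarVersion1 implements CarInfo {
    byte doorCount;
    int color;
    byte hasMoonroof;   // 0 = no, non-zero = yes
    byte hasSpoiler;
    float engineSize;
    byte cylinderCount;

    public int doorCount() { return doorCount; }
    public int color() { return color; }
    public boolean hasMoonroof() { return hasMoonroof != 0; }
}

// Assumed MoonRoofType values, for illustration only.
enum MoonRoofType { NONE, MANUAL, POWER }

class CarVersion2 implements CarInfo {
    byte doorCount;
    int color;
    MoonRoofType moonRoofType = MoonRoofType.NONE;
    int trunkAccessories;
    int engineType;

    public int doorCount() { return doorCount; }
    public int color() { return color; }
    public boolean hasMoonroof() { return moonRoofType != MoonRoofType.NONE; }
}

// Application code sees only the interface, never the version.
class CarReport {
    static int countWithMoonroof(List<? extends CarInfo> cars) {
        int n = 0;
        for (CarInfo c : cars) {
            if (c.hasMoonroof()) n++;
        }
        return n;
    }
}
```

The two classes still share no implementation; a third version would need one more class implementing `CarInfo`, and `CarReport` would not change.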

Strengths

Straightforward approach
Easy to maintain if a new version is added or legacy is removed.

Weaknesses

It's an anti-pattern.

Is there a better way that I didn't think of? It's probably obvious that I favor the last methodology, but is the first one better?

Why is the third option, distinct structures for each version, a bad idea or anti-pattern?

If the two versions of the data structure are used in a common application/module, they will have to implement the same interface. Period. It is definitely untenable to write two different application modules to handle two different versions of the data structure. The fact that the underlying data models are extremely different should be irrelevant. After all, the goal of writing objects is to achieve a practical level of encapsulation.

As you continue writing code in this way, you should eventually find places where the code in both classes is similar or redundant. If you move these common pieces of code out of the various version classes, you may eventually end up with version classes that not only implement the same interface, but can also extend the same base/abstract class. Voila, you've found your way to your "first" option.
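A rough Java sketch of where that refactoring can end up: the common fields and a shared `summary()` method move into an abstract base that implements the interface, and each version class supplies only what differs. `CarInfo`, `AbstractCar`, `summary()` and `engineSummary()` are hypothetical names chosen for the sketch, not anything from the question.

```java
// Hypothetical interface the application depends on.
interface CarInfo {
    int doorCount();
    int color();
    String summary();
}

// Common fields and logic pulled up out of the version classes.
abstract class AbstractCar implements CarInfo {
    protected byte doorCount;
    protected int color;

    public int doorCount() { return doorCount; }
    public int color() { return color; }

    // Shared logic written once; subclasses supply only the version-specific part.
    public String summary() {
        return doorCount + "-door, color " + color + ", " + engineSummary();
    }

    protected abstract String engineSummary();
}

class CarVersion1 extends AbstractCar {
    float engineSize;
    byte cylinderCount;

    protected String engineSummary() {
        return engineSize + "L " + cylinderCount + "-cyl";
    }
}

class CarVersion2 extends AbstractCar {
    int engineType; // an EngineType enum value in the question

    protected String engineSummary() {
        return "engine type " + engineType;
    }
}
```

Callers can keep depending on `CarInfo` alone; the abstract base is an implementation convenience that emerged from removing duplication.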

I think this is the best path in an environment with constantly evolving data. It requires some diligence and "looking behind" at older code, but it is worth it for the code clarity and reusable components you gain.

Another thought: in your example, the base class is "Car". In my opinion, it hardly ever turns out that the base class is so "near" to its inheritors. A more realistic set of base classes or interfaces might be "Versionable", "Upgradeable", "OptionContainer", etc. Just speaking from my experience, YMMV.

Use the second approach and enhance it with interfaces. Remember that a class can implement multiple interface "versions", which gives you the power of backward compatibility! I hope you get what I mean ;)
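One reading of this suggestion, sketched in Java: a single unified class implements one small interface per data version, so legacy callers keep using the version 1 view while new code uses the version 2 view. The interface names and the encodings (`0` meaning no moonroof, bit `0x1` of TrunkAccessories meaning a spoiler) are assumptions made for the sketch.

```java
// Hypothetical per-version views of the same data.
interface CarV1View {
    boolean hasMoonroof();
    boolean hasSpoiler();
}

interface CarV2View {
    int moonRoofType();
    int trunkAccessories();
}

// One unified class (the second approach) that implements both views,
// so code written against version 1 keeps working unchanged.
class Car implements CarV1View, CarV2View {
    static final int MOONROOF_NONE = 0;  // assumed "no moonroof" code
    static final int SPOILER_FLAG = 0x1; // assumed bit in TrunkAccessories

    private final int moonRoofType;
    private final int trunkAccessories;

    Car(int moonRoofType, int trunkAccessories) {
        this.moonRoofType = moonRoofType;
        this.trunkAccessories = trunkAccessories;
    }

    public int moonRoofType() { return moonRoofType; }
    public int trunkAccessories() { return trunkAccessories; }

    // Version 1 answers are derived from the richer version 2 fields.
    public boolean hasMoonroof() { return moonRoofType != MOONROOF_NONE; }
    public boolean hasSpoiler() { return (trunkAccessories & SPOILER_FLAG) != 0; }
}
```

The backward compatibility lives in the derived `CarV1View` methods, so the reused-field ambiguity of the plain union class never reaches the callers.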

Going on the following requirement:

an application that needs to read and work with two versions of data in the same way

I would say that the most important thing is that you funnel all logic through a data abstraction layer, so that none of your logic will have to care about whether you're using version 1, 2 or n of the data.

One way to do this is to have just one data class that is the most "buffed up" version of the data. Basically, it would have MoonRoofType, but not HasMoonroof, since that can be inferred. This class should not have any obsolete properties either, since it's up to the data abstraction layer to decide what the default values should be.

In the end, you'll have an application that doesn't care about the data versions at all.

As for the data abstraction layer, you may or may not want separate data classes for every version. Most likely, all you'll need is one class per version of the data structure, with Save and Load methods that store and create the data instances used by your application logic.
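A minimal Java sketch of that layout, under stated assumptions: the application only ever sees the unified `Car` class, and a hypothetical `CarStore` interface has one implementation per stored version. The byte layout (the question's field order, written with `java.nio.ByteBuffer`'s default big-endian order) and the int codes standing in for the enums are invented for illustration.

```java
import java.nio.ByteBuffer;

// The single "buffed up" class the application logic uses.
// Hypothetical int codes stand in for the question's enums.
final class Car {
    static final int MOONROOF_NONE = 0;
    static final int MOONROOF_STANDARD = 1; // assumed default for a v1 "yes"
    static final int TRUNK_NONE = 0;
    static final int TRUNK_SPOILER = 0x1;   // assumed flag for a v1 spoiler
    static final int ENGINE_UNKNOWN = 0;

    int doorCount;
    int color;
    int moonRoofType = MOONROOF_NONE;
    int trunkAccessories = TRUNK_NONE;
    int engineType = ENGINE_UNKNOWN;
}

// One loader/saver per stored version; only this layer knows versions exist.
interface CarStore {
    Car load(ByteBuffer in);
    void save(Car car, ByteBuffer out);
}

// Assumed v1 layout: DoorCount, Color, HasMoonroof, HasSpoiler, EngineSize, CylinderCount.
final class CarStoreV1 implements CarStore {
    public Car load(ByteBuffer in) {
        Car car = new Car();
        car.doorCount = in.get();
        car.color = in.getInt();
        car.moonRoofType = in.get() != 0 ? Car.MOONROOF_STANDARD : Car.MOONROOF_NONE;
        car.trunkAccessories = in.get() != 0 ? Car.TRUNK_SPOILER : Car.TRUNK_NONE;
        in.getFloat(); // EngineSize: no v2 equivalent, dropped
        in.get();      // CylinderCount: no v2 equivalent, dropped
        return car;
    }

    public void save(Car car, ByteBuffer out) {
        out.put((byte) car.doorCount);
        out.putInt(car.color);
        out.put((byte) (car.moonRoofType != Car.MOONROOF_NONE ? 1 : 0));
        out.put((byte) ((car.trunkAccessories & Car.TRUNK_SPOILER) != 0 ? 1 : 0));
        out.putFloat(0f);  // engine size unknown: this layer picks the default
        out.put((byte) 0); // cylinder count unknown
    }
}

// Assumed v2 layout: DoorCount, Color, MoonRoofType, TrunkAccessories, EngineType.
final class CarStoreV2 implements CarStore {
    public Car load(ByteBuffer in) {
        Car car = new Car();
        car.doorCount = in.get();
        car.color = in.getInt();
        car.moonRoofType = in.getInt();
        car.trunkAccessories = in.getInt();
        car.engineType = in.getInt();
        return car;
    }

    public void save(Car car, ByteBuffer out) {
        out.put((byte) car.doorCount);
        out.putInt(car.color);
        out.putInt(car.moonRoofType);
        out.putInt(car.trunkAccessories);
        out.putInt(car.engineType);
    }
}
```

Only `CarStoreV1` and `CarStoreV2` know that versions exist. The version 1 store is where defaults get decided (here, dropping EngineSize and CylinderCount), and supporting a version 3 would mean adding one more `CarStore`.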