svn and git versioning models difference

I would like to know what is the difference between versioning approaches suggested by git (or other DVCSs) and subversion (or other CVCSs).

Here is what I found on http://www.xsteve.at/prg/vc_svn/svn.txt regarding this topic:

Subversion manages versioned trees as first order objects (the repository is an array of trees), and the changesets are things that are derived (by comparing adjacent trees.) Systems like Arch or Bitkeeper are built the other way around: they're designed to manage changesets as first order objects (the repository is a bag of patches), and trees are derived by composing sets of patches together.

But it's not clear how a Subversion repository stores changes, whether it contains the oldest variant of each versioned file, and so on. Why couldn't we generate a bunch of patches, as in the case of git, for example? This is always mentioned as a principal difference between svn and git, one that simplifies or complicates merges, but unfortunately I still do not get the idea.

There's a nice explanation about the main differences between VCS based on changesets and on snapshots at Martin's blog. I'll not repeat it here.

However, I would stress one point that may not be obvious at first. Changeset-based VCSs make it really easy to track merges, which is much more difficult for systems like Subversion, which are based on snapshots.

In a changeset based VCS, merges are simply changesets (or commits, as they're called in git) which have more than one parent changeset. The graphical representation of the repository usually shows a DAG (Directed acyclic graph) where the nodes represent changesets and the arrows represent parent-child relationships. When you see a node with more than one parent you know exactly what kind of merge occurred there.
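To make the DAG idea concrete, here is a minimal Python sketch. The Commit class and the helper names are hypothetical, made up for illustration, and are not how git or any other tool actually represents its history. It only shows that once each changeset records its parents, finding merges and walking the ancestry is trivial:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A changeset node; `parents` holds the ids of its parent changesets."""
    id: str
    parents: tuple = ()


def merge_commits(commits):
    """Return the ids of commits that have more than one parent (merges)."""
    return [c.id for c in commits if len(c.parents) > 1]


def ancestors(commit_id, by_id):
    """Collect every commit reachable from `commit_id` via parent links."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        for parent in by_id[current].parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# A tiny history: "b" and "c" both branch off "a", and "m" merges them.
history = [
    Commit("a"),
    Commit("b", ("a",)),
    Commit("c", ("a",)),
    Commit("m", ("b", "c")),
]
by_id = {c.id: c for c in history}

print(merge_commits(history))
print(sorted(ancestors("m", by_id)))
```

Because the merge is recorded in the graph itself, no extra bookkeeping is needed to know what has already been merged into "m".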

In Subversion, "merge tracking" is something new. Up until version 1.4 there was no such concept, so in order to know about the history of merges you had to make notes in the log messages of your commits. Version 1.5 implemented merge tracking to make it easier to perform repeated merges from one branch to another without forcing the user to be explicit about revision ranges and the like. This is implemented with a property (svn:mergeinfo) associated with the directory receiving the merge. It tracks which revisions have already been merged from which branches. This is enough to infer which revisions should be merged in subsequent merges. But it doesn't make it easy to draw graphs showing the merge history, which is something you would like to see frequently as you work on a complex project with several developers.
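As a rough illustration of how a property like svn:mergeinfo can be used to infer what is left to merge, here is a simplified Python sketch. The function names are hypothetical, and the parsing assumes only the basic "/path:ranges" line form (for example "/branches/feature:3-5,8"). Real Subversion also deals with inheritance between directories and other cases that this ignores:

```python
def parse_mergeinfo(text):
    """Parse simplified mergeinfo lines into {source_path: set_of_revisions}."""
    merged = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, ranges = line.rpartition(":")
        revisions = set()
        for part in ranges.split(","):
            part = part.rstrip("*")  # ignore the non-inheritable marker here
            if "-" in part:
                start, end = part.split("-")
                revisions.update(range(int(start), int(end) + 1))
            else:
                revisions.add(int(part))
        merged[path] = revisions
    return merged


def eligible(source_path, source_revisions, mergeinfo_text):
    """Return the source revisions not yet recorded as merged."""
    already = parse_mergeinfo(mergeinfo_text).get(source_path, set())
    return sorted(set(source_revisions) - already)


# Assumed example: the branch has commits at these revisions, and the
# target directory records that 3-5 and 8 were merged earlier.
mergeinfo = "/branches/feature:3-5,8"
pending = eligible("/branches/feature", [3, 4, 5, 8, 9, 11], mergeinfo)
print(pending)
```

Note that this answers "what is still to be merged?" but gives you nothing like the parent links of a DAG, which is why drawing the merge history from it is awkward.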

Git is arranged with version trees as first-order objects in principle. That is, you deal with a graph of commit objects, each of which has a one-to-one relationship with a tree that is the state at that revision.

Note that how these are actually stored can be very different. Git started out simply compressing each file and tree/commit object individually. As I understand it, packing objects into a single file and storing just deltas for some objects was added much later.
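For illustration, here is a small Python sketch of the "compress each object individually" scheme as I understand git's loose object format for file contents (blobs): a short header is prepended to the content, the SHA-1 of header plus content becomes the object id, and the whole thing is zlib-compressed. The function name git_blob is my own, and the sketch covers blobs only, not trees or commits:

```python
import hashlib
import zlib


def git_blob(content: bytes):
    """Return (object_id, compressed_bytes) for a blob holding `content`."""
    header = b"blob %d\0" % len(content)  # type, size, NUL separator
    store = header + content
    object_id = hashlib.sha1(store).hexdigest()
    return object_id, zlib.compress(store)


object_id, compressed = git_blob(b"")
print(object_id)  # id of the empty blob
```

The point is that the whole file is stored, not a patch against the previous revision. Identical content always gets the same id, regardless of which commit it appears in.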

So although patches seem to be ubiquitous in git user interfaces, they in fact bear no relation to how the data is stored: the deltas stored in the pack files are binary-level deltas, not text-style diffs at all. Git applies the deltas to reconstruct the objects and then diffs them again to produce the patch on demand. This is in contrast to, for instance, CVS, which inherited a latest-version-plus-reverse-deltas storage system from RCS.
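The "patch on demand" idea can be sketched in Python with the standard difflib module. This is only an analogy and not git's diff implementation; the snapshots dictionary and the patch_between helper are hypothetical. The store holds complete file contents per revision, and a patch is computed only when somebody asks for one:

```python
import difflib


def patch_between(old: str, new: str, name: str) -> str:
    """Derive a unified diff from two complete versions of a file."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/" + name,
        tofile="b/" + name,
    )
    return "".join(diff)


# Two full snapshots of the same file; no patch is stored anywhere.
snapshots = {
    "rev1": "int main() {\n    return 0;\n}\n",
    "rev2": "int main() {\n    return 1;\n}\n",
}

patch = patch_between(snapshots["rev1"], snapshots["rev2"], "myprogram.c")
print(patch)
```

So the patch is a view derived from two trees, which is the same relationship the quoted text describes for Subversion.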

Based on what you quoted, it appears that Git and SVN are actually more similar than either is to CVS, for example.

Late and partial answer. I didn't think the following had been clarified above:

Important terms:

CVCS = Centralized Version Control System
DVCS = Distributed Version Control System (e.g. Git)

REPOSITORY = A project's file tree, i.e. a directory with one or more subdirectories, with all of the many files for a single project. For example:

./Project1/README
./Project1/myprogram.c
./Project1/Makefile
./Project1/images/1.gif
./Project1/images/2.gif

Centralized:

One (centralized) Repository shared by everyone.

Usage:

A user checks out a file they want to edit, (i.e. gets a copy of that file from the remote repository),
They edit the file locally on their own computer, and then
They check the file back into the central repository, (i.e. copy it back to the central repository which records the changes and makes the changes now available to other users).

Permission to make changes is granted to all users.

Distributed:

One read-only Repository shared by everyone, plus, at a minimum, a full copy of that Repository at each user's location.

In other words every user makes a copy of the entire project tree onto their local machine, or copies the entire file tree from the primary repository.

Usage:

After a user makes a local edit
They can then submit the edit back to the central Repository to have it possibly included and thus shared with others.

Permission to make changes is controlled by the project owner who controls the primary repository. (In git we have a "pull request", or a request to the project owner who controls the central Repository, to pull in the new changes.)

I've oversimplified this, to focus on the primary differences between centralized and distributed. (Now I admit that I'm still learning how the changes are actually recorded that you had asked about, and hope to update this once I fully understand this.)

Ref: This is a good, more complete article.