开发者

Data structure to work well with versioned string data?

开发者 https://www.devze.com 2023-03-10 22:40 出处:网络
What are \"versioned data structures\" called correctly in CS? Is there some list of existing \"versioned data structures\" somewhere that I could go through, compare and choose from on my own? I ha

What are "versioned data structures" called correctly in CS?

Is there some list of existing "versioned data structures" somewhere that I could go through, compare and choose from on my own? I have found only one paper for "Mul开发者_Python百科tiple Version Hash Table" (title Hash Tables in Logic Programming, from circa 1987) and that's all I could find.

Sample data (requested in discussion below):

"a b" c "d e" {
    f {
        "g h" "i j" {
            "k l m" n "o p q" {
                r;
            }
        }
    }
}
"a b" x {
     y {
         "z z z";
     }
}
"foo bar" baz;
"foo baz" bar;

If multiple similar "trees" occur, then they will stay in place, like:

foo {
    bar;
    oof;
    bar {
       "a b c";
    }
    bar;
}

will never get sorted but always stay in the same order as written.

Data will be "compiled" into binary form by some "commit/version" tool and "rolled back" back to text form or queried by another tools.

To clarify more, let's see some usage examples in Unix shell:

sh> $EDITOR file.txt
File saved.

sh> commit file.txt                      # create/commit file.txt.db (we assume 9 commits before this one)
Commited as version 10.

sh> show file.txt.db 'a b'               # Trailing '*' is implicit
"a b" c "d e" {
    f {
        "g h" "i j" {
            "k l m" n "o p q" {
                r;
            }
        }
    }
}
"a b" x {
     y {
         "z z z";
     }
}

sh> show file.txt.db 'a b' '*' y
"a b" x {
     y {
         "z z z";
     }
}

sh> show file.txt.db 'foo *'
"foo bar" baz;
"foo baz" bar;

sh> show file.txt.db -rollback=2 'a b'  # "historical" query, say that 'a b" c "d e" ...' was not there the time
"a b" x {
     y {
         "z z z";
     }
}

sh> rollback file.txt.db 2
Rolled back to version 2.

sh> $EDITOR file.txt
File saved.

sh> commit file.txt                      # new "branch" commit (because of the rollback above)!
Commited as version 2.1.

Note: ALL history is always saved! Nothing is going to be erased (if not explicitely requested by some other tool or so).

Etc. Perhaps some "log" or "journal" needed??


If you are dealing with purely text-based data, a basic versioned data structure could be as simple as an array or linked list of structures looking something like:

struct versioned_string {
    int   version;
    char* data;
};

"Committing" a new entry would only involve allocating a new string to store the data in. Retrieving a specific version would involve walking the list looking for a particular version value (for a linked list) or indexing into the array (for an array). A "rollback" would involve clearing out an entry from the end of the list and freeing the associated string.

You could also use a database (sqlite, for instance), store the information in a table, and let the DB engine handle the adds, deletes, and searches. This might be overkill, however.

I don't really follow your example, though. I'm assuming the first code block you posted indicates a single input string (that is, a snapshot of what the text data looks like at a particular version)? If so, multiple "trees" shouldn't be a problem since the input is stored as raw text and not processed or interpreted in any way. I certainly don't understand your "Unix shell" example. Can you break that down in more detail? Specifically, for each line, clearly list what the input is, what the output is, and what you expect the "current version" of the data to look like at that point.

Update: Ignore my previous comment about rollback; I was thinking of it in terms of how this website uses the term (meaning to revert to a previous state), and you use the term differently (meaning to retrieve an older version).

Each snapshot of the entire data set can be represented by a single C string since elements are separated by non-null characters (whitespace and/or curly braces). A simple data structure like the one I outlined above should give you the capabilities that you need if you create a linked list of them. Since the data is stored in it's string representation, commit and rollback operations would be as trivial as adding a new element to the list (for commit) or walking the list backwards and passing the string to printf (for rollback). The 'show' command is a different animal, because it doesn't really have much to do with the way you version the data. As far as the versioning system is concerned, all you would do is grab the last entry from the list. The real work there is done by whatever library functions you've created for parsing strings and manipulating data internally.

0

精彩评论

暂无评论...
验证码 换一张
取 消