Derby - How to handle additions to objective fields_问答_开发者

I'm looking to create a table for users and tracking their objectives. The objectives themselves would be on the order of 100s, if not 1000s, and would be maintained in their own table, but it wouldn't know who completed them - it would only define what objectives are available.

Objective:
ID  |  Name   |  Notes  |
----+---------+---------+
    |         |         |

Now, in the Java environment, the users will have a java.util.BitSet for the objectives. So I can go

/* in class User */
boolean hasCompletedObjective(int objectiveNum) {
    if(objectiveNum < 0 || objectivenum > objectives.length())
        throw new IllegalArgumentException("Objective " + objectiveNum + " is invalid. Use a constant from class Objective.");
    return objectives.get(objectivenum);
}

I know internally, the BitSet uses a long[] to do its storage. What would be the best way to represent this in my Derby database? I'd prefer to keep it in columns on the AppUser table if at all possible, because they really are elements of the user.

Derby does not support arrays (to my knowledge) and while I'm not sure the column limit, something seems wrong with having 1000 columns, espeically since I know I will not be qu开发者_运维问答erying the database with things like

SELECT *
FROM AppUser
WHERE AppUser.ObjectiveXYZ

What are my options, both for storing it, and marshaling it into the BitSet? Are there viable alternatives to java.util.BitSet? Is there a flaw in the general approach? I'm open to ideas!

Thanks!

*EDIT: If at all possible, I would like the ability to add more objectives with only a data modification, not a table modification. But again, I'm open to ideas!

[puts on fake moustache]

Store the bitset as a BLOB. Start by simply serializing it, then if you want more space-efficiency, trying pushing the results through a DeflaterOutputStream on their way to the database. For better space- and time- efficiency, try the bitmap compression method used in FastBit, which breaks the bitset into 31-bit chunks, then run-length encodes all-zero chunks, packing the literal and run chunks into 32-bit words along with a discriminator bit.

If you know you'll only look at the objective bitset while the ResultSet that brought it from the database is still open, write a new bitset class that wraps the Blob interface and implements get on top of getBytes. This avoids having to read the whole BLOB into memory to check a few specific bits, and at least avoids having to allocate a separate buffer for the bitset if you do want to look at all the values. Note that making this work with a compressed bitset will take substantial ingenuity.

Be aware that this approach gives you no referential integrity, and no ability to query on the user-objective relationship, little flexibility for different uses of the data in future, and is exactly the kind of thing that Don Knuth warned you about.

The orthodox way to do this does not involve bitsets at all. You have a table for users, a table for objectives, and a join table, indicating which objectives a user has. Something like:

create table users (
    id integer primary key,
    name varchar(100) not null
);

create table objectives (
    id integer primary key,
    name varchar(100) not null
);

create table user_objective (
    user_id integer not null references users,
    objective_id integer not null references objectives,
    primary key (user_id, objective_id)
);

Whenever a user has an objective, you put a row in the join table indicating the fact.

If you want to get the results into a bitset for a user, do an outer join of the user onto the objectives table via the join table, such that you get a row back for every objective, which has a single column with, say, a 1 for each joined objective, or 0 if there was no join.

The orthodox approach would also be to use a Set on the Java side, rather than a bitset. That maps very nicely onto the join table. Have you considered doing it this way?

If you're worried about memory consumption, a set will use about one pointer per objective a user actually has; the bitset will use a bit per possible objective. Most JVMs have 32-bit pointers (only old or huge-heaped 64-bit JVMs have 64-bit pointers), so if each user has on average less than 1/32nd of the possible objectives, the set will use less memory. There are some groovy data structures which will be able to store this information more compactly than either of those structures, but let's leave that to another question.