I am developing a social application on top of Java and a Cassandra database. I need to store users' shared posts and their comments in the database, for which I plan to serialize the data of each comment/post and store the serialized data in a single column. Thus for each comment, there will be a single column that stores this data in serialized format:
- Comment data(String around 700 characters max)
- CommentorId (long type)
- CommentTime (timestamp)
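If protocol buffers were used, the comment record above could be described by a schema along these lines (field names and numbers are illustrative, not part of any existing code):

```proto
// Illustrative schema for a single comment (proto2 syntax).
message Comment {
  required string comment_data = 1;  // the comment text, ~700 characters max
  required int64  commentor_id = 2;  // CommentorId (long)
  required int64  comment_time = 3;  // CommentTime as ms since the Unix epoch
}
```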
Similarly, a post's data will be serialized and stored as a single column.
Fast deserialization will be required each time the frontend retrieves a post.
I am looking at protocol buffers as the probable solution for this, and would like to know whether it is the right choice for this task. I am looking for a high-performance, fast serialization and deserialization mechanism that can serve heavy usage in the application.
Also, is it possible to send the data to the client in serialized format and deserialize it there (server-to-client communication)?
Protocol buffers certainly provides serialization, although the RPC side of things is left to your imagination (often something simple and socket-based works very well).
The data types are all well supported by protobuf (although you might want to use something like milliseconds into the Unix epoch for the date). Note, though, that protobuf doesn't include compression (unless you also apply gzip etc. to the stream), so the message will be a bit longer than the string (which always uses UTF-8 encoding in protobuf). I say "a bit" because the varint encoding for integer types can yield anything between 1 and 10 bytes each for the id and timestamp, depending on their magnitude, plus a few (probably 3) bytes for the field headers.
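The 1-to-10-byte range can be seen by counting how many 7-bit groups a varint needs. A small sketch of that size calculation (my own helper, not part of the protobuf API):

```java
public class VarintSize {
    // Number of bytes protobuf's varint encoding uses for a long:
    // one byte per 7 bits of payload, so 1..10 bytes for 64-bit values.
    static int varintSize(long value) {
        int bytes = 1;
        while ((value & ~0x7FL) != 0) {
            bytes++;
            value >>>= 7; // unsigned shift: negatives are treated as huge unsigned values
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(varintSize(42L));            // a small id: 1 byte
        System.out.println(varintSize(1315000000000L)); // an ms-epoch timestamp: 6 bytes
        System.out.println(varintSize(-1L));            // negative int64 (no zigzag): 10 bytes
    }
}
```

So a millisecond timestamp costs about 6 bytes today, while a small numeric id costs just 1.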
If that sounds about right, then it should work fine. If you have lots of text data, though, you might want to run the protobuf stream through gzip as well. Java is well supported by protobuf via the main Google trunk.
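Wrapping the serialized stream in gzip needs nothing beyond the JDK. A minimal sketch, assuming the serialized message is already available as a `byte[]` (here a placeholder stands in for real protobuf output):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipWrap {
    // Compress an already-serialized message before writing it to the column.
    static byte[] compress(byte[] serialized) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(serialized);
        }
        return buf.toByteArray();
    }

    // Decompress on read, before handing the bytes back to the protobuf parser.
    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

For ~700-character comments the gzip header overhead may eat much of the gain, so it is worth measuring on real data before enabling it.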
Don't know if it fits your specific case, but I have seen suggestions to store a JSON representation of the data that can be sent directly to the browser. If you don't need any further processing steps involving POJOs, this or a similar approach might be a (fast) way to go.
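To make the idea concrete, here is a hypothetical helper that renders one comment as the JSON string the browser would receive (the field names and the helper itself are illustrative; a real JSON library should handle full escaping):

```java
public class CommentJson {
    // Illustrative: build the JSON for one comment by hand.
    static String toJson(String text, long commentorId, long commentTimeMs) {
        // Escape only backslash and double-quote; a proper JSON library
        // (e.g. one already on your classpath) should be used in production.
        String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
        return String.format(
            "{\"comment\":\"%s\",\"commentorId\":%d,\"commentTime\":%d}",
            escaped, commentorId, commentTimeMs);
    }
}
```

The stored column then needs no deserialization step at all on read; the trade-off is a larger on-disk representation than protobuf.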
Overall, Protocol Buffers looks like a good fit for what you want to do; many people use it for exactly the scenario you describe. I have heard of others using plain JSON for this, but that is definitely less efficient.
Protocol Buffers is fast, portable, mature and well documented. It is developed and maintained by Google. One of its distinctive features is the ability to transparently extend existing records with new fields: you can extend your existing record format with additional fields without converting your existing data or modifying software that works with the old fields (it will silently skip unknown fields).
Regarding your question about whether the client can work with the serialized format (if I understood it correctly): if a client supports Protocol Buffers and has the ".proto" files describing the data format, it will be able to work with the data just like you do. If a client can't use Protocol Buffers, there are third-party libraries [1] that can convert between Protobuf, JSON and XML formats (I haven't tried them myself).
You might also want to check out some alternatives to Protocol Buffers, such as MessagePack [2] and Avro. They claim to be faster, more compact, and to support dynamic typing.
[1] for example, http://code.google.com/p/protobuf-java-format/
[2] http://msgpack.org/