JMS message. Model to include data or pointers to data?

I am trying to resolve a design difference of opinion where neither of us has experience with JMS.

We want to use JMS to communicate between a j2ee application and a stand-alone application when a new event occurs. We would be using a single point-to-point queue. Both sides are Java-based. The question is whether to send the event data itself in the JMS message body or to send a pointer to the data so that the stand-alone program can retrieve it. Details below.

I have a j2ee application that supports data entry of new and updated persons and related events. The person records and associated events are written to an Oracle database. There are also stand-alone, separate programs that contribute new person and event records to the database. When a new event occurs through any of 5-10 different application functions, I need to notify remote systems through an outbound interface using an industry-specific standard messaging protocol. The outbound interface has been designed as a stand-alone application to support scalability through asynchronous operation and by moving it to a separate server.

The j2ee application currently has most of the data in memory at the time the event is entered. The data would consist of approximately 6 different objects: a person object plus several others, some with multiple instances, for an average total size in the range of 3,000 to 20,000 bytes. Some special cases could be many times this amount.

From a performance and reliability perspective, should I model the JMS message to pass all the data needed to create the interface message, or model the JMS message to contain record keys for the data and have the stand-alone Java application retrieve the data to create the interface message?

I wouldn't just focus on performance for the decision, but also on other non-functional considerations.

I've been working on a system where we decided not to send the data in the message, but rather the PK of the data in the database. Our approach was closer to the command message pattern. Our choice was motivated by the following reasons:

Data size: we would store the data in a BLOB because it could be huge. In your case, the data probably fits in a message anyway.
Message loss: we planned for the worst. If messages were lost, we could recover the data, and we had a recovery procedure to resubmit the messages. It may look paranoid, but here are two scenarios that could lead to messages being lost: (1) the queue is purged by mistake; (2) an error occurs and messages can't be delivered for a long time. They go to the dead message queue (DMQ), which eventually reaches its limit and, if not configured correctly, starts discarding messages.
Monitoring: different messages/commands could update the same row in the database. That was easy to monitor and troubleshoot.

Using JMS plus a database did, however, complicate the design a bit:

Distributed transactions: these add some complexity, and sometimes some problems. Distributed transactions differ subtly from "regular" transactions, for example in how distributed timeouts behave.
Persistence: the code is less intuitive. The data must be persisted first in order to have the PK, which adds some complexity to the code if an ORM is used.
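
To make the key-only flow concrete, here is a minimal sketch in plain Java. It is not our real code: the map and the in-memory queue are stand-ins for the database table and the JMS queue, and every name in it (`KeyMessageSketch`, `recordEvent`, `consumeNext`, `resubmitAll`) is made up for illustration. It only shows the ordering (persist first, then send the key) and why a purged queue can be rebuilt from the table.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

final class KeyMessageSketch {
    // Stand-in for the database table: primary key -> event data.
    static final Map<Long, String> eventTable = new TreeMap<>();
    // Stand-in for the JMS queue: it carries primary keys only.
    static final Queue<Long> queue = new ArrayDeque<>();
    static long nextKey = 1;

    // Producer: persist first so the key exists, then enqueue the key.
    // In a real system both steps would belong to one (distributed) transaction.
    static long recordEvent(String eventData) {
        long key = nextKey++;
        eventTable.put(key, eventData);
        queue.add(key);
        return key;
    }

    // Consumer: take a key off the queue and load the row it points to.
    static String consumeNext() {
        Long key = queue.poll();
        return key == null ? null : eventTable.get(key);
    }

    // Recovery: the data never left the table, so lost messages can be re-sent.
    static int resubmitAll() {
        queue.addAll(eventTable.keySet());
        return eventTable.size();
    }

    public static void main(String[] args) {
        recordEvent("person 17 admitted");
        queue.clear();                     // simulate a queue purged by mistake
        resubmitAll();
        System.out.println(consumeNext());
    }
}
```

A real recovery procedure would resubmit only the rows that were never processed, which needs some kind of status column; the sketch resubmits everything to stay short.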

I guess both approaches can work. I've described what led us not to send the data in the message, but your system and requirements might be different, so it might still be easier to send the data in the message in your case. I cannot provide a definitive answer, but I hope this helps you make your decision.

Send the data, not the pointer. Your messages are not so large that a queue can't handle them.

The queue will have no problem handling the data; the messages in the queue are persisted anyway (in memory, in a file, or in a database, whichever fits the size of your queue best).

If you put only a handle to the data in the queue, the application that processes the queue has to do unnecessary work to fetch data that the sender already has.

From your question alone I cannot say what is best in your case. There are certainly performance implications from the message size and so on, but first you need to know which information your message consumer has to send to the remote system, especially in a system that may have concurrent updates on the same data.

It matters whether you need to keep the information stored in the remote system in sync with the version of the record just stored in your database, and whether you want to propagate a complete history to the remote system that the message receiver updates, because a lot of time may pass between sending the message and processing it on the other end of the queue.

Assume (for some reason) there are a whole lot of messages in the queue, and within a few seconds or minutes three or four update notifications on the same object hit the queue. Assume the first message is processed only after the fourth update to the record has finished and its update notification has been put in the queue. If you pass along only the ID of the record, all four messages perform exactly the same operation on the remote system, which for one thing is completely superfluous. In addition, the remote system sees four identical updates but has no information about the three intermediate states of the object, so the history, if relevant, is lost for this system.
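
The scenario above can be simulated with a few lines of plain Java. This is only a sketch under simplifying assumptions: a map stands in for the database row, a list stands in for the queue, all updates are assumed to land before the consumer starts, and the names (`StaleReadSketch`, `forwardById`, `forwardBySnapshot`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StaleReadSketch {
    // Messages carry only the record ID; the consumer reloads the row later.
    static List<String> forwardById(String[] versions) {
        Map<Long, String> table = new HashMap<>();
        List<Long> queue = new ArrayList<>();
        for (String version : versions) {
            table.put(1L, version);        // each update overwrites the same row
            queue.add(1L);                 // and enqueues the same ID again
        }
        List<String> forwarded = new ArrayList<>();
        for (Long id : queue) {
            forwarded.add(table.get(id));  // every read sees the latest state
        }
        return forwarded;
    }

    // Messages carry a snapshot of the data as it was when the event occurred.
    static List<String> forwardBySnapshot(String[] versions) {
        List<String> queue = new ArrayList<>();
        for (String version : versions) {
            queue.add(version);
        }
        return new ArrayList<>(queue);     // the consumer forwards what it received
    }

    public static void main(String[] args) {
        String[] updates = {"v1", "v2", "v3", "v4"};
        System.out.println(forwardById(updates));       // [v4, v4, v4, v4]
        System.out.println(forwardBySnapshot(updates)); // [v1, v2, v3, v4]
    }
}
```

With IDs only, the remote system receives the final state four times; with snapshots it receives each intermediate state once.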

Besides these semantic implications, the technical question when choosing between the ID and the whole data is whether it is cheaper to unwrap the updated information from the message body or to load it from the database. This depends on how you want to serialize/deserialize the contents. The message sizes you mention should be no problem for a decent JMS implementation if you want to send the data along.

When serializing Java objects into messages you need to keep the class format in sync between sender and consumer, and you have to empty the queue before you can update to a newer version of the class on the consuming side. Of course the same applies to database schema updates when you just pass along the ID.
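
As a small illustration of that coupling, here is a sketch of a serializable payload class using standard Java serialization. `PersonEvent` and `SerializationSketch` are hypothetical names, and the byte array only stands in for the body of a JMS message. Declaring `serialVersionUID` explicitly is one common way to keep the two sides compatible across harmless changes such as an added field; without it the JVM derives the value from the class structure, so even small edits can make messages already in the queue unreadable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical payload; sender and consumer must both have a compatible version.
final class PersonEvent implements Serializable {
    private static final long serialVersionUID = 1L; // pinned on purpose
    final long personId;
    final String eventType;

    PersonEvent(long personId, String eventType) {
        this.personId = personId;
        this.eventType = eventType;
    }
}

final class SerializationSketch {
    // What the sender does before handing the bytes to the messaging layer.
    static byte[] toBytes(PersonEvent event) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(event);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // What the consumer does; fails if its class version is incompatible.
    static PersonEvent fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (PersonEvent) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        PersonEvent copy = fromBytes(toBytes(new PersonEvent(42L, "ADMIT")));
        System.out.println(copy.personId + " " + copy.eventType);
    }
}
```

If the class coupling is a concern, a neutral format such as XML or plain key/value pairs in the message body avoids it, at the price of writing the mapping by hand.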

When you send just the ID to the consumer, you will need additional database connections. This may also be relevant, depending on the load on the database and on how complex the queries are that you need to execute to load the objects.