I am designing a system that will allow users to take data from one system and send it to other systems. One of the destination systems has a sophisticated SOA (web services) and the other is a mainframe that accepts flat files as input.
I have created a database that has a PublishEvent table and a PublishEventType table. There are also normalized tables that are specific to the type of event being published.
I also have an "interface" table that is a flattened-out version of the normalized data tables. The end user has a process that puts data into the interface table. I am not sure of the exact process; I think it is some kind of reporting application from which they can export results to a SQL table. I then use an SSIS package to take the data out of the interface table, put it into the normalized data structure, and create new rows in the PublishEvent table. I use the flat table because when I first showed the users the relational tables they seemed very confused.
I have a Windows service that watches for new rows in the PublishEvent table. The Windows service is extended with plug-ins (using the MEF framework). Which plug-in is called depends on the value of the PublishEventTypeID field in the PublishEvent row.
PublishEventTypeID 1 calls the plug-in that reads data from one set of tables and calls the SOA web service. PublishEventTypeID 2 calls the plug-in that reads data from a different set of tables and creates the flat file to be sent to the mainframe.
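Roughly, the dispatch looks like this (a simplified sketch; the contract name, class names, and plug-in directory are made up for the example, and my real interfaces carry more context):

    using System;
    using System.ComponentModel.Composition;
    using System.ComponentModel.Composition.Hosting;
    using System.Linq;

    // Simplified plug-in contract; the real one passes more context.
    public interface IPublishEventPlugin
    {
        int PublishEventTypeId { get; }
        void Process(int publishEventId);
    }

    [Export(typeof(IPublishEventPlugin))]
    public class SoaWebServicePlugin : IPublishEventPlugin
    {
        public int PublishEventTypeId { get { return 1; } }

        public void Process(int publishEventId)
        {
            // Read the normalized tables for this event and call the SOA web service.
        }
    }

    public class PublishEventDispatcher
    {
        [ImportMany]
        public IPublishEventPlugin[] Plugins { get; set; }

        public PublishEventDispatcher(string pluginDirectory)
        {
            // Discover plug-in assemblies in a directory and satisfy [ImportMany].
            var container = new CompositionContainer(new DirectoryCatalog(pluginDirectory));
            container.ComposeParts(this);
        }

        public void Dispatch(int publishEventId, int publishEventTypeId)
        {
            var plugin = Plugins.FirstOrDefault(p => p.PublishEventTypeId == publishEventTypeId);
            if (plugin == null)
                throw new InvalidOperationException("No plug-in for PublishEventTypeID " + publishEventTypeId);
            plugin.Process(publishEventId);
        }
    }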
This seems like I am implementing the "Database as IPC" anti-pattern. Should I change my design to use a messaging-based system? Is the process of putting data into the flat table and then into the normalized tables redundant?
EDIT: This is being developed in .NET 3.5
A MOM (message-oriented middleware) is probably the better solution, but you also have to take into account the following points:
- Do you have a message-based system already in place as part of your customer's architecture? If not, introducing one may be overkill.
- Do you have any experience with message-based systems? As Jason Plank correctly mentioned, you have to take into account specific patterns for these, like having to ensure chronological order of messages, managing dead letter channels, and so on (see this book for more).
- You mentioned a mainframe system, which apparently has limited options for interfacing. Who will take care of the layer that transforms "messages" (whether DB or MOM based) into something the mainframe can digest (see the flat-file sketch after this list)? Assuming it is you, would it be easier for you to do that by accessing the DB (maybe you have already worked on that problem in the past), or would the effort differ depending on whether you use a DB or a MOM?
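Either way, that transformation layer usually ends up writing fixed-width records. A minimal sketch of what it might look like is below; the field names, widths, and formats are invented for illustration, since the real layout comes from the mainframe's copybook:

    using System;
    using System.IO;
    using System.Text;

    // Sketch of a fixed-width record writer for the mainframe feed.
    // Field layout (account: cols 1-10, amount: cols 11-22, date: cols 23-30) is invented.
    public static class MainframeFlatFileWriter
    {
        public static void WriteRecord(TextWriter writer, string accountNumber,
                                       decimal amount, DateTime postedOn)
        {
            var record = new StringBuilder();
            record.Append(accountNumber.PadRight(10));                          // left-justified, space-padded
            record.Append(((long)(amount * 100)).ToString().PadLeft(12, '0'));  // amount in cents, zero-filled
            record.Append(postedOn.ToString("yyyyMMdd"));                       // posting date
            writer.WriteLine(record.ToString());
        }
    }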
To sum it up: if you are more confident going the DB route, maybe it is better to do that, even if, as you correctly suggested yourself, it is a bit of an "anti-pattern".
Some key items to keep in mind are:
Row order consistency - Does your data model depend on the order in which the data is generated? If so, does your scheme ensure that publish and subscribe activity happen in the same order the original data was created?
Do you have identity columns on either side? They are a problem, since their values depend on the order in which the data is inserted. If an identity column is the sole primary key (a surrogate key), a change in its value may make the data unusable.
How do you prove that you have not lost a record? This is the trickiest part of the solution, especially if you have millions of rows; a reconciliation check like the sketch below is a starting point.
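A minimal sketch of such a check, assuming both sides are SQL Server; the table names are placeholders, and a real reconciliation would compare checksums or key ranges rather than just counts:

    using System.Data.SqlClient;

    // Compares row counts between source and destination as a cheap first-pass check.
    // Table names are placeholders; counts alone will not catch substituted or corrupted rows.
    public static class ReconciliationCheck
    {
        public static bool CountsMatch(string sourceConnStr, string destConnStr)
        {
            int sourceCount = ScalarCount(sourceConnStr, "SELECT COUNT(*) FROM SourceTable");
            int destCount = ScalarCount(destConnStr, "SELECT COUNT(*) FROM DestinationTable");
            return sourceCount == destCount;
        }

        private static int ScalarCount(string connStr, string sql)
        {
            using (var conn = new SqlConnection(connStr))
            using (var cmd = new SqlCommand(sql, conn))
            {
                conn.Open();
                return (int)cmd.ExecuteScalar();
            }
        }
    }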
As for the architecture, you may want to check out the XMPP protocol - Smack for the client (if you are using Java) and ejabberd for the server.
Have a look at NServiceBus, MassTransit, or Rhino Service Bus if you're using .NET.
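To give a feel for the shape of the code, here is a rough sketch in the classic NServiceBus style; the message type is invented, and the exact API depends on the version you target:

    using NServiceBus;

    // Invented message contract: one message per publish event instead of a table row.
    public class PublishEventOccurred : IMessage
    {
        public int PublishEventId { get; set; }
        public int PublishEventTypeId { get; set; }
    }

    // A handler per destination replaces the polling Windows service; the bus
    // takes care of delivery, retries, and dead-lettering.
    public class SoaPublishHandler : IHandleMessages<PublishEventOccurred>
    {
        public void Handle(PublishEventOccurred message)
        {
            // Read the event data and call the SOA web service here.
        }
    }

In practice you would likely define a distinct message type per destination (or per event type), so routing happens on the message type rather than on an ID field in a shared table.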