Publish/subscribe with large files as message payloads

Developer https://www.devze.com 2023-01-13 20:59 Source: Internet

We have an existing system that processes a lot of files on an ongoing basis: roughly 3 million files a day, ranging in size from a few kilobytes to more than 50 MB. These files go through a few different stages of processing between the time they are received and the time they are fully consumed, depending on the path they take. Due to the content and format of these files, they can NOT be broken up into smaller chunks.

Currently, the workflow these files move through is rigid and dictated by the code, with fixed inputs and outputs (in many cases, one subscriber becomes the publisher for a new set of files). This lack of flexibility is starting to cause us issues, however, so I'm looking at some kind of pub/sub solution to handle new requirements.

Most traditional pub/sub solutions have the data within the actual payload, but the large potential file sizes exceed the limits of many messaging platforms. Furthermore, we have multiple platforms in play: files progress through both Linux and Windows tiers depending on their path.

Does anyone have any design and/or implementation recommendations with the following goals in mind?

1. Multiplatform for both pub and sub (Linux and Windows)

2. Persistent storage/store-and-forward support

3. Can handle large event payloads and cleans up appropriately once all subscribers have been serviced

4. Routing/workflow is done via configuration

5. Subscribers can subscribe to a filtered set of published events based on changing criteria (e.g. only give me files of a specific type)

I've done a bunch of digging into a number of service bus and MQ implementations, but haven't quite been able to firm up enough of a design approach to properly evaluate what tools make the most sense. Thanks for any input.

A1. I developed a similar system at my previous job. We didn't pass the multi-MB payload inside the message. Instead, we stored it on the file server and only passed the UNC file name in the message (the messaging was Java RMI, but pretty much anything will work).
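To make the idea in A1 concrete, here is a minimal Python sketch of storing the file in a shared location and passing only a small reference in the message. It is not the system described above: the `PayloadStore` class, the `.pending` sidecar file, and the subscriber names are all hypothetical, and a plain directory stands in for the file server. It also illustrates one possible way to handle cleanup once every subscriber has been serviced (requirement 3), by deleting the payload after the last acknowledgement.

```python
import json
import shutil
import tempfile
import uuid
from pathlib import Path


class PayloadStore:
    """Hypothetical shared-directory store: messages carry a reference, not the file."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, source_path, subscribers):
        """Copy a file into the store and return a small reference message."""
        payload_id = uuid.uuid4().hex
        stored = self.root / payload_id
        shutil.copyfile(source_path, stored)
        # Sidecar file listing the subscribers that still have to consume the payload.
        pending = self.root / (payload_id + ".pending")
        pending.write_text(json.dumps(sorted(subscribers)))
        return {
            "payload_id": payload_id,
            "path": str(stored),
            "name": Path(source_path).name,
        }

    def ack(self, payload_id, subscriber):
        """Record that one subscriber is done; delete the payload after the last ack.

        Returns True when the payload was removed. Assumption: acks for the same
        payload do not race; a real deployment would need locking or a database.
        """
        pending = self.root / (payload_id + ".pending")
        remaining = set(json.loads(pending.read_text()))
        remaining.discard(subscriber)
        if remaining:
            pending.write_text(json.dumps(sorted(remaining)))
            return False
        (self.root / payload_id).unlink()
        pending.unlink()
        return True


def demo():
    """Publish one file to two hypothetical subscribers and ack it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.dat"
        src.write_bytes(b"x" * 1024)
        store = PayloadStore(Path(tmp) / "store")
        msg = store.put(src, ["linux-tier", "windows-tier"])
        first = store.ack(msg["payload_id"], "linux-tier")
        still_there = Path(msg["path"]).exists()
        last = store.ack(msg["payload_id"], "windows-tier")
        gone = not Path(msg["path"]).exists()
        return first, still_there, last, gone
```

The dictionary returned by `put` is what would travel over the messaging layer, so the message stays tiny regardless of file size. In a mixed Linux and Windows setup, the `path` field would have to be something both tiers can resolve, which this sketch does not address.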

A2. I recently started to use Windows Communication Foundation. Fortunately for me, I'm only supporting Windows, and I don't need such big messages. However the documentation says the protocol is platform-independent, and there's the option to pass huge chunks of data using its streaming message transfer feature.

In both cases, I think you'll have to fulfill your #4 and #5 requirements in your own code.
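As a rough illustration of what handling #4 and #5 in your own code might look like, the Python sketch below keeps routing in a data table and matches each published event against per-subscriber filters. Everything here is an assumption made for the example: the `ROUTES` table, its field names, the subscriber names, and the event shape (`file_type` and `size`). In practice the table could be loaded from a configuration file so routing changes need no code change.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table; could be loaded from a JSON or YAML config file.
ROUTES = [
    {"subscriber": "image-resizer", "file_type": "image/*", "max_bytes": None},
    {"subscriber": "pdf-indexer", "file_type": "application/pdf",
     "max_bytes": 50 * 1024 * 1024},
    {"subscriber": "archiver", "file_type": "*", "max_bytes": None},
]


def matching_subscribers(event, routes=ROUTES):
    """Return the subscribers whose filters accept this event, in table order."""
    matched = []
    for route in routes:
        # Glob-style match on the file type, e.g. "image/*" accepts "image/png".
        if not fnmatchcase(event["file_type"], route["file_type"]):
            continue
        # Optional size cap; None means no limit.
        limit = route["max_bytes"]
        if limit is not None and event["size"] > limit:
            continue
        matched.append(route["subscriber"])
    return matched
```

For example, `matching_subscribers({"file_type": "image/png", "size": 10})` selects the image and archive subscribers but not the PDF one. Changing a subscriber's criteria then means editing one entry in the table.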

You may want to look into ActiveMQ if your clients are internal. ActiveMQ supports up to 2 GB of data per message (I think) and also supports blob messages. It guarantees delivery and processing (with transactions).

Hope this helps.