开发者

What is the definition of realtime, near realtime and batch? Give examples of each?

开发者 https://www.devze.com 2023-02-16 07:07 出处:网络
I\'m trying to get a good definition of realtime, near realtime and batch?I am not talking about sync and async although to me, they are different dimensions.Here is what I\'m thinking

I'm trying to get a good definition of realtime, near realtime and batch? I am not talking about sync and async although to me, they are different dimensions. Here is what I'm thinking

  • Realtime is sync web services or async web services.
  • Near realtime could be JMS or messaging systems or most event driven systems.
  • Batch to me is more of an timed system that is processing when it wakes up.

Give examples of each and feel free to fix my assu开发者_StackOverflowmptions.


https://stackoverflow.com/tags/real-time/info

Real-Time

Real-time means that the time of an activity's completion is part of its functional correctness. For example, the sqrt() function's correctness is something like

The sqrt() function is implemented correctly if, for all x >=0, sqrt(x) = y implies y^2 == x.

In this setting, the time it takes to execute the sqrt() procedure is not part of its functional correctness. A faster algorithm may be better in some qualitative sense, but no more or less correct.

Suppose we have a mythical function called sqrtrt(), a real-time version of square root. Imagine, for instance, we need to compute the square root of velocity in order to properly execute the next brake application in an anti-lock braking system. In this setting, we might say instead:

The sqrtrt() function is implemented correctly if

  1. for all x >=0, sqrtrt(x) = y implies y^2 == x and
  2. sqrtrt() returns a result in <= 275 microseconds.

In this case, the time constraint is not merely a performance parameter. If sqrtrt() fails to complete in 275 microseconds, you may be late applying the brakes, triggering either a skid or reduced braking efficiency, possibly resulting in an accident. The time constraint is part of the functional correctness of the routine. Lift this up a few layers, and you get a real-time system as one (at least partially) composed of activities that have timeliness as part of their functional correctness conditions.

Near Real-Time

A near real-time system is one in which activities completion times, responsiveness, or perceived latency when measured against wall clock time are important aspects of system quality. The canonical example of this is a stock ticker system -- you want to get quotes reasonably quickly after the price changes. For most of us non-high-speed-traders, what this means is that the perceived delay between data being available and our seeing it is negligible.

The difference between "real-time" and "near real-time" is both a difference in precision and magnitude. Real-time systems have time constraints that range from microseconds to hours, but those time constraints tend to be fairly precise. Near-real-time usually implies a narrower range of magnitudes -- within human perception tolerances -- but typically aren't articulated precisely.

I would claim that near-real-time systems could be called real-time systems, but that their time constraints are merely probabilistic:

The stock price will be displayed to the user within 500ms of its change at the exchange, with probability p > 0.75.

Batch

Batch operations are those which are perceived to be large blocks of computing tasks with only macroscopic, human- or process-induced deadlines. The specific context of computation is typically not important, and a batch computation is usually a self-contained computational task. Real-time and near-real-time tasks are often strongly coupled to the physical world, and their time constraints emerge from demands from physical/real-world interactions. Batch operations, by contrast, could be computed at any time and at any place; their outputs are solely defined by the inputs provided when the batch is defined.

Original Post

I would say that real-time means that the time (rather than merely the correct output) to complete an operation is part of its correctness.

Near real-time is weasel words for wanting the same thing as real-time but not wanting to go to the discipline/effort/cost to guarantee it.

Batch is "near real-time" where you are even more tolerant of long response times.

Often these terms are used (badly, IMHO) to distinguish among human perceptions of latency/performance. People think real-time is real-fast, e.g., milliseconds or something. Near real-time is often seconds or milliseconds. Batch is a latency of seconds, minutes, hours, or even days. But I think those aren't particularly useful distinctions. If you care about timeliness, there are disciplines to help you get that.


I'm curious for feedback myself on this. Real-time and batch are well defined and covered by others (though be warned that they are terms-of-art with very specific technical meanings in some contexts). However, "near real-time" seems a lot fuzzier to me.

I favor (and have been using) "near real-time" to describe a signal-processing system which can 'keep up' on average, but lags sometimes. Think of a system processing events which only happen sporadically... Assuming it has sufficient buffering capacity and the time it takes to process an event is less than the average time between events, it can keep up.

In a signal processing context: - Real-time seems to imply a system where processing is guaranteed to complete with a specified (short) delay after the signal has been received. A minimal buffer is needed. - Near real-time (as I have been using it) means a system where the delay between receiving and completion of processing may get relatively large on occasion, but the system will not (except under pathological conditions) fall behind so far that the buffer gets filled up. - Batch implies post-processing to me. The incoming signal is just saved (maybe with a bit of real-time pre-processing) and then analyzed later.

This gives the nice framework of real-time and near real-time being systems where they can (in theory) run forever while new data is being acquired... processing happens in parallel with acquisition. Batch processing happens after all the data has been collected.

Anyway, I could be conflicting with some technical definitions I'm unaware of... and I assume someone here will gleefully correct me if needed.


There are issues with all of these answers in that the definitions are flawed. For instance, "batch" simply means that transactions are grouped and sent together. Real Time implies transactional, but may also have other implications. So when you combine batch in the same attribute as real time and near real time, clarity in purpose for that attribute is lost. The definition becomes less cohesive, less clear. This would make any application created with the data more fragile. I would guess that practitioners would be better off w/ a clearly modeled taxonomy such as:

  • Attribute1: Batched (grouped) or individual transactions.
  • Attribute2: Scheduled (time-driven), event-driven.
  • Attribute3: Speed per transaction. For batch that would be the average speed/transaction.
  • Attribute4: Protocol/Technology: SOAP, REST, combination, FTP, SFTP, etc. for data movement.
  • Attributex: Whatever.

Attribute4 is more related to something I am doing right now, so you could throw that out or expand the list for what you are trying to achieve. For each of these attribute values, there would likely be additional, specific attributes. But to bring the information together, we need to think about what is needed to make the collective data useful. For instance, what do we need to know between batched & transactional flows, to make them useful together. For instance, you may consider attributes for each to provide the ability to understand total throughput for a given time period. Seems funny how we may create conceptual, logical, and physical data models (hopefully) for our business clients, but we don't always apply that kind of thought to how we define terminology in our discussions.


Any system in which time at which output is produced is significant. This is usually because the input corresponding to some movement in the physical environment or world and the output has to relate to the same movement. The lag from input to output time must be sufficiently small for acceptable timelines.

0

精彩评论

暂无评论...
验证码 换一张
取 消