Defining Types of Data Processing
- The types of data processing are characterized by their latency:
  - Batch processing: Daily or longer
  - Mini-Batch processing: Hourly or longer
  - Micro-Batch processing: Intervals of minutes or more
  - Real-Time processing: Sub-second intervals
- They are also characterized by how data is loaded:
  - Batch: Loaded incrementally in an off-peak window
  - Mini-Batch: Loaded incrementally in intra-day loads
  - Micro-Batch: Loaded in intervals
  - Real-Time: Loaded in sub-second intervals
- They are also characterized by how data is captured:
  - Batch: Calling queries that use filtering
  - Mini-Batch: Calling queries that use filtering
  - Micro-Batch: Using Change Data Capture (CDC)
  - Real-Time: Using Change Data Capture (CDC)
- Data is captured from different sources in one of two ways:
  - Through queries that filter on a timestamp or flag
  - Through a Change Data Capture mechanism that detects changes as they happen
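
The filter-query style of capture can be sketched as follows. This is a minimal illustration, not a reference implementation: the `sales` table, its `updated_at` column, and the `capture_since` helper are hypothetical names chosen for the example, and an in-memory SQLite database stands in for the source system.

```python
import sqlite3


def capture_since(conn, last_seen):
    """Pull rows changed after `last_seen` using a timestamp filter."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Advance the watermark to the newest timestamp seen, if any.
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark


# Hypothetical source table with ISO-8601 timestamps (they sort as text).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales (id, amount, updated_at) VALUES (?, ?, ?)",
    [
        (1, 2.50, "2024-01-01T09:00:00"),
        (2, 3.00, "2024-01-01T10:30:00"),
        (3, 1.75, "2024-01-02T08:15:00"),
    ],
)

# The first pull captures everything; the second finds nothing new.
first_batch, watermark = capture_since(conn, "2024-01-01T00:00:00")
second_batch, watermark = capture_since(conn, watermark)
```

Each pull runs a query against the source, which is why this style adds load to the source system every time it runs. A CDC mechanism avoids the repeated query by reacting to changes as they occur.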
Summarizing the Types of Data Processing
Type | Latency | Capture | Initialization | Target Load | Source Load |
---|---|---|---|---|---|
Batch | Daily | Filter query | Pull | High impact | High impact |
Mini-Batch | Hourly | Filter query | Pull | Low impact | Queries at peak |
Micro-Batch | Minutes | CDC | Push, then pull | Low impact | Some to none |
Real-Time | Seconds | CDC | Push | Low impact | Some to none |
Describing Reasons for Real-Time Processing
- Consumer expectations for faster response times
- Ubiquity of cost-effective resources
- Widespread adoption
Lemonade Stand Analogy for Batch Processing
- John sells lemonade each day
- His brother queries John about his sales at the end of each day
- Then, John does the following:
  - Reviews the receipts for the day
  - Calculates the sales for the day
- Then, his brother records the daily sales in his notebook
Lemonade Stand Analogy for Mini-Batch Processing
- John sells lemonade each day
- His brother queries John about his sales every hour
- Then, John does the following:
  - Reviews the receipts for that hour
  - Calculates the sales for that hour
- Then, his brother records the hourly sales in his notebook
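
The batch and mini-batch analogies differ only in the size of the window that John reviews when asked. A rough sketch, assuming receipts are `(timestamp, amount)` pairs and using a hypothetical `summarize` helper:

```python
from collections import defaultdict
from datetime import datetime


def summarize(receipts, window):
    """Total sales per window; `window` is 'day' (batch) or 'hour' (mini-batch)."""
    fmt = "%Y-%m-%d" if window == "day" else "%Y-%m-%d %H:00"
    totals = defaultdict(float)
    for sold_at, amount in receipts:
        # Bucket each receipt by the day or hour it falls in.
        totals[sold_at.strftime(fmt)] += amount
    return dict(totals)


# Illustrative receipts for one day at the stand.
receipts = [
    (datetime(2024, 6, 1, 9, 15), 2.0),
    (datetime(2024, 6, 1, 9, 45), 3.0),
    (datetime(2024, 6, 1, 14, 5), 2.5),
]

daily = summarize(receipts, "day")    # what the brother records once a day
hourly = summarize(receipts, "hour")  # what he records every hour
```

In both cases the brother initiates the work by asking (a pull), and John has to stop and review receipts each time.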
Lemonade Stand Analogy for Real-Time Processing
- John sells lemonade each day
- His brother agrees to work outside with John
  - His brother working outside represents a CDC approach
  - Now, his brother handles administrative duties
  - As a result, John is freed from the administrative burden
- John hands his brother a receipt after each sale
- Then, his brother does the following:
  - Calculates the sales after each sale
  - Records the sales after each sale
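
The real-time analogy is push-based: each sale is handed over as it happens, so nobody has to ask for totals. A minimal sketch, where the `LemonadeStand` and `Notebook` classes are hypothetical stand-ins for the source system and the CDC consumer:

```python
class Notebook:
    """The brother: updates and records totals on every receipt."""

    def __init__(self):
        self.entries = []
        self.running_total = 0.0

    def on_receipt(self, amount):
        # Calculate the sales after each sale, then record them.
        self.running_total += amount
        self.entries.append((amount, self.running_total))


class LemonadeStand:
    """John: makes the sale and hands over the receipt, nothing more."""

    def __init__(self, listener):
        self._listener = listener

    def sell(self, amount):
        # Push the change to the listener as it happens.
        self._listener(amount)


notebook = Notebook()
stand = LemonadeStand(notebook.on_receipt)
for price in (2.0, 3.0, 2.5):
    stand.sell(price)
```

The stand never computes totals itself, which mirrors how John is freed from the administrative burden.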