Lambda architecture philosophy relies on a three-layer principle:
- batch - traditionally what we know as batch files/tables pushed or pulled into a data repository
- speed - streaming data that can lack accuracy
- serving - the layer that presents the data to the end user
Batch is seen as the layer that "catches up" to streaming: if your analytics rely on speed, then eventually you will see a more accurate result come from batch than what you got from the streaming output. That is the tradeoff, and it is what Kappa challenges. For Kappa the philosophy is "streaming first", and it is encapsulated in the philosophy behind Beam (Google's Dataflow is a managed Beam implementation).
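The "catch up" idea can be sketched in a few lines: the serving layer merges an accurate batch view (everything up to the last batch cutoff) with a low-latency speed view (everything since). All names here are illustrative, not taken from any specific framework.

```python
# Minimal sketch of Lambda's three layers over simple event counts per key.
from collections import Counter

def batch_view(events, cutoff):
    """Accurate counts, recomputed over all events up to the batch cutoff."""
    return Counter(e["key"] for e in events if e["ts"] <= cutoff)

def speed_view(events, cutoff):
    """Low-latency counts over events the batch layer has not absorbed yet."""
    return Counter(e["key"] for e in events if e["ts"] > cutoff)

def serving_view(events, cutoff):
    """Serving layer merges both views; batch eventually 'catches up'
    as the cutoff advances and the speed view shrinks."""
    return batch_view(events, cutoff) + speed_view(events, cutoff)

events = [
    {"key": "a", "ts": 1}, {"key": "a", "ts": 2},
    {"key": "b", "ts": 3}, {"key": "a", "ts": 4},
]
print(serving_view(events, cutoff=2))  # Counter({'a': 3, 'b': 1})
```

Kappa's objection, roughly, is that if the streaming path is replayable and correct, you can drop the batch layer and just reprocess the stream.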
To find out more about Lambda refer to: Big Data
Has streaming entered your Data Vault use cases at all?
Would you be looking at:
- non-historised structures in Data Vault modelling? And if so, would you still hash business keys if doing so adds unacceptable latency?
- sending the content straight to a time-series database and keeping a copy of the data in the Data Vault for further analysis?
- and what tech do you rely on to fulfil your Data Lake/Warehouse requirements?
@brucemccartney presented on streaming at WWDVC a few years ago: excellent content, with lots of references to free online documentation on streaming SQL, windowing, late-arriving records and micro-batching!
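Two of those concepts, windowing and late-arriving records, can be illustrated with a toy micro-batch assigner. This is a plain-Python sketch assuming epoch-second event times and a fixed allowed lateness; real streaming engines express the same thing declaratively and manage the watermark for you.

```python
# Toy fixed-window assignment with allowed lateness (illustrative only).
from collections import defaultdict

WINDOW = 60            # fixed window size, seconds
ALLOWED_LATENESS = 30  # how far behind the watermark a record may arrive

def assign(events, watermark):
    """Bucket (ts, value) pairs into fixed windows; records arriving
    further behind the watermark than ALLOWED_LATENESS are set aside."""
    windows, dropped = defaultdict(list), []
    for ts, value in events:
        if ts < watermark - ALLOWED_LATENESS:
            dropped.append((ts, value))  # too late: discard or reroute
        else:
            window_start = ts - (ts % WINDOW)
            windows[window_start].append(value)
    return dict(windows), dropped

events = [(5, "a"), (50, "b"), (65, "c"), (70, "d")]
wins, dropped = assign(events, watermark=70)
# wins    -> {0: ['b'], 60: ['c', 'd']}
# dropped -> [(5, 'a')]  (older than watermark - lateness)
```

The interesting design choice is what to do with `dropped`: discard, route to a dead-letter store, or retract and re-emit the affected window, which is exactly the territory the WWDVC material and streaming SQL docs cover.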