Streaming Systems (Reprint Edition)


Streaming Systems (Reprint Edition) is a book published by Southeast University Press in 2019. Its authors are Tyler Akidau, Slava Chernyak, and Reuven Lax.

Basic Information

  • Chinese title: 流式系統(影印版)
  • Authors: Tyler Akidau, Slava Chernyak, Reuven Lax
  • Publication date: June 1, 2019
  • Publisher: Southeast University Press
  • ISBN: 9787564183677

Synopsis

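In a traditional data processing pipeline, data is first collected and stored in a database. When people need an answer, they query the database and then process the results. This approach seems reasonable, but it falls short in some settings. In real-time search applications in particular, MapReduce-style offline processing cannot address certain problems well. This gap led to a new model of data computation: stream computing. Stream computing analyzes large volumes of data in real time as the data flows and changes, captures potentially useful information, and passes the results on to the next computation node. Streaming Systems (Reprint Edition) explains the principles of stream computing.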

Table of Contents

Preface Or: What Are You Getting Yourself Into Here?
Part I. The Beam Model
1. Streaming 101
Terminology: What Is Streaming?
On the Greatly Exaggerated Limitations of Streaming
Event Time Versus Processing Time
Data Processing Patterns
Bounded Data
Unbounded Data: Batch
Unbounded Data: Streaming
Summary
2. The What, Where, When, and How of Data Processing
Roadmap
Batch Foundations: What and Where
What: Transformations
Where: Windowing
Going Streaming: When and How
When: The Wonderful Thing About Triggers Is Triggers Are Wonderful Things!
When: Watermarks
When: Early/On-Time/Late Triggers FTW!
When: Allowed Lateness (i.e., Garbage Collection)
How: Accumulation
Summary
3. Watermarks
Definition
Source Watermark Creation
Perfect Watermark Creation
Heuristic Watermark Creation
Watermark Propagation
Understanding Watermark Propagation
Watermark Propagation and Output Timestamps
The Tricky Case of Overlapping Windows
Percentile Watermarks
Processing-Time Watermarks
Case Studies
Case Study: Watermarks in Google Cloud Dataflow
Case Study: Watermarks in Apache Flink
Case Study: Source Watermarks for Google Cloud Pub/Sub
Summary
4. Advanced Windowing
When/Where: Processing-Time Windows
Event-Time Windowing
Processing-Time Windowing via Triggers
Processing-Time Windowing via Ingress Time
Where: Session Windows
Where: Custom Windowing
Variations on Fixed Windows
Variations on Session Windows
One Size Does Not Fit All
Summary
5. Exactly-Once and Side Effects
Why Exactly Once Matters
Accuracy Versus Completeness
Side Effects
Problem Definition
Ensuring Exactly Once in Shuffle
Addressing Determinism
Performance
Graph Optimization
Bloom Filters
Garbage Collection
Exactly Once in Sources
Exactly Once in Sinks
Use Cases
Example Source: Cloud Pub/Sub
Example Sink: Files
Example Sink: Google BigQuery
Other Systems
Apache Spark Streaming
Apache Flink
Summary
Part II. Streams and Tables
6. Streams and Tables
Stream-and-Table Basics Or: a Special Theory of Stream and Table Relativity
Toward a General Theory of Stream and Table Relativity
Batch Processing Versus Streams and Tables
A Streams and Tables Analysis of MapReduce
Reconciling with Batch Processing
What, Where, When, and How in a Streams and Tables World
What: Transformations
Where: Windowing
When: Triggers
How: Accumulation
A Holistic View of Streams and Tables in the Beam Model
A General Theory of Stream and Table Relativity
Summary
7. The Practicalities of Persistent State
Motivation
The Inevitability of Failure
Correctness and Efficiency
Implicit State
Raw Grouping
Incremental Combining
Generalized State
Case Study: Conversion Attribution
Conversion Attribution with Apache Beam
Summary
8. Streaming SQL
What Is Streaming SQL?
Relational Algebra
Time-Varying Relations
Streams and Tables
Looking Backward: Stream and Table Biases
The Beam Model: A Stream-Biased Approach
The SQL Model: A Table-Biased Approach
Looking Forward: Toward Robust Streaming SQL
Stream and Table Selection
Temporal Operators
Summary
9. Streaming Joins
All Your Joins Are Belong to Streaming
Unwindowed Joins
FULL OUTER
LEFT OUTER
RIGHT OUTER
INNER
ANTI
SEMI
Windowed Joins
Fixed Windows
Temporal Validity
Summary
10. The Evolution of Large-Scale Data Processing
MapReduce
Hadoop
Flume
Storm
Spark
MillWheel
Kafka
Cloud Dataflow
Flink
Beam
Summary
Index

About the Authors

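Tyler Akidau is a senior software engineer at Google, where he serves as technical lead of the Data Processing Languages & Systems group. He is also a founding member of the Apache Beam PMC.
Slava Chernyak is a senior software engineer at Google. He spent six years working on Google's internal large-scale streaming data processing systems.
Reuven Lax is a senior software engineer at Google. Over the past decade he has helped shape Google's data processing and analysis strategy. He is also a member of the Apache Beam PMC.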
