, . Spark streaming Kafka. , .
, . Spark, , . .
Workflow : (Apache Kafka), Spark streaming. :
Kafka, HDFS , , HBase, . , Apache Flume, Spark streaming. Spark HDFS HBase, , exactly once . (Message Delivery Semantics).
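Three kinds of delivery semantics are distinguished:
- At most once: each message is delivered no more than once, so it may be lost but is never duplicated.
- At least once: each message is delivered one or more times, so it is never lost but may be duplicated.
- Exactly once: each message is delivered once and only once, with neither loss nor duplication.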
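The difference between the three semantics comes down to the order of "process" and "acknowledge" and to whether the write is idempotent. The following is a minimal sketch in plain Python, not Spark or Kafka code: `run_consumer`, its `mode` strings, and the single simulated crash at `crash_at` are all hypothetical names introduced here for illustration.

```python
def run_consumer(messages, mode, crash_at):
    """Simulate a consumer that crashes once, between processing and
    acknowledging the message at index `crash_at`, then restarts from
    the last committed offset. `messages` is a list of (id, payload)."""
    sink = {} if mode == "exactly_once" else []
    committed = 0      # offset to resume from after a restart
    crashed = False
    pos = 0
    while pos < len(messages):
        msg_id, payload = messages[pos]
        crash_now = (pos == crash_at and not crashed)
        if mode == "at_most_once":
            committed = pos + 1          # acknowledge first, write second
            if crash_now:
                crashed = True
                pos = committed          # the write never happened: message lost
                continue
            sink.append(payload)
        else:
            if mode == "exactly_once":
                sink[msg_id] = payload   # idempotent write keyed by message id
            else:
                sink.append(payload)     # plain append: a replay duplicates it
            if crash_now:
                crashed = True
                pos = committed          # offset was not committed: replay
                continue
            committed = pos + 1          # write first, acknowledge second
        pos = committed
    return list(sink.values()) if isinstance(sink, dict) else sink


if __name__ == "__main__":
    msgs = [(0, "a"), (1, "b"), (2, "c")]
    for mode in ("at_most_once", "at_least_once", "exactly_once"):
        print(mode, run_consumer(msgs, mode, crash_at=1))
```

In this sketch, exactly once is simply at least once plus an idempotent sink; a transactional sink is the other common way to get the same effect.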
. Spark exactly once , Spark. . , , .
workflow . , .
Kafka Spark, Spark (HDFS, HBase).
Kafka Spark
* Kafka Spark , .
Receiver-based Approach
In this approach the offsets are tracked through the Kafka consumer API and stored in Zookeeper. The guarantee this approach can provide is at least once.
Direct Approach (No Receivers)
In this approach Spark tracks the offsets itself and stores them on HDFS in checkpoints. This makes exactly once semantics achievable.
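Keeping the offsets next to the results is what makes replay safe. Below is a minimal sketch of that idea under stated assumptions: it uses an in-memory SQLite database in place of HDFS or HBase, a Python list in place of a Kafka partition, and the hypothetical helpers `open_store` and `process_batch`. It illustrates one common pattern (output and offset committed in a single transaction), not Spark's own checkpoint mechanism.

```python
import sqlite3


def open_store():
    """Create an in-memory store holding both the output and the checkpoint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE output (msg_offset INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute(
        "CREATE TABLE checkpoint (id INTEGER PRIMARY KEY CHECK (id = 0), "
        "next_offset INTEGER)")
    conn.execute("INSERT INTO checkpoint VALUES (0, 0)")
    conn.commit()
    return conn


def process_batch(conn, log, batch_size, fail_before_commit=False):
    """Read one batch starting at the checkpointed offset and write the
    results and the new offset in the same transaction."""
    (start,) = conn.execute("SELECT next_offset FROM checkpoint").fetchone()
    end = min(start + batch_size, len(log))
    try:
        for offset in range(start, end):
            conn.execute("INSERT INTO output VALUES (?, ?)",
                         (offset, log[offset]))
        conn.execute("UPDATE checkpoint SET next_offset = ?", (end,))
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")
        conn.commit()       # output and offset become visible together
    except RuntimeError:
        conn.rollback()     # neither the output nor the offset is kept


def snapshot(conn):
    """Return (output rows, checkpointed offset) for inspection."""
    rows = conn.execute(
        "SELECT msg_offset, payload FROM output ORDER BY msg_offset").fetchall()
    (next_offset,) = conn.execute(
        "SELECT next_offset FROM checkpoint").fetchone()
    return rows, next_offset
```

A failed batch leaves both tables untouched, so the next call to `process_batch` re-reads the same range and writes it exactly once.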
#
Spark checkpoints, , , . , , . . , , , . exactly once )) 1.6.0 Cloudera. , .
#
Kafka . , . , , . , . , , . , .
Spark
. , . exactly once , . , At most once .
:
Spark streaming , . , , Spark streaming , , , .
https://habrahabr.ru/post/330986/