
Apache Spark … Part 2. Streaming, …

16 … 2017, 07:29
.
, , . Spark streaming Kafka. , .
, . Spark, , . .
Workflow : (Apache Kafka), Spark streaming. :

  • ( )
  • .


Kafka, HDFS , , HBase, . , Apache Flume, Spark streaming. Spark HDFS HBase, , exactly once . (Message Delivery Semantics).
:

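  • At most once: a message may be lost, but it is never delivered more than once.
  • At least once: a message is never lost, but it may be delivered more than once.
  • Exactly once: each message is delivered once and only once.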

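The gap between the first two guarantees comes down to when the consumer saves its position relative to processing a message. The following toy Python model is not Spark or Kafka code. `consume`, `run` and `Crash` are hypothetical names, and a crash is simulated by raising an exception between the "process" and "save position" steps.

```python
class Crash(Exception):
    """Simulates the consumer process dying; args[0] is the last committed offset."""


def consume(log, committed, processed, commit_first, crash_at=None):
    """Process log[committed:], saving the position after each record.

    commit_first=True:  save the position, then process  (at most once)
    commit_first=False: process, then save the position  (at least once)
    crash_at: offset at which the process dies between the two steps.
    """
    for offset in range(committed, len(log)):
        if commit_first:
            committed = offset + 1              # position saved before the work
            if offset == crash_at:
                raise Crash(committed)
            processed.append(log[offset])
        else:
            processed.append(log[offset])       # work done before the position is saved
            if offset == crash_at:
                raise Crash(committed)
            committed = offset + 1
    return committed


def run(log, commit_first, crash_at):
    """Run once with a crash, then restart from the last committed offset."""
    processed = []
    try:
        consume(log, 0, processed, commit_first, crash_at)
    except Crash as crash:
        consume(log, crash.args[0], processed, commit_first)  # restart, no crash
    return processed


print(run(["a", "b", "c", "d"], commit_first=True, crash_at=1))   # ['a', 'c', 'd']
print(run(["a", "b", "c", "d"], commit_first=False, crash_at=1))  # ['a', 'b', 'b', 'c', 'd']
```

With `commit_first=True`, the record at the crash point is lost. With `commit_first=False`, it is processed twice. Getting exactly once needs something extra on top, for example making the processing or the write safe to repeat.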

. Spark exactly once , Spark. . , , .
workflow . , .
Kafka Spark, Spark (HDFS, HBase).

Kafka → Spark


* Kafka Spark , .

Receiver-based Approach


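A receiver reads the data through the Kafka consumer API, which keeps track of the consumed positions (offsets) in Zookeeper. These offsets are updated independently of Spark's own processing, so after a failure some records can be read again. This approach therefore gives At least once.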

Direct Approach (No Receivers)


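There are no receivers here. Spark tracks the offsets itself and stores them in HDFS as part of its saved state (checkpoints). This makes exactly once reading from Kafka possible.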
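In the direct approach, each batch is described by a fixed offset range per partition, and Spark tracks those offsets in its checkpoints. As a result, a batch replayed after a failure reads exactly the same records. The following is a minimal plain-Python sketch of that bookkeeping. It assumes in-memory lists stand in for Kafka partitions. `OffsetRange` is loosely modeled on Spark's class of the same name, while `plan_batch`, `next_start_offsets`, `read_batch` and the `max_per_partition` cap are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    """Half-open range [from_offset, until_offset) of one partition (toy model)."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int


def plan_batch(topic, start_offsets, latest_offsets, max_per_partition):
    """Return the offset ranges for the next batch.

    start_offsets: partition -> next offset to read (restored from the checkpoint)
    latest_offsets: partition -> current end of the partition
    max_per_partition: cap on records read per partition in one batch
    """
    ranges = []
    for partition in sorted(start_offsets):
        start = start_offsets[partition]
        until = min(latest_offsets[partition], start + max_per_partition)
        ranges.append(OffsetRange(topic, partition, start, until))
    return ranges


def next_start_offsets(ranges):
    """Offsets the next batch starts from once this batch is planned."""
    return {r.partition: r.until_offset for r in ranges}


def read_batch(partitions, ranges):
    """Read exactly the records the ranges describe, so a replay is identical."""
    return [
        (r.partition, offset, partitions[r.partition][offset])
        for r in ranges
        for offset in range(r.from_offset, r.until_offset)
    ]


partitions = {0: ["a", "b", "c"], 1: ["x", "y"]}
batch1 = plan_batch("events", {0: 0, 1: 0}, {0: 3, 1: 2}, max_per_partition=2)
print(read_batch(partitions, batch1))  # [(0, 0, 'a'), (0, 1, 'b'), (1, 0, 'x'), (1, 1, 'y')]
batch2 = plan_batch("events", next_start_offsets(batch1), {0: 3, 1: 2}, 2)
print(read_batch(partitions, batch2))  # [(0, 2, 'c')]
```

The batch contents depend only on the saved ranges, not on any external consumer state. This is why re-running `read_batch` with the same ranges after a failure yields the same records.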

#


Spark checkpoints, , , . , , . . , , , . exactly once )) 1.6.0 Cloudera. , .

#


Kafka . , . , , . , . , , . , .

Spark


. , . exactly once , . , At most once .
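Reading each record exactly once does not by itself make the output exactly once: if a batch is replayed, its write to storage happens again. One common way around this is an idempotent write, where every record maps to a deterministic key, so a replay overwrites instead of duplicating. This is a generic technique, not necessarily the one the article settles on. `KeyedStore` is a hypothetical stand-in for a table such as HBase, and `row_key`, `write_batch` and `write_batch_append` are illustrative names.

```python
class KeyedStore:
    """Toy key-value sink (hypothetical stand-in for a table such as HBase).

    A put with an existing key overwrites the old value instead of adding a row.
    """

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value


def row_key(topic, partition, offset):
    """Deterministic key: the same Kafka record always maps to the same row."""
    return f"{topic}:{partition}:{offset:012d}"


def write_batch(store, topic, records):
    """Idempotent write of (partition, offset, value) records; safe to replay."""
    for partition, offset, value in records:
        store.put(row_key(topic, partition, offset), value.upper())  # upper() stands in for real processing


def write_batch_append(sink, records):
    """Non-idempotent contrast: replaying a batch appends duplicates."""
    for _partition, _offset, value in records:
        sink.append(value.upper())


records = [(0, 0, "a"), (0, 1, "b")]
store, sink = KeyedStore(), []
for _ in range(2):  # the same batch written twice, as after a failure and replay
    write_batch(store, "events", records)
    write_batch_append(sink, records)
print(len(store.rows), len(sink))  # 2 4
```

The keyed store ends up with one row per Kafka record no matter how often the batch is replayed, whereas the append-only sink doubles.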

:


Spark streaming , . , , Spark streaming , , , .
Original source: habrahabr.ru.

https://habrahabr.ru/post/330986/
