• Flume
• Kafka
• Storm
• Demo

Flume

• Used for creating streaming data flow
• Distributed
• Reliable
• Support for many inbound ingest protocols

[Diagram: Web/File … -> Source -> Channel -> Sink -> HDFS/NoSQL…; labels: Real-Time Streaming, Offline/Batch processing]

Sources

• HTTP
    a1.sources.r1.type = http
    a1.sources.r1.port = 5140
    a1.sources.r1.channels = c1
    a1.sources.r1.handler = org.example.rest.RestHandler
• Spool Directory
    a1.sources.src-1.type = spooldir
    a1.sources.src-1.channels = ch-1
    a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
    a1.sources.src-1.fileHeader = true
• Exec
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/secure

Channels

• Memory – High throughput, not reliable
• JDBC – Durable, slower
• File – Good throughput, supports recovery

Sinks

• HDFS
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
    a1.sinks.k1.hdfs.filePrefix = events-
• Hive
    a1.sinks.k1.type = hive
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
    a1.sinks.k1.hive.database = logsdb
    a1.sinks.k1.hive.table = weblogs
• Kafka
    a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
    a1.sinks.k1.topic = mytopic
    a1.sinks.k1.brokerList = localhost:9092

Flows

• Can chain
• Multiplex
• Fan-in / Fan-out

    # example.conf: A single-node Flume configuration

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

• bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Kafka

• Design Considerations
  • Log aggregator
  • Distributed
  • Batch messages to reduce number of connections
  • Offline/Periodic consumption
  • Pull model

• Uses Zookeeper for node and consumer status.
• At-least-once delivery (by tracking offsets, a consumer can achieve exactly-once processing).
• Built-in data loss auditing.

• A producer writes to a topic and a consumer reads from a topic.
• A topic is divided into partitions, each an ordered sequence of messages. Within a consumer group, each partition is consumed by one consumer at a time. An offset is maintained for each consumer per partition.
• Partition count determines the maximum consumer parallelism.
• Each partition can have multiple replicas. This provides failover.
• A broker can host multiple partitions. Each partition has exactly one leader at a time; a broker can lead some partitions and hold follower replicas of others.
• The leader receives messages and replicates them to the other servers.

• server.properties file
  • Host name, port
  • Zookeepers

• bin/zookeeper-server-start.sh config/zookeeper.properties
• bin/kafka-server-start.sh config/server.properties

Storm

• Topologies
• Streams
• Spouts
• Bolts
• Stream groupings
• Reliability
• Tasks
• Workers

Streams

• Unbounded sequence of tuples
• A tuple is a list of values

Spouts

• Generate streams
• Can be reliable or unreliable
• Reliable spouts use ack() and fail(). Tuples can be replayed.

Bolts

• Used for filtering, functions, aggregations, joins, talking to databases, and more.
• Complex processing is achieved by using multiple bolts.
• Types of bolt interfaces:
  • IRichBolt: the general interface for bolts. Manual ack needed.
  • IBasicBolt: a convenience interface for defining bolts that do filtering or simple functions. Auto ack.

Stream groupings

• Tell Storm how to distribute tuples among the available tasks
• Shuffle grouping – Tuples are randomly sent to tasks
• Fields grouping – Groups processing by field. Tuples with the same value for the grouped field always go to the same task.

[Figure: Shuffle vs. Field grouping, with example tuples AB, XYZ, CAB, XYF and field values “X” and “A”]

• Nimbus – Master node
• Zookeeper – Cluster coordination
• Supervisor – Worker processes

• Nimbus: Master node. There can be only one master node in a cluster. Reassigns tasks in case of worker node failure.
• Zookeeper: Communication backbone in the cluster. Maintains state to aid in failover/recovery.
• Supervisor: Worker node. Governs worker processes.

[Figure: cluster layout with Nimbus, a Zookeeper node, and a Worker Node running a Supervisor; a Worker Process contains Executors, and each Executor runs Tasks]
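
The difference between the shuffle and fields groupings described in the Storm slides can be sketched with a small simulation. This is plain Python, not Storm's API: the function names, the task count, and the use of a CRC32 hash to pick a task are all assumptions made for illustration, and real Storm may pick tasks differently. The point it shows is only that, under a fields grouping, every tuple with the same field value lands on the same task, while a shuffle grouping gives no such guarantee.

```python
import random
import zlib
from collections import defaultdict

NUM_TASKS = 3  # hypothetical number of bolt tasks


def shuffle_grouping(tuples, num_tasks, rng):
    """Simplified shuffle grouping: send each tuple to a randomly chosen task."""
    assignment = defaultdict(list)
    for tup in tuples:
        assignment[rng.randrange(num_tasks)].append(tup)
    return assignment


def fields_grouping(tuples, num_tasks, field_index):
    """Simplified fields grouping: pick the task by hashing one field.

    Equal field values always hash to the same task id.
    """
    assignment = defaultdict(list)
    for tup in tuples:
        key = str(tup[field_index]).encode("utf-8")
        # CRC32 is an assumed stand-in for whatever hash a real system uses;
        # it is stable across runs, unlike Python's built-in hash() for str.
        task = zlib.crc32(key) % num_tasks
        assignment[task].append(tup)
    return assignment


def tasks_seeing_value(assignment, field_index, value):
    """Return the set of task ids that received a tuple with this field value."""
    return {
        task
        for task, tups in assignment.items()
        for tup in tups
        if tup[field_index] == value
    }


if __name__ == "__main__":
    # Hypothetical (key, payload) tuples.
    data = [("A", 1), ("X", 2), ("A", 3), ("C", 4), ("X", 5), ("A", 6)]
    by_field = fields_grouping(data, NUM_TASKS, field_index=0)
    by_shuffle = shuffle_grouping(data, NUM_TASKS, random.Random(0))
    print("fields :", dict(by_field))
    print("shuffle:", dict(by_shuffle))
```

Calling `tasks_seeing_value(by_field, 0, "A")` returns a set with a single task id, which is the property that makes per-key aggregation (for example, counting by key) safe under a fields grouping.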