Thursday 24 March 2016

Apache Kafka

Apache Kafka for Beginners


Message queuing allows applications to communicate by sending messages to each other. A message queue provides temporary message storage when the destination program is busy or not connected. Most of you are already familiar with message queues. A traditional message queue is not designed to handle big data volumes, which is where a distributed messaging queue comes to the rescue.

Big data needs a scalable messaging system, meaning it should scale easily to thousands of nodes. The system should be fault tolerant, so that it keeps working even if some nodes in the cluster go down, and it should support replication. In short, there shouldn't be a single point of failure. The messaging system should also support high throughput, meaning it should handle millions of messages in a short time.

This is where Apache Kafka fits into the world of distributed messaging. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system, but with a unique design.
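To make "partitioned commit log" a little more concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Kafka's API: the class name PartitionedLog and its methods are made up for illustration, and zlib.crc32 is only a stand-in for whatever hash a real client uses to pick a partition.

```python
import zlib


class PartitionedLog:
    """Toy in-memory model of a partitioned, append-only log (hypothetical, not Kafka's API)."""

    def __init__(self, num_partitions):
        # Each partition is its own ordered, append-only list of records.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash, so the same key always lands in the same partition.
        # crc32 is an assumption made for this sketch only.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Records are only ever added at the end; the offset is the position.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition, offset):
        # Reading does not remove anything; it returns records from offset onward.
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)
p, first = log.append("user-1", "login")
_, second = log.append("user-1", "logout")
print(p, first, second, log.read(p, 0))
```

Messages with the same key keep their order because they share a partition, while different partitions can live on different machines. That is the basic idea behind ordering within a partition and scaling across partitions.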

Features of Apache Kafka

  • No single point of failure: brokers act as peers instead of following a fixed master-slave setup
  • High throughput
  • Scales easily to thousands of nodes
  • Replication support: messages are replicated across the cluster
  • Durable: messages are persisted to the file system
  • Open source: originally developed at LinkedIn and donated to the Apache community

Apache Kafka was designed to partition and persist large volumes of messages regardless of whether the consumers are online. The main point is not to throttle producers when consumers fail to consume data fast enough, but to provide a buffer between the flood of events and the consumers.
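The buffering idea above can be sketched with a log that keeps every record and consumers that each track their own read position. Again, this is a simplified in-memory illustration; the names Log, Consumer and poll are assumptions for this sketch and are not taken from the Kafka client library.

```python
class Log:
    """Toy append-only log: records stay in place after they are read."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record


class Consumer:
    """Hypothetical consumer that remembers its own offset into the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records):
        # Read up to max_records starting at this consumer's offset.
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch


log = Log()
for i in range(5):
    log.append("event-%d" % i)   # producer writes while no consumer is online

slow = Consumer(log)
fast = Consumer(log)
print(slow.poll(2))   # the slow consumer takes a small batch
print(fast.poll(10))  # the fast consumer catches up completely
print(slow.poll(10))  # the slow consumer resumes where it left off
```

The producer never waits for either consumer, and a slow consumer does not hold back a fast one, because each simply advances its own offset through the same stored records.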

The AMQP (Advanced Message Queuing Protocol) standard requires exactly one each of producer, channel, exchange, queue and consumer for ordered delivery. This runs against the philosophy of no single point of failure.

In terms of performance, Kafka can sustain 1 million messages produced per second on just a couple of nodes while keeping durability and ordered partitioning of the data. This is considered high throughput, and only a few of the largest companies have higher requirements than this.

Being distributed, Kafka has fail-over mechanisms: each partition has a leader, and if the node acting as leader goes down, one of the remaining replicas is automatically elected and promoted to leader.
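The fail-over described above can be sketched as follows, assuming leadership is tracked per partition and the replacement is picked from the replicas that are still in sync. This is a toy model: the Partition class and broker_failed method are hypothetical, and real Kafka coordinates the election across the cluster instead of inside a single object.

```python
class Partition:
    """Toy model of one replicated partition (hypothetical, not Kafka's API)."""

    def __init__(self, replicas):
        self.replicas = list(replicas)   # brokers holding a copy of the data
        self.in_sync = list(replicas)    # copies that are fully caught up
        self.leader = self.replicas[0]   # reads and writes go through the leader

    def broker_failed(self, broker):
        # A failed broker can no longer count as an in-sync replica.
        if broker in self.in_sync:
            self.in_sync.remove(broker)
        # If the leader failed, promote the next in-sync replica, if any.
        if broker == self.leader:
            self.leader = self.in_sync[0] if self.in_sync else None
        return self.leader


partition = Partition(["broker-1", "broker-2", "broker-3"])
print(partition.leader)                     # broker-1
print(partition.broker_failed("broker-1"))  # broker-2
```

Choosing only from in-sync replicas is the design point worth noticing: the new leader already has the data, so clients can carry on without losing messages that were fully replicated.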

If you need to push large messages, or if simplicity and ease of use are what you are after, you should consider one of the lightweight brokers. But if you need reliability and performance at scale and are pushing large amounts of data through your system, then Kafka is a very good choice.

Example:
Kafka Example
