Kafka

Messaging

Resource Plane

Apache Kafka is a distributed event streaming platform designed for high-throughput, fault-tolerant data pipelines and real-time streaming applications. It enables reliable, scalable data streaming with durable storage and stream processing capabilities.

Profile

Apache Kafka is a distributed event streaming platform designed to handle high-throughput, fault-tolerant data pipelines and real-time streaming applications. The platform implements a publish-subscribe messaging model enhanced with durable storage and stream processing capabilities. As a mature Apache Software Foundation project, Kafka provides proven scalability from small deployments to clusters handling trillions of messages across thousands of brokers. Its fundamental value proposition lies in enabling reliable, high-performance data streaming while maintaining per-partition ordering guarantees and fault tolerance through distributed replication.

Focus

Kafka solves core distributed systems challenges around reliable message delivery, data integration, and stream processing. The platform addresses the limitations of traditional messaging systems by providing durable storage, horizontal scalability, and exactly-once processing semantics. It enables organizations to decouple data producers from consumers, handle high-volume event streams without data loss, and build real-time data pipelines that can scale elastically. Target users include platform engineers building data infrastructure, developers creating event-driven applications, and architects designing distributed systems that require guaranteed message ordering and delivery.
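
A minimal sketch of that decoupling using the Java client, assuming a broker at localhost:9092, a topic named user-events, and a consumer group named analytics (all placeholders): the producer publishes without knowing who will read, and the consumer joins a group and reads from durable storage at its own pace.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DecoupledPipeline {
    static final String BOOTSTRAP = "localhost:9092"; // placeholder broker address
    static final String TOPIC = "user-events";        // placeholder topic name

    public static void main(String[] args) {
        // Producer side: publishes and exits; it never knows who consumes.
        Properties p = new Properties();
        p.put("bootstrap.servers", BOOTSTRAP);
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("acks", "all"); // wait for the in-sync replicas: durability over latency
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // Records with the same key land in the same partition,
            // which is what preserves per-key ordering.
            producer.send(new ProducerRecord<>(TOPIC, "user-42", "page_view:/home"));
        }

        // Consumer side: joins a group and reads at its own pace from durable storage.
        Properties c = new Properties();
        c.put("bootstrap.servers", BOOTSTRAP);
        c.put("group.id", "analytics"); // placeholder consumer group
        c.put("auto.offset.reset", "earliest");
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(List.of(TOPIC));
            for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("partition=%d offset=%d %s=%s%n",
                        r.partition(), r.offset(), r.key(), r.value());
            }
        }
    }
}
```

Because the broker retains records durably, the two sides need not run at the same time: the consumer can start later, or a second consumer group can replay the same stream independently.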

Background

Originally developed at LinkedIn to handle high-volume activity tracking, Kafka was open-sourced and donated to the Apache Software Foundation in 2011, graduating as a top-level project in 2012. The platform's architecture proved transformative for companies like Netflix, which uses it to process millions of events daily for real-time analytics, and LinkedIn, which operates thousands of brokers handling trillions of messages. Governed by the Apache Software Foundation through a Project Management Committee, Kafka maintains active development with contributions from a diverse community while remaining vendor-neutral under the Apache License 2.0.

Main features

Distributed message broker with durable storage

The platform implements a distributed commit log architecture where messages are stored in topics partitioned across multiple brokers. Each partition maintains an ordered, immutable sequence of records that can be retained based on time or size policies. This design enables horizontal scalability while preserving message ordering within partitions. The replication mechanism maintains multiple copies of data across different brokers, providing fault tolerance without sacrificing performance. The system can handle massive scale, from small deployments to clusters processing petabytes of data with guaranteed durability.
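
A sketch of how those durability properties are declared at topic-creation time using the Java AdminClient; the topic name orders, the partition and replica counts, and the broker address are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address

        try (Admin admin = Admin.create(props)) {
            // 6 partitions for parallelism; replication factor 3 so each record
            // survives the loss of any single broker (requires at least 3 brokers).
            NewTopic topic = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of(
                            // Time-based retention: keep records for 7 days.
                            "retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000),
                            // Writes are only acknowledged once 2 replicas hold them.
                            "min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

With replication factor 3 and min.insync.replicas set to 2, a producer configured with acks=all receives an acknowledgment only after two replicas hold the record, so any single broker can fail without losing acknowledged data.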

Scalable stream processing framework

Kafka Streams provides a client library for building distributed stream processing applications directly integrated with Kafka's storage and messaging capabilities. The framework enables stateful operations like aggregations and joins while maintaining exactly-once processing guarantees through careful coordination of processing state and message offsets. Applications can scale horizontally by adding more instances, with the framework automatically rebalancing workload across available processors. This architecture eliminates the need for separate processing clusters while supporting sophisticated real-time analytics and event-driven applications.
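
A minimal Kafka Streams sketch of one such stateful operation, a running click count per user; the topic names user-events and click-counts and the application id click-counter are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class ClickCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-counter");      // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Exactly-once processing: state updates and offset commits are
        // made atomic through Kafka transactions.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("user-events");
        // Stateful aggregation: a running count per key, backed by a local
        // state store that is replicated through an internal changelog topic.
        KTable<String, Long> counts = clicks.groupByKey().count();
        counts.toStream().to("click-counts",
                Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Scaling out is then a matter of starting another instance with the same application.id; the framework rebalances input partitions and their associated state across the running instances automatically.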

Enterprise-grade integration framework

Kafka Connect offers a standardized framework for building scalable, reliable data pipelines between Kafka and external systems. The framework provides built-in distributed scaling, offset management, and fault tolerance capabilities while supporting both source connectors that import data and sink connectors that export data. Connect workers can run in standalone mode for simple integrations or distributed mode for production deployments, with automatic work rebalancing across available nodes. The architecture enables organizations to implement complex data integration patterns without custom code while maintaining operational reliability.
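
As an illustration, registering a connector against a distributed-mode Connect worker is a single REST call. The sketch below uses the FileStreamSource demo connector that ships with Kafka; the worker URL, connector name, file path, and topic are placeholders.

```bash
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "demo-file-source",
    "config": {
      "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
      "tasks.max": "1",
      "file": "/var/log/app.log",
      "topic": "app-logs"
    }
  }'
```

From that point the worker cluster schedules the connector's tasks, tracks their source offsets, and redistributes the work if a node fails, with no pipeline code written by the operator.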