1. What is Kafka?
Apache Kafka is a distributed messaging system and event streaming platform designed for high-throughput, fault-tolerant, real-time data pipelines and event-driven applications. It works on a publish-subscribe model, where producers send messages to topics, and consumers receive them. Kafka is widely used for real-time data streaming, supporting scalable and durable communication between systems.
2. Key Components in Kafka (Spring Boot)
| Component | Role |
|---|---|
| Producer | Sends messages to Kafka topics |
| Consumer | Listens to Kafka topics |
| Topic | Logical name for a message stream |
| KafkaTemplate | Helper class to send messages |
| @KafkaListener | Annotation to consume messages |
| KafkaAdmin | Creates topics programmatically |
3. Dependencies (pom.xml)
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
4. Configuration (application.yml)
spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: my-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
      transaction-id-prefix: tx-   # 🔑 Required for Kafka transactions
5. Kafka Topic Creation with KafkaAdmin
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.TopicBuilder;

@Configuration
public class KafkaTopicConfig {

    @Bean
    public NewTopic topic() {
        return TopicBuilder.name("my-topic")
                .partitions(3)
                .replicas(1)
                .build();
    }
}
6. Kafka Producer
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String topic, String message) {
        kafkaTemplate.send(topic, message);
    }
}
7. Kafka Consumer
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {

    @KafkaListener(topics = "my-topic", groupId = "my-group")
    public void consume(String message) {
        System.out.println("Consumed message: " + message);
    }
}
8. REST Controller to Test Producer
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/kafka")
public class KafkaController {

    private final KafkaProducerService producerService;

    public KafkaController(KafkaProducerService producerService) {
        this.producerService = producerService;
    }

    @GetMapping("/send")
    public String sendMessage(@RequestParam String message) {
        producerService.sendMessage("my-topic", message);
        return "Message sent!";
    }
}
9. Kafka Annotations in Detail
| Annotation | Purpose |
|---|---|
| @KafkaListener | Listens to a Kafka topic; binds a method (or class) to the topic |
| @EnableKafka | Enables detection of @KafkaListener |
| @KafkaHandler | Used with a class-level @KafkaListener for multi-method listeners (example below) |
| @KafkaListeners | Container annotation for repeating @KafkaListener on one method |
| @SendTo | Sends the consumer's return value as a reply to another topic |
| @Header | Accesses Kafka record headers (optional metadata) |

Note: there is no @KafkaListenerContainerFactory annotation; for advanced container settings you define a custom ConcurrentKafkaListenerContainerFactory bean instead.
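A minimal sketch of a class-level listener with @KafkaHandler. The topic name multi-type-topic is an assumption, and routing to the String method relies on the configured deserializer actually producing String payloads:

import org.springframework.kafka.annotation.KafkaHandler;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
@KafkaListener(topics = "multi-type-topic", groupId = "my-group")
public class MultiTypeListener {

    @KafkaHandler
    public void handleString(String message) {
        System.out.println("String payload: " + message);
    }

    // Fallback handler for any payload type not matched above
    @KafkaHandler(isDefault = true)
    public void handleOther(Object message) {
        System.out.println("Other payload: " + message);
    }
}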
10. Kafka Listener with Reply
@KafkaListener(topics = "request-topic", groupId = "my-group")
@SendTo("response-topic")
public String handleRequest(String message) {
return "Response to: " + message;
}
Tips
Use @EnableKafka in a config class if @KafkaListener doesn’t seem to work.
You can send JSON objects by customizing the serializers/deserializers (see the YAML sketch after these tips).
Use DLQ (Dead Letter Queue) for failed message handling.
For transactions, set spring.kafka.producer.transaction-id-prefix=tx-
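As a sketch of the JSON tip above: Spring Kafka ships JsonSerializer/JsonDeserializer, and the deserializer must be told which packages to trust. The package name com.example.events below is hypothetical:

spring:
  kafka:
    producer:
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
    consumer:
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      properties:
        spring.json.trusted.packages: "com.example.events"   # hypothetical package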
-----------------------------
11. Enable Kafka Transactions in Configuration
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.*;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@Configuration
@EnableTransactionManagement
public class KafkaConfig {

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate(ProducerFactory<String, String> producerFactory) {
        KafkaTemplate<String, String> kafkaTemplate = new KafkaTemplate<>(producerFactory);
        kafkaTemplate.setTransactionalIdPrefix("tx-");
        return kafkaTemplate;
    }
}
12. Using Kafka Transactions in Code
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class TransactionalKafkaService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public TransactionalKafkaService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Option 1: executeInTransaction runs the sends in their own local Kafka
    // transaction and does NOT participate in a surrounding Spring transaction.
    public void sendMessages() {
        kafkaTemplate.executeInTransaction(t -> {
            t.send("topic1", "message 1");
            t.send("topic2", "message 2");
            return true;
        });
    }

    // Option 2: plain send() calls inside a @Transactional method join the
    // Spring-managed Kafka transaction (requires transaction-id-prefix).
    @Transactional
    public void sendMessagesTransactional() {
        kafkaTemplate.send("topic1", "message 1");
        kafkaTemplate.send("topic2", "message 2");
    }
}
Notes:
a) If any send fails in a transaction, all others are rolled back.
b) The transaction-id-prefix must be unique per producer instance in a cluster.
c) Transactions require idempotent producer settings (handled automatically by Spring Boot when transaction-id-prefix is set).
-----------------------------------------------------------
2. Event-Driven Architecture with Kafka in Java (Spring Boot)
🔍 What is Event-Driven Architecture (EDA)?
Event-Driven Architecture (EDA) is a design paradigm where services communicate by producing and consuming events, rather than making direct API calls. It's highly asynchronous, decoupled, and scalable.
Instead of saying "do this", services say "this happened" — and any interested component can respond.
🧩 Core Concepts in Kafka-Based EDA
| Concept | Description |
|---|---|
| Event | A message representing a state change (e.g., "UserRegistered") |
| Producer | A service that sends (publishes) events to a Kafka topic |
| Consumer | A service that listens (subscribes) to events from a Kafka topic |
| Topic | A category/feed name to which records are sent |
| Broker | A Kafka server that stores and manages messages |
| Partition | A way to split topic data for scaling |
| Offset | Message ID used by consumers to track reading position |
🎯 Benefits of EDA with Kafka
| Benefit | Description |
|---|---|
| Loose coupling | Services don't depend on each other |
| Scalability | Kafka handles millions of messages per second |
| Resilience | Failures in one service don’t crash the others |
| Asynchronous | Non-blocking operations |
| Auditability | Events are durable and replayable |
🔧 Example: User-Service ➡ Kafka ➡ Notification-Service
Scenario: the user-service registers a new user and publishes a UserRegistered event to Kafka; the notification-service consumes the event and sends a welcome email.
✅ Step-by-Step with Spring Boot + Kafka
1. Add Dependencies
In pom.xml:
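The same spring-kafka starter from section 3 applies here:

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>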
2. Kafka Configuration (application.yml)
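A minimal sketch mirroring section 4; the group id notification-group is an example name:

spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: notification-group
      auto-offset-reset: earliest
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer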
3. Producer Code (UserService)
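A minimal sketch; the topic name user-registered and the plain String payload are assumptions (a real service would likely publish a JSON event, see the tips above):

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class UserService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public UserService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void registerUser(String username) {
        // ... save the user locally, then publish the UserRegistered event
        kafkaTemplate.send("user-registered", username);
    }
}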
Usage:
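For example, from a registration endpoint (hypothetical call):

userService.registerUser("alice"); // publishes a UserRegistered event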
4. Consumer Code (NotificationService)
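A matching sketch under the same assumed topic and group names:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class NotificationService {

    @KafkaListener(topics = "user-registered", groupId = "notification-group")
    public void onUserRegistered(String username) {
        // React to the event, e.g. send a welcome email
        System.out.println("Sending welcome email to: " + username);
    }
}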
5. Kafka Topic Configuration (Optional)
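Following the KafkaAdmin pattern from section 5; partition and replica counts are example values:

@Bean
public NewTopic userRegisteredTopic() {
    return TopicBuilder.name("user-registered")
            .partitions(3)
            .replicas(1)
            .build();
}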
🐳 Docker Compose for Kafka
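A minimal single-broker sketch using the Confluent images; the image tags and advertised listener are assumptions, so adjust them for your environment:

version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1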
✅ Summary
| Component | Role |
|---|---|
| user-service | Publishes UserRegistered event to Kafka |
| Kafka Broker | Queues and distributes the event |
| notification-service | Consumes the event and acts (e.g., sends email) |
------------------------------
3) Methods in Apache Kafka
Apache Kafka provides several core APIs with different sets of methods for producing, consuming, administering, and streaming data. Here's a detailed breakdown of important methods across the main Kafka classes:
🔑 1. KafkaProducer API (Producer Methods)
Used to send records to Kafka topics.
Common Methods:
| Method | Description |
|---|---|
| send(ProducerRecord<K,V>) | Sends a record to a topic asynchronously. |
| send(ProducerRecord<K,V>, Callback) | Sends a record with a callback to get acknowledgment. |
| flush() | Forces all buffered records to be sent. |
| close() | Closes the producer gracefully. |
Example:
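A sketch of the plain (non-Spring) producer API; broker address, topic, and payload are example values:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// Asynchronous send with a callback for the acknowledgment
producer.send(new ProducerRecord<>("my-topic", "key1", "hello"),
        (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();
            } else {
                System.out.println("Sent to partition " + metadata.partition()
                        + " at offset " + metadata.offset());
            }
        });
producer.flush();
producer.close();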
📥 2. KafkaConsumer API (Consumer Methods)
Used to poll and read messages from Kafka topics.
Common Methods:
| Method | Description |
|---|---|
| subscribe(Collection<String>) | Subscribe to a list of topics. |
| poll(Duration) | Polls for messages from the broker. |
| commitSync() | Commits the current offset synchronously. |
| commitAsync() | Commits the offset asynchronously. |
| seek(TopicPartition, long) | Seek to a specific offset in a partition. |
| close() | Closes the consumer. |
| assign(Collection<TopicPartition>) | Assign specific partitions manually. |
| position(TopicPartition) | Returns the current offset position. |
| seekToBeginning(Collection<TopicPartition>) | Seeks to the beginning of the partition. |
| seekToEnd(Collection<TopicPartition>) | Seeks to the end of the partition. |
Example:
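A sketch of a typical poll loop with manual offset commits; broker, group, and topic names are example values:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "my-group");
props.put("enable.auto.commit", "false"); // commit manually below
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset=%d, key=%s, value=%s%n",
                record.offset(), record.key(), record.value());
    }
    consumer.commitSync(); // commit offsets after processing the batch
}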
🛠️ 3. AdminClient API (Admin Operations)
Used to manage Kafka topics, brokers, etc.
Common Methods:
| Method | Description |
|---|---|
| createTopics(Collection<NewTopic>) | Creates new topics. |
| listTopics() | Lists all topics. |
| describeTopics(Collection<String>) | Describes one or more topics. |
| deleteTopics(Collection<String>) | Deletes topics. |
| describeCluster() | Describes the Kafka cluster. |
Example:
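A sketch; note that the get() calls throw checked exceptions, so the surrounding method must declare or handle them:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (AdminClient admin = AdminClient.create(props)) {
    // Create a topic with 3 partitions and replication factor 1
    admin.createTopics(List.of(new NewTopic("my-topic", 3, (short) 1))).all().get();
    // List the names of existing topics
    admin.listTopics().names().get().forEach(System.out::println);
}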
🔄 4. Streams API (KafkaStreams)
Used for real-time stream processing.
Common Methods:
| Method | Description |
|---|---|
| start() | Starts stream processing. |
| close() | Stops stream processing. |
| cleanUp() | Cleans local state stores. |
| state() | Gets the current state of the KafkaStreams instance. |
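A minimal sketch of the Streams lifecycle; the topology (uppercasing values from input-topic to output-topic) and the topic names are assumptions:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

// Build a trivial topology: read, transform, write
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> stream = builder.stream("input-topic");
stream.mapValues(value -> value.toUpperCase()).to("output-topic");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();
// ... later, on shutdown: streams.close();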
📦 5. Other Utility/Support Methods
ProducerRecord Methods:
| Method | Description |
|---|---|
| topic() | Gets the topic name. |
| key() | Gets the message key. |
| value() | Gets the message value. |
| partition() | Gets the assigned partition (can be null). |
ConsumerRecord Methods:
| Method | Description |
|---|---|
| topic() | Gets the topic name. |
| key() | Gets the key. |
| value() | Gets the value. |
| offset() | Gets the offset. |
| partition() | Gets the partition number. |
| timestamp() | Gets the timestamp of the record. |
✅ Summary by Component
| Kafka Component | Key Classes | Important Methods |
|---|---|---|
| Producer | KafkaProducer, ProducerRecord | send(), flush(), close() |
| Consumer | KafkaConsumer, ConsumerRecord | poll(), subscribe(), commitSync() |
| Admin | AdminClient, NewTopic | createTopics(), listTopics() |
| Streams | KafkaStreams | start(), close(), state() |
4) poll() in Kafka
In Apache Kafka, the poll() method is a key part of the Kafka Consumer API. It is used to fetch data from Kafka topics.
🔍 What is poll() in Kafka?
In Kafka, the consumer retrieves records from the broker using the poll() method. This is how the consumer gets new messages from the partitions it subscribes to.
📘 Syntax
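The snippet below reconstructs the syntax from the bullets that follow (the consumer instance and the 100 ms timeout are as described there):

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));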
- consumer: an instance of KafkaConsumer.
- poll(Duration): tries to fetch data from Kafka for a given timeout (here, 100 milliseconds).
- Returns a ConsumerRecords object, which is an iterable collection of ConsumerRecord.
🔁 Internals of poll()
- Fetching messages: pulls batches of records from the topic partitions assigned to the consumer.
- Liveness: regular calls to poll() signal that the consumer is alive; group heartbeats are sent from a background thread, but exceeding max.poll.interval.ms between polls still marks the consumer dead.
- Rebalance participation: it ensures the consumer participates in group rebalancing.
- Offset management: Kafka tracks the offset of consumed messages, and poll() plays a key role in this.
🛑 Important Notes
- Mandatory loop: you must call poll() continuously in a loop, or Kafka considers the consumer dead and rebalances its partitions away.
- Timeout value: the Duration argument is the maximum time poll() blocks waiting for records; it returns earlier as soon as data is available.
- Thread-safety: KafkaConsumer is not thread-safe; use one consumer instance per thread.
- Auto offset commit: if enable.auto.commit=true, offsets are committed automatically after the poll.
🧠 Summary
| Aspect | Description |
|---|---|
| Purpose | Fetch records from Kafka |
| Return Type | ConsumerRecords<K, V> |
| Needs Loop | Yes |
| Heartbeat to Kafka | Yes |
| Timeout Param | Defines how long to wait for new data |
| Auto Offset Commit | Happens after poll() if enabled |
5) Kafka Consumer Group
A Consumer Group is a collection of one or more consumers that work together to consume messages from one or more Kafka topics.
🔍 Key Points:
- Each partition is assigned to exactly one consumer in the group, so messages are not duplicated within the group.
- Kafka distributes partitions among consumers in the same group for parallel processing.
- If the number of consumers > partitions, some consumers will be idle.
- If the number of consumers < partitions, some consumers will read multiple partitions.
📌 Use Case: run multiple instances of the same service under one group.id so they share the partitions of a topic and process messages in parallel.
🔄 How it Works
- If Topic T1 has 4 partitions: P0, P1, P2, P3
- Consumer Group G1 has 2 consumers: C1, C2

Then: Kafka assigns two partitions to each consumer, e.g. C1 reads P0 and P1 while C2 reads P2 and P3.
🎯 Benefits of Consumer Groups
| Feature | Description |
|---|---|
| Scalability | Consume from partitions in parallel |
| Fault Tolerance | If a consumer crashes, another takes over |
| Offset Tracking | Kafka stores offset per partition per group |
| Per-group delivery | Each message is delivered to only one consumer within the group |
6) Kafka Cluster
What: A Kafka Cluster is a distributed system consisting of multiple Kafka brokers. It is used for real-time stream processing, event sourcing, and asynchronous communication between microservices.
Key Components:
- Broker: a Kafka server that stores and serves data
- ZooKeeper (legacy): coordinates brokers (Kafka ≥ 2.8 can run without it in KRaft mode)
- Producer: publishes messages
- Consumer: subscribes to and reads messages
- Topic: logical channel to group messages
- Partition: splits a topic for parallelism and scalability
Cluster Characteristics:
- High availability: replication of partitions
- Scalability: add brokers to handle more load
- Fault tolerance: if one broker fails, others take over
Example Cluster Setup:
- 3 Kafka brokers (running on different machines)
- Topic "orders" with 3 partitions and replication factor 2
- ZooKeeper, or KRaft mode instead (production-ready since Kafka 3.3)
Best Practices:
- Monitor consumer lag using Kafka UI or Burrow
- Set retention policies to manage disk space
- Use a schema registry with Avro/Protobuf
7) group.id in Kafka
group.id is a mandatory configuration property used to assign a Kafka consumer to a specific consumer group.
🔍 Key Points:
- Kafka uses group.id to manage offsets and partition assignment.
- Consumers with the same group.id form one consumer group and share the topic's partitions.
- Kafka tracks offsets per group, so each group consumes independently.
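With the plain consumer API this is one property (a one-line sketch; in Spring Boot the equivalent is spring.kafka.consumer.group-id):

import org.apache.kafka.clients.consumer.ConsumerConfig;
// ...
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");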
8) Saga Pattern
What: Saga is a microservice design pattern to maintain data consistency in distributed transactions without two-phase commit (2PC). Each service performs a local transaction and publishes an event or makes a compensating transaction if something fails.
Types of Sagas:
1. Choreography (event-based):
   - Services listen to events and act accordingly
   - No centralized coordinator
   - Pros: loose coupling
   - Cons: hard to track, complex logic spread across services
2. Orchestration (command-based):
   - A central orchestrator tells each service what to do
   - Services report back success/failure
   - Pros: centralized logic
   - Cons: single point of coordination
Example (Order Processing):
- Step 1: Order Service → creates order
- Step 2: Payment Service → reserves funds
- Step 3: Inventory Service → reduces stock
- If step 3 fails → compensation: refund payment → cancel order (a minimal sketch follows)
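A minimal choreography sketch of the compensation step: the Order Service listens for payment failures and undoes its local change. The topic names, the orderId payload, and the OrderRepository are hypothetical:

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class OrderCompensationListener {

    private final OrderRepository orderRepository; // hypothetical repository

    public OrderCompensationListener(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @KafkaListener(topics = "payment-failed", groupId = "order-service")
    public void onPaymentFailed(String orderId) {
        // Compensating transaction: undo the local change made earlier
        orderRepository.cancelOrder(orderId);
    }
}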
Tools/Libraries: Axon Framework, Eventuate Tram Saga, Camunda, and Temporal are common choices for implementing sagas on the JVM.
Use Case: Distributed e-commerce or travel booking systems where all-or-nothing is not possible.
9) Apache Kafka - Explanation of Each Component
1. Apache Kafka Architecture
Kafka's core definition was covered in section 1 above; architecturally, it is a cluster of brokers that stores topics split into partitions, with producers writing and consumers reading in a publish-subscribe model.
Kafka ensures:
- High throughput
- Fault tolerance
- Horizontal scalability
2. Producers
- Producers are clients or services that send data to Kafka topics.
- Example: a Spring Boot service that sends an event like UserRegistered to a Kafka topic.
- They serialize data and push it to Kafka asynchronously.
📌 Spring Boot Integration: You use KafkaTemplate<String, Object> to send messages.
3. Consumers
- Consumers are services or apps that read data from Kafka topics.
- They subscribe to a topic and process messages as they arrive.
- They can be grouped into consumer groups for parallel processing and load balancing.
📌 Spring Boot Integration: You use @KafkaListener to consume messages from topics.
4. Topics
- A topic is a logical channel or stream to which messages are published.
- Producers write to a topic; consumers subscribe to and read from it.
- Topics are append-only logs — messages are not deleted immediately after consumption.
Example: a topic named "user-registered" that carries user sign-up events.
5. Partitions
- A topic is split into partitions; each partition is an ordered, append-only log.
- Messages with the same key always go to the same partition, preserving per-key ordering.
- More partitions = better throughput, but requires careful planning.
6. Brokers
- A Kafka broker is a Kafka server.
- It stores data, handles read/write requests, and maintains partitions.
- Brokers coordinate with ZooKeeper (or KRaft in newer versions) for leader election and metadata.
📌 Example: In a Kafka cluster of 3 brokers, partitions are distributed across brokers.
7. Clusters
- A Kafka cluster is a group of brokers working together; partitions and their replicas are distributed across the brokers.
📌 Example: If broker 1 fails, broker 2 can take over as leader for the partitions it replicates.
8. Integration with Spring Boot
🔹 Producer Side:
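In essence, the producer side boils down to one call (using an injected KafkaTemplate, as in section 6; topic and payload are example values):

kafkaTemplate.send("my-topic", "hello");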
🔹 Consumer Side:
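And the consumer side is a single annotated method (same pattern as section 7):

@KafkaListener(topics = "my-topic", groupId = "my-group")
public void listen(String message) {
    System.out.println("Received: " + message);
}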
✅ Summary Table:
| Component | Purpose / Description |
|---|---|
| Producer | Sends messages to Kafka topics. |
| Consumer | Reads messages from Kafka topics, individually or as part of a consumer group. |
| Consumer Group | A group of consumers that cooperatively consume messages from topics in parallel. |
| Topic | Logical channel or category where messages are published and consumed. |
| Partition | Subdivision of a topic to allow scalability and parallel processing. |
| Broker | Kafka server that stores and manages message data and handles client communication. |
| Kafka Cluster | A group of brokers that together manage the distributed messaging system. |
| Spring Boot Integration | Enables applications to act as producer or consumer using Spring Kafka annotations and templates. |
| Spring Boot App | The application that integrates with Kafka via Spring Boot; can send or receive messages. |
Kafka Architecture
The Spring Boot Kafka consumer is responsible for reading messages from Kafka and storing them in MS SQL DB via the service and repository layers:

Kafka Consumer (Spring Boot)
        |
        v
Service Layer / Repository Layer
        |
        v
MS SQL DB