Kafka Topic Cleanup Configuration Guide

Cleanup Policy

The Kafka cleanup policy defines how Kafka manages old messages within a topic, whether they are deleted after a certain time or compacted to retain only the most recent value for each key.

Delete Policy (Default)

The delete policy is the default retention strategy in Kafka. With this policy, Kafka automatically removes records based on either a time duration or a configured size threshold. This prevents topics from consuming unlimited storage space.

Key Configuration Options:

  • retention.ms: Specifies how long (in milliseconds) Kafka should retain messages. For example, setting it to 86400000 will keep messages for 24 hours.
  • retention.bytes: Defines the maximum size in bytes for a topic partition. When this size is reached, older messages are deleted to free up space. For example, 1073741824 bytes equals 1 GiB.

Kafka automatically monitors these settings and deletes log segments that exceed the specified limits.
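As an illustrative sketch, both retention settings can also be applied when a topic is first created rather than altered afterwards. The topic name events is a hypothetical example, and a broker must already be reachable at localhost:9092:

```shell
# Create a topic whose messages are deleted after 24 hours
# or once a partition exceeds 1 GiB, whichever limit is hit first.
kafka-topics.sh --create \
  --topic events \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1 \
  --config retention.ms=86400000 \
  --config retention.bytes=1073741824
```

Setting configs at creation time avoids a window in which the topic briefly runs with the broker defaults.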

Note: The setting log.retention.check.interval.ms, which controls how frequently Kafka checks for expired log segments, is a broker-level property rather than a per-topic one, so it cannot be set as a topic override. In newer Kafka versions, including KRaft mode (the ZooKeeper-less architecture), the default check interval is sufficient for most workloads, so there is usually no need to configure this property manually.

Compact Policy

The compact policy offers a different way to manage data. Instead of deleting data based on age or size, Kafka retains only the latest message for each unique key. This is especially useful when you need to maintain a complete view of the current state for each key, such as user settings or configuration changes.

Key Configuration Options:

  • cleanup.policy=compact: Enables compaction for the topic. Kafka will keep only the most recent value for each key.
  • min.cleanable.dirty.ratio=0.5: Determines when compaction should start. A value of 0.5 means Kafka begins cleaning when 50% of the log contains outdated messages.

In this policy, the log cleaner service handles compaction automatically based on system activity and available resources.

Unlike the delete policy, retention time does not affect compaction. Kafka will retain the latest record per key regardless of how old it is.
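As a sketch of putting both options together, a compacted topic can be created with compaction enabled from the start. The topic name user-profiles is a hypothetical example, and a broker must be running at localhost:9092:

```shell
# Create a compacted topic: only the latest record per key is retained.
# Compaction begins once 50% of the log consists of superseded records.
kafka-topics.sh --create \
  --topic user-profiles \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5
```

Note that records must carry a non-null key for compaction to be meaningful; keyless records cannot be compacted against each other.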

In summary, newer versions of Kafka include enhancements that improve how retention policies work, especially under the KRaft architecture. Kafka handles internal timings such as the log deletion check interval automatically, so broker properties like log.retention.check.interval.ms rarely need to be changed from their defaults.

To quickly get started with Kafka in a local development environment, you can use Docker. Below is a docker-compose.yml file configured to run Kafka in KRaft mode (ZooKeeper-less) using the Bitnami Kafka image:

services: 
  kafka: 
    image: docker.io/bitnami/kafka:3.9.0 
    container_name: kafka 
    ports: 
      - "9092:9092" 
    volumes: 
      - kafka_data:/bitnami 
    environment: 
      KAFKA_CFG_NODE_ID: 0 
      KAFKA_CFG_PROCESS_ROLES: controller,broker 
      KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 0@kafka:9093 
      KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093 
      KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092 
      KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT 
      KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER 
      KAFKA_CFG_INTER_BROKER_LISTENER_NAME: PLAINTEXT 
 
volumes:
  kafka_data:

1. Save the content above into a file named docker-compose.yml.

2. Open a terminal and navigate to the directory containing the file.

3. Run the following command to start the Kafka container:

docker compose up

This will pull the bitnami/kafka:3.9.0 image and start a container named kafka.

4. Once the container is running, you can access the Kafka CLI tools by entering the container:

docker exec -it kafka sh

5. Kafka’s CLI scripts are located in:

/opt/bitnami/kafka/bin

To navigate to this directory:

cd /opt/bitnami/kafka/bin

To create a topic:

kafka-topics.sh --create --topic my-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
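To confirm the topic works end to end, a quick produce/consume round trip can be run from the same bin directory; the message text is arbitrary, and both console tools ship with the Kafka distribution:

```shell
# Produce a single test message to the topic.
echo "hello" | kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092

# Read the topic from the beginning (press Ctrl+C to stop the consumer).
kafka-console-consumer.sh --topic my-topic --bootstrap-server localhost:9092 --from-beginning
```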

Here are some commands to update topic configuration for retention using Kafka’s command line interface:

  • Set the maximum size:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.bytes=1073741824
  • Set the retention period:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config retention.ms=86400000
  • Set the cleanup policy for a topic to compact:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config cleanup.policy=compact
  • Set the minimum cleanable dirty ratio:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --alter --add-config min.cleanable.dirty.ratio=0.5
  • To verify topic configurations:
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics --entity-name my-topic --describe
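If a per-topic override should later be reverted so the broker default applies again, kafka-configs.sh also supports removing a setting. A sketch, assuming the same topic as above:

```shell
# Remove the per-topic retention.ms override; the topic falls back
# to the broker-wide default retention afterwards.
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type topics \
  --entity-name my-topic --alter --delete-config retention.ms
```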

References:

Topic Configs