I am running a combined Kafka controller and broker in a Docker container on a physical server, and I am trying to set the retention period for messages. My use case requires messages older than 100 ms to be deleted regardless of whether a consumer has read them, as my application must not store stale data and needs to be as "real time" as possible. My questions are:
What are the relevant settings that I need to configure to achieve what I want?
What are the performance implications if any?
I read in the Kafka documentation that there are two settings relevant to my use case: i) log.retention.ms and ii) log.retention.check.interval.ms
From the documentation:
i) log.retention.ms - "The number of milliseconds to keep a log file before deleting it (in milliseconds)"
ii) log.retention.check.interval.ms - "The frequency in milliseconds that the log cleaner checks whether any log is eligible for deletion"
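For reference, my understanding is that the broker-wide log.retention.ms default can also be overridden per topic via the topic-level retention.ms config. A sketch of how I could do that with the kafka-configs.sh tool from the Kafka distribution (the topic name my-topic is just a placeholder):

```shell
# Override retention for a single topic; the broker-wide
# log.retention.ms stays the default for all other topics.
./bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --entity-type topics --entity-name my-topic \
  --add-config retention.ms=100
```

I mention this in case a topic-level override is the recommended approach rather than changing the broker default.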
I have the following settings so far in my compose.yaml:
services:
  broker:
    image: apache/kafka:latest
    container_name: broker
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_NUM_PARTITIONS: 5
      KAFKA_DEFAULT_REPLICATION_FACTOR: 1
      KAFKA_LOG_RETENTION_MS: 100
      KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS: 101
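To check whether these environment variables actually end up in the running broker configuration, I have been describing the broker config from inside the container, roughly like this (I am assuming the Kafka CLI tools live under /opt/kafka/bin in the apache/kafka image):

```shell
# Dump the effective broker config for node 1 and look for the
# retention setting to confirm the env var was applied.
docker exec broker /opt/kafka/bin/kafka-configs.sh \
  --bootstrap-server localhost:9092 \
  --describe --entity-type brokers --entity-name 1 --all \
  | grep log.retention.ms
```

This at least confirms the value is set, though not whether it behaves as I expect at 100 ms.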
Do I need to set anything else?
How are segments deleted if the retention period is set so low? Do I need to set log.roll.ms? Does Kafka roll segments every 100 ms? What are the performance implications of all of the above? Thanks!
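In case it helps clarify what I mean by "set log.roll.ms": my assumption is that it would go into the same environment block as the other settings, something like the following (unverified, and I don't know whether it is actually needed, hence the question):

```yaml
# Hypothetical addition to the environment section above:
# force segments to roll frequently so retention can delete them.
KAFKA_LOG_ROLL_MS: 100
```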