Choosing the Replication Factor and Partition Count

In Kafka, partition and replication factor are key concepts related to how data is distributed and stored within a Kafka cluster. These two are very important to set correctly as they impact the performance and durability in the system. Changing them later will have an adverse impact on the system as depicted below.
  • If the he partitions count increases during a topic lifecycle, you will break your keys ordering guarantees

  • If the replication factor increases during a topic lifecycle, you put more pressure on your cluster, which can lead to an unexpected performance decrease.

Choose the number of partitions in a Kafka topic

The number of partitions impacts parallelism, performance, and scalability. Consider the following factors:
  • More partitions allows more consumers to process data in parallel. If you have more consumers than partitions, some consumers will be idle.

  • More partitions can help producers send data more quickly because they can split the load across multiple partitions.

  • If you are sending messages to partitions based on keys, adding partitions later can be very challenging, so calculate throughput based on your expected future usage.

  • More partitions can help producers send data more quickly because they can split the load across multiple partitions.

  • If you have too many partitions, it might put excessive load on your brokers, especially if the cluster is small.

  • Each partition consumes resources (CPU, memory, disk), so having too many partitions can lead to inefficiency.

  • More partitions will have a partition leader to be elected by Zookeeper. Hence there will be more load on Zookeeper which will increase the time for leader elections.

Choose Replication factor in a kafka topic

The replication factor determines fault tolerance and availability. Consider the following:
  • The replication factor should be at least 2 to tolerate a single broker failure. With a replication factor of 3, you can tolerate the failure of one broker and still have another replica available. Never set it to 1 in production, as it means no fault tolerance. If the partition is lost, you will have data loss.

  • A replication factor of 3 is common in production environments because it provides a good balance between fault tolerance and resource usage. It ensures that even if one broker fails, there is still a backup, and the cluster can tolerate another failure during recovery.

  • A higher replication factor increases disk usage and network traffic as more copies of the data are maintained. Consider the trade-offs between fault tolerance and resource utilization.

  • If the data is critical and loss is unacceptable, opt for a higher replication factor, like 3 or even 5, depending on your cluster size and importance of the data.

  • If there is a performance issue due to a higher replication factor, you should get a better broker instead of lowering the replication factor.

Ideal values

  • If you have a small cluster of fewer than 6 brokers, create three times, i.e., 3X, the number of brokers you have. The reasoning behind it is that if you have more brokers over time, you will have enough partitions to cover that.

  • If you have a big cluster of over 12 brokers, create two times i.e., 2X, the number of brokers you have.

  • You also want to take into account the number of consumers you need to run in a group at the desired peak throughput. If, for example, you need 20 consumers at peak time, you need at least 20 partitions in your topic, regardless of the size of your cluster.

Example Scenario

  • Small Cluster (3 Brokers):
    • Partitions: Start with 5–6 partitions per topic.

    • Replication Factor: Set to 3 for fault tolerance.

  • Large Cluster (10+ Brokers):
    • Partitions: Start with 20–30 partitions per topic, adjusting based on throughput needs.

    • Replication Factor: Keep it at 3 for balanced fault tolerance.

Monitoring and adjusting based on your application’s needs and the cluster’s performance is essential. Kafka allows you to add partitions to a topic later if necessary, but changing the replication factor requires more careful planning.