Current state: Accepted

ISSUE: https://github.com/milvus-io/milvus/issues/5218

PRs: 

Keywords: Kafka

Released: with Milvus 2.1 

Authors:  


Motivation

The log broker is a pub-sub system within Milvus. It is responsible for streaming data persistence, event notification, recovery, and more. Today, Milvus cluster mode uses Pulsar as the log broker, and standalone mode uses RocksMQ (built on RocksDB).

Apache Kafka is a distributed event store and stream-processing platform, and a popular solution for data streaming needs. Many community users expect Milvus to support Kafka because they already run it in their production environments.


Summary

Milvus supports Kafka as a message stream. A configuration option determines whether Pulsar or Kafka is used in cluster mode; we provide the function KafkaEnable() to enable Kafka. If you don't want to use Kafka, comment out its configuration, and the same applies to Pulsar and RocksMQ. If the Pulsar, Kafka, and RocksMQ configurations are all readable, RocksMQ is used in standalone mode and Pulsar in cluster mode. For example (a sketch of this selection rule follows the snippet):

pulsar:
  address: localhost
  port: 6650
  maxMessageSize: 5242880
#kafka:
#  brokers:
#    - localhost:9092
#  port: 9092
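
The following Go sketch illustrates the selection rule described above. It is illustrative only: the type and function names (mqConfig, selectMQ) are assumptions, and KafkaEnable() is assumed to simply report whether the Kafka section of the configuration is readable while the default broker for the deployment mode is commented out.

package mqselect

// mqConfig records which message-queue sections of the configuration file
// are readable (i.e. not commented out).
type mqConfig struct {
	pulsarReadable  bool
	kafkaReadable   bool
	rocksmqReadable bool
}

// KafkaEnable reports whether Kafka should be used: its section must be
// readable and the default broker for the deployment mode (RocksMQ for
// standalone, Pulsar for cluster) must be commented out.
func (c mqConfig) KafkaEnable(standalone bool) bool {
	if !c.kafkaReadable {
		return false
	}
	if standalone {
		return !c.rocksmqReadable
	}
	return !c.pulsarReadable
}

// selectMQ applies the rule from the Summary: when every section is
// readable, standalone mode falls back to RocksMQ and cluster mode to Pulsar.
func selectMQ(c mqConfig, standalone bool) string {
	if c.KafkaEnable(standalone) {
		return "kafka"
	}
	if standalone {
		return "rocksmq"
	}
	return "pulsar"
}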

Design Details

Configuration

The following docker-compose file brings up a local Kafka broker (with ZooKeeper) for development and testing:

version: '3.5'

services:
  zookeeper:
    image: 'bitnami/zookeeper:3.6.3'
    ports:
      - '2181:2181'
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka:
    image: 'bitnami/kafka:3.1.0'
    ports:
      - '9092:9092'
    environment:
      - KAFKA_BROKER_ID=0
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://127.0.0.1:9092
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      # Keep broker and client message size limits in line with the 5 MB
      # (5242880-byte) maxMessageSize used for Pulsar above
      - KAFKA_CFG_MAX_PARTITION_FETCH_BYTES=5242880
      - KAFKA_CFG_MAX_REQUEST_SIZE=5242880
      - KAFKA_CFG_MESSAGE_MAX_BYTES=5242880
      - KAFKA_CFG_REPLICA_FETCH_MAX_BYTES=5242880
      - KAFKA_CFG_FETCH_MESSAGE_MAX_BYTES=5242880
    depends_on:
      - zookeeper

networks:
  default:
    name: milvus_dev
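
With this file in place, the broker can be started for local development with docker-compose up -d and torn down again with docker-compose down.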


Kafka Client SDK

We tried both sarama and confluent-kafka-go during development and found essentially no difference on the producer side, but a big difference when using consumer groups.

With sarama, consuming through a consumer group requires implementing sarama's ConsumerGroupHandler interface, which makes the consumption flow hard to control and makes seeking to a specific offset awkward, as the sketch below shows.
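
A minimal sketch of what sarama requires, assuming the Shopify/sarama import path of that time; the broker address, group id, and topic are placeholders:

package main

import (
	"context"

	"github.com/Shopify/sarama"
)

// handler implements sarama.ConsumerGroupHandler, the interface sarama
// requires before a consumer group delivers any message.
type handler struct{}

func (handler) Setup(sarama.ConsumerGroupSession) error   { return nil }
func (handler) Cleanup(sarama.ConsumerGroupSession) error { return nil }

func (handler) ConsumeClaim(s sarama.ConsumerGroupSession, c sarama.ConsumerGroupClaim) error {
	for msg := range c.Messages() {
		// All consumption happens inside this callback; there is no direct
		// "start from offset N" call here, which is what makes seeking hard.
		s.MarkMessage(msg, "")
	}
	return nil
}

func main() {
	cfg := sarama.NewConfig()
	cfg.Version = sarama.V2_1_0_0 // consumer groups need an explicit broker version
	group, err := sarama.NewConsumerGroup([]string{"localhost:9092"}, "demo-group", cfg)
	if err != nil {
		panic(err)
	}
	defer group.Close()
	// Consume blocks, runs the rebalance protocol, and invokes the handler.
	if err := group.Consume(context.Background(), []string{"demo-topic"}, handler{}); err != nil {
		panic(err)
	}
}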

With confluent-kafka-go, consuming through a consumer group is just a function call, which is much simpler to use, and it lets you set the offset from which consumption starts directly (see the sketch below).
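
For comparison, a sketch with confluent-kafka-go; again the broker address, group id, topic, and starting offset are placeholders:

package main

import (
	"fmt"
	"time"

	"github.com/confluentinc/confluent-kafka-go/kafka"
)

func main() {
	// A consumer in a group is created from a plain configuration map.
	c, err := kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers": "localhost:9092",
		"group.id":          "demo-group",
		"auto.offset.reset": "earliest",
	})
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// The starting offset can be set directly when assigning partitions.
	topic := "demo-topic"
	if err := c.Assign([]kafka.TopicPartition{
		{Topic: &topic, Partition: 0, Offset: kafka.Offset(42)},
	}); err != nil {
		panic(err)
	}

	// Consuming messages is just a function call in a loop.
	for i := 0; i < 10; i++ {
		msg, err := c.ReadMessage(5 * time.Second)
		if err != nil {
			fmt.Println("read error:", err) // timeouts are returned as errors
			break
		}
		fmt.Printf("offset %v: %s\n", msg.TopicPartition.Offset, msg.Value)
	}
}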

Interface Implementation


Deployments


Test Plan