Motivation

There are three ways to deploy Milvus for production use: standalone, clustered, and cloud.
In production, high availability (HA) is one of users' top concerns. The cloud version is fully managed, so HA is taken care of for you. Local standalone and clustered deployments, however, are a different story.
Here we introduce our current HA plans for standalone and clustered Milvus deployments. We then focus on the HA problem for clustered deployments and introduce a master-slave architecture to achieve high availability.

For Standalone HA: Active/Standby

In standalone mode, we will use a fairly straightforward HA solution: active/standby mode.
Alongside the normal Milvus standalone instance (the "active" instance), we start a second, identical instance (the "standby" instance).
Here's how it works:
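As a minimal, hypothetical sketch of a generic active/standby failover loop (not the Milvus implementation): the standby records when it last received a heartbeat from the active instance and promotes itself once that heartbeat is older than a timeout. The class `StandbyMonitor`, its `timeout` parameter, and the explicit `now` clock argument are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StandbyMonitor:
    """Hypothetical standby-side failover logic (not a Milvus API)."""
    timeout: float                          # seconds of silence tolerated
    last_heartbeat: Optional[float] = None  # time of last heartbeat seen
    role: str = "standby"

    def on_heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat from the active instance arrives.
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        # Promote to active only if we have seen the active instance at
        # least once and it has since been silent for longer than `timeout`.
        if self.role == "standby" and self.last_heartbeat is not None:
            if now - self.last_heartbeat > self.timeout:
                self.role = "active"
        return self.role


monitor = StandbyMonitor(timeout=3.0)
monitor.on_heartbeat(now=0.0)
print(monitor.check(now=2.0))  # standby: last heartbeat is 2s old
print(monitor.check(now=5.0))  # active: last heartbeat is 5s old (> 3s)
```

A timeout alone cannot distinguish a crashed active instance from a network partition, so a production setup would also need a way to fence the old active instance and keep both from serving writes at once (split-brain).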


Clustered Milvus (Single Instance) HA: Coordinator Active/Standby

Milvus can be deployed as a single Milvus cluster. To achieve high availability in this case, we introduce active/standby mode for all coordinators (RootCoord, DataCoord, QueryCoord, and IndexCoord). See MEP 30 -- Support Coordinators Primary-backup Mechanism for how it works.
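MEP 30 defines the actual mechanism. Purely to illustrate the general idea behind coordinator active/standby, the sketch below shows lease-based election: every instance of one coordinator role competes for the same key in a shared store, the lease holder is active, and a standby takes over only after the active instance stops renewing and its lease expires. `LeaseStore`, `elect`, the `active/<role>` key format, and the in-memory store (standing in for a real shared store such as etcd) are hypothetical and not taken from MEP 30.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class LeaseStore:
    """In-memory stand-in for a shared key-value store with leases."""
    # key -> (holder id, lease expiry time)
    leases: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def try_acquire(self, key: str, holder: str, now: float, ttl: float) -> bool:
        # Grant the lease if it is free or expired; renew it if we hold it.
        current = self.leases.get(key)
        if current is None or current[1] <= now or current[0] == holder:
            self.leases[key] = (holder, now + ttl)
            return True
        return False

    def holder(self, key: str, now: float) -> Optional[str]:
        # Current unexpired holder of `key`, or None.
        current = self.leases.get(key)
        if current is None or current[1] <= now:
            return None
        return current[0]


def elect(store: LeaseStore, role: str, candidates: Iterable[str],
          now: float, ttl: float) -> Optional[str]:
    # All live candidates for `role` (e.g. "RootCoord") try the same key;
    # at most one holds it and is active, the rest remain standby.
    key = f"active/{role}"
    for candidate in candidates:
        store.try_acquire(key, candidate, now, ttl)
    return store.holder(key, now)


store = LeaseStore()
print(elect(store, "RootCoord", ["rc-1", "rc-2"], now=0.0, ttl=10.0))   # rc-1
print(elect(store, "RootCoord", ["rc-1", "rc-2"], now=5.0, ttl=10.0))   # rc-1 (renewed until 15)
# rc-1 crashes and stops renewing; rc-2 must wait for the lease to lapse.
print(elect(store, "RootCoord", ["rc-2"], now=12.0, ttl=10.0))          # rc-1
print(elect(store, "RootCoord", ["rc-2"], now=16.0, ttl=10.0))          # rc-2
```

A lease is used rather than a one-off election so that leadership is released automatically when the active coordinator stops renewing it, which lets a standby take over without manual intervention.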


Clustered Milvus (Multiple Instances) HA: Leading/Alternative

The leading/alternative mode is similar to the active/standby mode, with the following differences:


Fully implementing the leading/alternative mode for clustered Milvus requires data export: we need to support exporting all historical data, live data, and DDL data in real time. Once this is in place, the alternative instance will subscribe to the leading instance for all data.
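The sketch below illustrates that subscription model under simplifying assumptions: the leading instance exposes a single ordered log of DDL and data events with sequence numbers, and the alternative instance replays everything it has not yet applied (historical data on the first sync, live data on later ones). `Leading`, `Alternative`, the event tuple layout, and the "ddl"/"insert" event kinds are hypothetical; no Milvus export API is implied.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical event: (sequence number, kind, payload).
Event = Tuple[int, str, dict]


@dataclass
class Leading:
    log: List[Event] = field(default_factory=list)

    def append(self, kind: str, payload: dict) -> None:
        # Assign the next sequence number to every DDL or data event.
        self.log.append((len(self.log) + 1, kind, payload))

    def read_from(self, after_seq: int) -> List[Event]:
        # All events newer than `after_seq`, in order.
        return [e for e in self.log if e[0] > after_seq]


@dataclass
class Alternative:
    collections: Dict[str, List[dict]] = field(default_factory=dict)
    applied_seq: int = 0

    def apply(self, event: Event) -> None:
        seq, kind, payload = event
        if seq <= self.applied_seq:
            return  # already applied, so replays are idempotent
        if kind == "ddl":
            self.collections.setdefault(payload["collection"], [])
        elif kind == "insert":
            # Relies on the DDL event having been applied earlier.
            self.collections[payload["collection"]].extend(payload["rows"])
        self.applied_seq = seq

    def sync(self, leading: Leading) -> None:
        # Catch up from the last applied position.
        for event in leading.read_from(self.applied_seq):
            self.apply(event)


leading = Leading()
leading.append("ddl", {"collection": "books"})
leading.append("insert", {"collection": "books", "rows": [{"id": 1}]})
alt = Alternative()
alt.sync(leading)  # replays historical DDL and data
leading.append("insert", {"collection": "books", "rows": [{"id": 2}]})
alt.sync(leading)  # picks up only the new live event
print(alt.applied_seq, alt.collections)  # 3 {'books': [{'id': 1}, {'id': 2}]}
```

DDL and data events share one sequence so that, for example, a collection is always created on the alternative instance before rows are inserted into it.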
More details to come.