Current state: Accepted

ISSUE: https://github.com/milvus-io/milvus/issues/15604

PRs: 

Keywords: bulk load, import

Released: with Milvus 2.1 

Authors:  

Summary

Import data through a bulk load shortcut that performs better than insert().


Motivation

Inserting one billion entities with 128-dimensional vectors typically takes several hours. We need a new bulk load interface for the following purposes:

  1. Import data from JSON files. (first stage)
  2. Import data from NumPy files. (first stage)
  3. Copy a collection from one Milvus 2.0 server to another. (second stage)
  4. Import data from Milvus 1.x to Milvus 2.0. (third stage)
  5. Import data from Parquet/Faiss files. (TBD)

Design Details

Some points to consider:

A row-based example:

{
  "table": {
    "rows": [
      {"id": 1, "year": 2021, "vector": [1.0, 1.1, 1.2]},
      {"id": 2, "year": 2022, "vector": [2.0, 2.1, 2.2]},
      {"id": 3, "year": 2023, "vector": [3.0, 3.1, 3.2]}
    ]
  }
}
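
A row-based file of this shape can be checked before it is handed to an import job. The following is a minimal sketch, not part of the proposed interface: `load_rows` and its `vector_field` parameter are hypothetical names, and the consistency rules (every row has the same fields, every vector has the same length) are assumptions made for illustration.

```python
import json

# The row-based sample from this section, embedded as a string.
ROW_BASED = """
{
  "table": {
    "rows": [
      {"id": 1, "year": 2021, "vector": [1.0, 1.1, 1.2]},
      {"id": 2, "year": 2022, "vector": [2.0, 2.1, 2.2]},
      {"id": 3, "year": 2023, "vector": [3.0, 3.1, 3.2]}
    ]
  }
}
"""


def load_rows(text, vector_field="vector"):
    """Parse a row-based document and check that its rows are consistent.

    Hypothetical helper: the checks below are assumptions, not rules
    defined by this proposal.
    """
    rows = json.loads(text)["table"]["rows"]
    if not rows:
        return rows

    # Use the first row as the reference for field names and dimension.
    expected_fields = set(rows[0])
    expected_dim = len(rows[0][vector_field])

    for index, row in enumerate(rows):
        if set(row) != expected_fields:
            raise ValueError(
                f"row {index}: fields {sorted(row)} differ from {sorted(expected_fields)}"
            )
        if len(row[vector_field]) != expected_dim:
            raise ValueError(
                f"row {index}: vector length {len(row[vector_field])}, expected {expected_dim}"
            )
    return rows


rows = load_rows(ROW_BASED)
print(f"loaded {len(rows)} rows")
```

Rejecting a malformed file up front is cheaper than failing partway through a large import, which is the reason for validating all rows before returning any.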

A column-based example:

{
  "table": {
    "columns": {
      "id": [1, 2, 3],
      "year": [2021, 2022, 2023],
      "vector": [
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2]
      ]
    }
  }
}
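
The row-based and column-based layouts carry the same data, so one can be derived from the other. The sketch below illustrates the relationship under two assumptions: "columns" is treated as a JSON object that maps each field name to a list of values, and every row contains the same fields. `rows_to_columns` and `columns_to_rows` are hypothetical helper names, not part of any Milvus SDK.

```python
import json


def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same fields; otherwise the resulting
    columns would have different lengths.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns):
    """Pivot a dict of equally long column lists back into row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = [
    {"id": 1, "year": 2021, "vector": [1.0, 1.1, 1.2]},
    {"id": 2, "year": 2022, "vector": [2.0, 2.1, 2.2]},
    {"id": 3, "year": 2023, "vector": [3.0, 3.1, 3.2]},
]

columns = rows_to_columns(rows)

# Wrap the result in the same envelope used by the examples in this section.
print(json.dumps({"table": {"columns": columns}}, indent=2))
```

Converting back with `columns_to_rows(columns)` reproduces the original rows, which makes a convenient round-trip check when both file layouts have to be supported.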


SDK Interfaces



RPC Interfaces



Internal machinery



Test Plan