Current state: Accepted
ISSUE: https://github.com/milvus-io/milvus/issues/15604
PRs:
Keywords: bulk load, import
Released: with Milvus 2.1
Authors:
Import data through a shortcut to get better performance than insert().
Typically, it takes several hours to insert one billion entities with 128-dimensional vectors. We need a new interface to do bulk load for the following purposes:
Some points to consider:
A row-based example:
{
  "table": {
    "rows": [
      {"id": 1, "year": 2021, "vector": [1.0, 1.1, 1.2]},
      {"id": 2, "year": 2022, "vector": [2.0, 2.1, 2.2]},
      {"id": 3, "year": 2023, "vector": [3.0, 3.1, 3.2]}
    ]
  }
}
A column-based example:
{
  "table": {
    "columns": {
      "id": [1, 2, 3],
      "year": [2021, 2022, 2023],
      "vector": [
        [1.0, 1.1, 1.2],
        [2.0, 2.1, 2.2],
        [3.0, 3.1, 3.2]
      ]
    }
  }
}
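The two layouts carry the same data, which the following minimal Python sketch illustrates by converting one into the other. The helper names rows_to_columns and columns_to_rows are hypothetical and not part of any Milvus API. The sketch assumes that every row has the same fields and that the columns are held in a JSON object keyed by field name.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of row dicts into a dict of per-field value lists."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn a dict of per-field value lists back into a list of row dicts."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# The row-based example from this section.
row_based = {
    "table": {
        "rows": [
            {"id": 1, "year": 2021, "vector": [1.0, 1.1, 1.2]},
            {"id": 2, "year": 2022, "vector": [2.0, 2.1, 2.2]},
            {"id": 3, "year": 2023, "vector": [3.0, 3.1, 3.2]},
        ]
    }
}

# The same data in the column-based layout.
column_based = {"table": {"columns": rows_to_columns(row_based["table"]["rows"])}}
```

A row-based file keeps each entity together, while a column-based file keeps each field together; converting back with columns_to_rows reproduces the original rows.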
Based on these points, we choose a JSON object as the parameter of the Python import() API. The API declaration looks like this:
def import(options)
The "options" parameter is a JSON object with the following format:
{
  "data_source": {                  // required
    "type": "Minio",                // required
    "address": "localhost:9000",    // optional, the Milvus server uses its own MinIO setting if this value is omitted
    "accesskey_id": "minioadmin",   // optional, the Milvus server uses its own MinIO setting if this value is omitted
    "accesskey_secret": "minioadmin", // optional, the Milvus server uses its own MinIO setting if this value is omitted
    "use_ssl": false,               // optional, the Milvus server uses its own MinIO setting if this value is omitted
    "bucket_name": "aaa"            // optional, the Milvus server uses its own MinIO setting if this value is omitted
  },

  "internal_data": {                // optional, external_data or internal_data (external files include json, npy, etc.; internal files are exported by Milvus)
    "path": "xxx/xxx/xx",           // relative path in the source storage where the exported data is stored
    "collections_mapping": {        // optional, gives a new name to a collection during import
      "aaa": "bbb",
      "ccc": "ddd"
    }
  },

  "external_data": {                // optional, external_data or internal_data (external files include json, npy, etc.; internal files are exported by Milvus)
    "target_collection": "xxx",     // target collection name
    "files": [                      // required
      {
        "file": "xxxx/xx.json",     // required, relative path under the storage source defined by data_source; json and npy are currently supported
        "type": "row_based",        // required, row_based or column_based
        "fields_mapping": {         // optional, specifies the target fields to import; Milvus imports all fields if this mapping is empty
          "table.rows.id": "uid",
          "table.rows.year": "year",
          "table.rows.vector": "vector"
        }
      },
      {
        "file": "xxxx/xx.json",     // required, relative path under the storage source defined by data_source; json and npy are currently supported
        "type": "column_based",     // required, row_based or column_based
        "fields_mapping": {         // optional, specifies the target fields to import; Milvus imports all fields if this mapping is empty
          "table.columns.id": "uid",
          "table.columns.year": "year",
          "table.columns.vector": "vector"
        }
      },
      {
        "file": "xxxx/xx.npy",      // required, relative path under the storage source defined by data_source; json and npy are currently supported
        "type": "column_based",     // required, row_based or column_based
        "fields_mapping": {         // optional, specifies the target fields to import; Milvus imports all fields if this mapping is empty
          "vector": "vector"
        }
      }
    ],
    "default_fields": {             // optional, default values used to fill some fields
      "age": 0,
      "weight": 0.0
    }
  }
}