...

        5. The root coordinator records a task list in Etcd. After the generated segments have successfully built their index, the root coordinator marks the task as "completed".


1.  SDK Interfaces

...

Code Block
service MilvusService {
  rpc Import(ImportRequest) returns (ImportResponse) {}
  rpc GetImportState(GetImportStateRequest) returns (GetImportStateResponse) {}
}

message ImportRequest {
  string collection_name = 1;                // target collection
  string partition_name = 2;                 // target partition
  bool row_based = 3;                        // the file is row-based or column-based
  repeated string files = 4;                 // file paths to be imported
  repeated common.KeyValuePair options = 5;  // import options, bucket, etc.
}

message ImportResponse {
  common.Status status = 1;
  repeated int64 tasks = 2;  // id array of import tasks
}

message GetImportStateRequest {
  int64 task = 1;  // id of an import task
}

enum ImportState {
  Pending = 0;
  Failed = 1;
  Parsing = 2;
  Persisted = 3;
  Indexing = 4;
  Completed = 5;
}

message GetImportStateResponse {
  common.Status status = 1;
  ImportState state = 2;                   // state of the import task (finished or not)
  int64 row_count = 3;                     // if the task is finished, this value is how many rows are imported. if the task is not finished, this value is how many rows are parsed. return 0 if failed.
  repeated int64 id_list = 4;              // auto generated ids if the primary key is autoid
  repeated common.KeyValuePair infos = 5;  // more information about the task, progress percent, file path, failed reason, etc.
}
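
As a rough illustration of how a caller might use GetImportState, the sketch below polls a task until it reaches Failed or Completed. It is not an SDK implementation: `wait_for_import`, `make_fake_client`, and the `get_state` callable are hypothetical names standing in for a real client call, and a real client would sleep between polls.

```python
from enum import IntEnum
from typing import Callable, Iterable


class ImportState(IntEnum):
    # Values mirror the ImportState enum declared above.
    PENDING = 0
    FAILED = 1
    PARSING = 2
    PERSISTED = 3
    INDEXING = 4
    COMPLETED = 5


# Only these two states end the polling loop.
TERMINAL_STATES = {ImportState.FAILED, ImportState.COMPLETED}


def wait_for_import(get_state: Callable[[int], ImportState],
                    task_id: int,
                    max_polls: int = 100) -> ImportState:
    """Poll until the task is Failed or Completed, or max_polls is reached."""
    state = ImportState.PENDING
    for _ in range(max_polls):
        state = get_state(task_id)
        if state in TERMINAL_STATES:
            break
    return state


def make_fake_client(sequence: Iterable[ImportState]) -> Callable[[int], ImportState]:
    """Hypothetical stand-in for GetImportState that replays a fixed sequence."""
    it = iter(sequence)
    return lambda task_id: next(it)
```

Note that Failed has the value 1, so the numeric order of the enum does not track progress; the loop checks membership in the terminal set rather than comparing values.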



3. Rootcoord RPC interfaces

The declaration of the import API in rootcoord RPC:

Code Block
service RootCoord {
  rpc Import(milvus.ImportRequest) returns (milvus.ImportResponse) {}
  rpc GetImportState(milvus.GetImportStateRequest) returns (milvus.GetImportStateResponse) {}
  rpc ReportImport(ImportResult) returns (common.Status) {}
}

message ImportResult {
  common.Status status = 1;
  repeated int64 segments = 2;             // id array of new sealed segments
  int64 row_count = 3;                     // how many rows are imported by this task
  repeated common.KeyValuePair infos = 4;  // more information about the task, file path, failed reason, etc.
}
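
To make the task bookkeeping more concrete, here is a minimal sketch of how a task record could move through its states when an import result is reported and when index building finishes. It is only an illustration under stated assumptions: the record lives in memory rather than in Etcd, the state names are taken from the ImportState enum, and `ImportTaskRecord`, `report_import`, and `on_index_built` are hypothetical names. Mapping a successful report to "Persisted" is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class ImportTaskRecord:
    """Hypothetical in-memory stand-in for the task record kept in Etcd."""
    task_id: int
    state: str = "Pending"
    row_count: int = 0
    unindexed: Set[int] = field(default_factory=set)  # segments still waiting for an index


def report_import(record: ImportTaskRecord, ok: bool,
                  segments: Iterable[int], row_count: int) -> None:
    """Apply a reported import result to the task record."""
    if not ok:
        record.state = "Failed"
        return
    record.row_count = row_count
    record.unindexed = set(segments)
    record.state = "Persisted"


def on_index_built(record: ImportTaskRecord, segment_id: int) -> None:
    """Mark one segment as indexed; complete the task once none remain."""
    if record.state not in ("Persisted", "Indexing"):
        return  # failed or already completed tasks are left untouched
    record.unindexed.discard(segment_id)
    record.state = "Indexing" if record.unindexed else "Completed"
```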



4. Datacoord RPC interfaces

The declaration of the import API in datacoord RPC:

Code Block
service DataCoord {
  rpc Import(ImportTask) returns (common.Status) {}
}

message ImportTask {
  common.Status status = 1;
  string collection_name = 2;                // target collection
  string partition_name = 3;                 // target partition
  bool row_based = 4;                        // the file is row-based or column-based
  int64 task_id = 5;                         // id of the task
  repeated string files = 6;                 // file paths to be imported
  repeated common.KeyValuePair infos = 7;    // more information about the task, bucket, etc.
}

...


5. Datanode interfaces

The declaration of the import API in datanode RPC:

Code Block
service DataNode {
  rpc Import(ImportTask) returns (common.Status) {}
}

...


6. Bulk Load task Assignment

As background, inserted data is hashed into shards, and bulk load shall follow the same convention. There are two policies we can choose from to satisfy this requirement:

...

Considering efficiency and flexibility, we shall implement option 1.
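
Independently of which option is chosen, the convention of hashing rows into shards can be sketched as follows. This is a toy illustration only: the modulo on an integer primary key is an assumed placeholder, not the hash function Milvus uses, and `shard_of` and `group_rows_by_shard` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_of(primary_key: int, num_shards: int) -> int:
    """Map a primary key to a shard index (placeholder hash: plain modulo)."""
    return primary_key % num_shards


def group_rows_by_shard(primary_keys: Iterable[int],
                        num_shards: int) -> Dict[int, List[int]]:
    """Group the rows of an imported file by the shard they hash to."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for pk in primary_keys:
        groups[shard_of(pk, num_shards)].append(pk)
    return dict(groups)
```

Each group would then end up in segments that belong to its shard, the same way rows from a regular insert do.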

...


7. Result segments availability

By definition, the result segments shall become available all together, which means there shall be no intermediate state visible during loading.

To achieve this property, the segments shall be marked with a "Loading" state and remain invisible until the whole loading procedure completes.
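
The all-or-nothing visibility described here can be sketched with a small in-memory catalog. This is a simplified model, not the actual metadata layout: `SegmentCatalog` and its methods are hypothetical names, and a real system would persist this state and make the publish step transactional.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SegmentCatalog:
    """Hypothetical catalog that hides bulk-load segments until the task commits."""
    visible: Set[int] = field(default_factory=set)
    loading: Dict[int, Set[int]] = field(default_factory=dict)  # task id -> segment ids

    def add_loading(self, task_id: int, segment_id: int) -> None:
        # Segments produced by a running task stay in the "Loading" state.
        self.loading.setdefault(task_id, set()).add(segment_id)

    def commit(self, task_id: int) -> None:
        # Publish every segment of the task in a single step.
        self.visible |= self.loading.pop(task_id, set())

    def query_targets(self) -> List[int]:
        # Readers only ever see committed segments.
        return sorted(self.visible)
```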

...

8. Bulk Load with Delete

Constraint: The segments generated by Bulk Load shall not be affected by delete operations before the whole procedure is finished.

An extra attribute may be needed to record the load-finish timestamp (ts); all delta logs before this ts shall be ignored.
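
A minimal sketch of that filtering rule is shown below. `DeleteEntry`, `effective_deletes`, and `load_finish_ts` are hypothetical names, and treating an entry whose ts equals the load-finish ts as still applicable is an assumption about the boundary.

```python
from typing import List, NamedTuple


class DeleteEntry(NamedTuple):
    """One delete record in a delta log (hypothetical shape)."""
    primary_key: int
    ts: int


def effective_deletes(delta_log: List[DeleteEntry],
                      load_finish_ts: int) -> List[DeleteEntry]:
    """Keep only deletes that apply to a bulk-loaded segment.

    Entries with ts strictly before load_finish_ts are ignored.
    """
    return [entry for entry in delta_log if entry.ts >= load_finish_ts]
```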

...

9. Bulk Load and Query Cluster

After the load is done, the result segments need to be loaded if the target collection/partition is loaded.

...

  1. Allow adding segments without removing one
  2. Bring target segments online atomically.

...

10. Bulk Load and Index Building

The current behavior of the query cluster is that if an index is built for the collection, the segments will not be loaded (as sealed segments) before the index is built.

...

Constraint: The bulk load procedure shall include the period of index building of the result segments.

...


11. Bulk Load as a tool

The bulk load logic can be extracted into a tool that runs outside of the Milvus process. It shall be implemented in the next release.

...