...

        2. the proxy node passes the file paths to the root coordinator, which then passes them to the data coordinator

        3. the data coordinator picks one or more data nodes (according to the sharding number) to parse the files; each file can be parsed into one or more segments

        4. once a task is finished, the data node reports to the data coordinator, and the data coordinator reports to the root coordinator; the generated segments are then sent to the index node to build the index

        5. the root coordinator records a task list in Etcd; after the index has been built successfully for the generated segments, the root coordinator marks the task as "completed"
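The task lifecycle implied by the steps above can be sketched as a small state machine. This is only an illustration: the state names come from the "state" values listed for the SDK, while the transition table and the helper names `can_transition` and `advance` are assumptions made for this sketch, not code from Milvus.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "pending"
    PARSING = "parsing"
    PERSISTED = "persisted"
    INDEXING = "indexing"
    COMPLETED = "completed"
    FAILED = "failed"


# Forward transitions implied by the workflow steps. This table is an
# assumption for illustration, not taken from the Milvus source.
_NEXT = {
    TaskState.PENDING: TaskState.PARSING,
    TaskState.PARSING: TaskState.PERSISTED,
    TaskState.PERSISTED: TaskState.INDEXING,
    TaskState.INDEXING: TaskState.COMPLETED,
}

TERMINAL = {TaskState.COMPLETED, TaskState.FAILED}


def can_transition(current: TaskState, target: TaskState) -> bool:
    """Return True if a task may move from `current` to `target`."""
    if current in TERMINAL:
        return False  # completed/failed tasks never change state
    if target is TaskState.FAILED:
        return True  # any unfinished task may fail
    return _NEXT.get(current) is target


def advance(current: TaskState, target: TaskState) -> TaskState:
    """Apply a transition, raising ValueError if it is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Under this sketch a task cannot jump from "pending" straight to "completed", and a task that has reached "completed" or "failed" stays there.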

1.  SDK Interfaces

The python API declaration:

...

  • task_id: id of an import task returned by import()
  • return {state: string, row_count: integer, progress: float, failed_reason: string, id_list: list, file: string}

            Note: the "state" could be "pending", "parsing", "persisted", "indexing", "completed", "failed"
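A caller would typically poll the task state until it reaches "completed" or "failed". The sketch below shows one way to do that. The function `wait_for_import` and its parameters are hypothetical. It takes the state-query function as an argument (`get_state`) rather than assuming a concrete SDK call, and only relies on the returned dict having a "state" key as described above.

```python
import time
from typing import Any, Callable, Dict


def wait_for_import(
    get_state: Callable[[int], Dict[str, Any]],
    task_id: int,
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll `get_state(task_id)` until the task is completed or failed.

    `get_state` is a stand-in for the SDK's state query (an assumption);
    it must return a dict with at least a "state" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        info = get_state(task_id)
        if info["state"] in ("completed", "failed"):
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"import task {task_id} still {info['state']!r}")
        sleep(interval_s)
```

The caller still has to inspect the returned "state": a "failed" result ends the wait just like "completed", with the reason carried in "failed_reason".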


Pre-defined format for import files

...

Code Block
service MilvusService {
  rpc Import(ImportRequest) returns (ImportResponse) {}
  rpc GetImportState(GetImportStateRequest) returns (GetImportStateResponse) {}
}

message ImportRequest {
  string collection_name = 1;  // target collection
  string partition_name = 2;  // target partition
  bool row_based = 3;  // the file is row-based or column-based
  repeated string files = 4;  // file paths to be imported
  repeated common.KeyValuePair options = 5;  // import options
}

message ImportResponse {
  common.Status status = 1;
  repeated int64 tasks = 2;  // id array of import tasks
}

message GetImportStateRequest {
  int64 task = 1;  // id of an import task
}

enum ImportState {
  Pending = 0;
  Failed = 1;
  Parsing = 2;
  Persisted = 3;
  Indexing = 4;
  Completed = 5;
}

message GetImportStateResponse {
  common.Status status = 1;
  ImportState state = 2;  // current state of the import task
  int64 row_count = 3;  // if the task is finished, the number of rows imported; if it is not finished, the number of rows parsed so far; 0 if the task failed
  repeated int64 id_list = 4; // auto-generated ids if the primary key is autoid
  repeated common.KeyValuePair infos = 5;  // more information about the task: progress percent, file path, failed reason, etc.
}
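To show how a GetImportStateResponse could map onto the dict returned by the SDK, here is a minimal Python sketch. It assumes the enum values are Pending, Failed, Parsing, Persisted, Indexing and Completed, numbered 0 to 5 in that order, and that `infos` carries string key/value pairs. The key names "progress_percent", "failed_reason" and "file" are hypothetical, since the document does not define them, and `to_sdk_state` is an illustrative helper rather than part of any SDK.

```python
from enum import IntEnum
from typing import Any, Dict, Iterable, List, Tuple


class ImportState(IntEnum):
    # Mirrors the ImportState enum; the numbering is an assumption.
    Pending = 0
    Failed = 1
    Parsing = 2
    Persisted = 3
    Indexing = 4
    Completed = 5


def to_sdk_state(
    state: int,
    row_count: int,
    id_list: List[int],
    infos: Iterable[Tuple[str, str]],
) -> Dict[str, Any]:
    """Convert response fields into the dict shape returned by the SDK."""
    extra = dict(infos)  # KeyValuePair list -> plain dict
    return {
        "state": ImportState(state).name.lower(),
        "row_count": row_count,
        # The info keys below are hypothetical names chosen for this sketch.
        "progress": float(extra.get("progress_percent", "0")),
        "failed_reason": extra.get("failed_reason", ""),
        "id_list": list(id_list),
        "file": extra.get("file", ""),
    }
```

Fields that every task has (state, row count, ids) travel as typed message fields, while optional details go through `infos`, so new details can be added without changing the message definition.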


3. Datacoord RPC interfaces

...

Code Block
service DataCoord {
  rpc Import(milvus.ImportRequest) returns (milvus.ImportResponse) {}
  rpc GetImportState(milvus.GetImportStateRequest) returns (milvus.GetImportStateResponse) {}
  rpc CompleteImport(ImportResult) returns (common.Status) {}
}

message ImportResult {
  common.Status status = 1;
  repeated int64 segments = 2;    // id array of new sealed segments
  int64 row_count = 3; // how many rows are imported by this task
  repeated common.KeyValuePair infos = 4;  // more information about the task: file path, failed reason, etc.
}
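As a rough illustration of what a CompleteImport handler might do with an ImportResult, the sketch below updates an in-memory task record. Everything here is assumed for the example: the `ImportTask` record, the `complete_import` function, passing the task id separately (the message above has no task id field), and moving a successful task to "persisted" before indexing begins.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImportTask:
    # Hypothetical bookkeeping record kept by a coordinator.
    task_id: int
    state: str = "parsing"
    segments: List[int] = field(default_factory=list)
    row_count: int = 0
    failed_reason: str = ""


def complete_import(
    tasks: Dict[int, ImportTask],
    task_id: int,
    ok: bool,
    segments: List[int],
    row_count: int,
    reason: str = "",
) -> ImportTask:
    """Record the outcome reported by a data node for one import task."""
    task = tasks[task_id]
    if ok:
        # Segments are sealed and stored; index building comes next.
        task.state = "persisted"
        task.segments = list(segments)
        task.row_count = row_count
    else:
        task.state = "failed"
        task.failed_reason = reason
    return task
```

In a real deployment this record would live in Etcd rather than in a Python dict, as the workflow describes.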


4. Datanode interfaces

The declaration of import API in datanode RPC:

...