
Code Block
service DataNode {
  // Import ingests the bulk load files assigned to this DataNode and returns
  // the operation status.
  rpc Import(milvus.ImportRequest) returns (common.Status) {}
}
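
For context, the sketch below shows how a coordinator could invoke this RPC through a generated Go client stub. The stub constructor, the proto import paths, and the ImportRequest fields used here are assumptions for illustration only; the finalized proto may expose different names.

Code Block
package bulkload

import (
	"context"
	"fmt"

	"google.golang.org/grpc"

	// Placeholder imports for the generated proto packages; the paths and
	// package names are assumptions made for this sketch.
	"example.com/milvus/commonpb"
	"example.com/milvus/datapb"
	"example.com/milvus/milvuspb"
)

// importFiles dials a DataNode and issues a single Import call.
func importFiles(ctx context.Context, addr string, files []string) error {
	conn, err := grpc.Dial(addr, grpc.WithInsecure())
	if err != nil {
		return err
	}
	defer conn.Close()

	client := datapb.NewDataNodeClient(conn)
	status, err := client.Import(ctx, &milvuspb.ImportRequest{
		CollectionName: "demo_collection", // hypothetical field
		Files:          files,             // hypothetical field
	})
	if err != nil {
		return err
	}
	if status.GetErrorCode() != commonpb.ErrorCode_Success {
		return fmt.Errorf("import rejected: %s", status.GetReason())
	}
	return nil
}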


Bulk Load Task Assignment

As background, inserted data is hashed into shards, and bulk load shall follow the same convention. There are two policies we can choose from to satisfy this requirement:

  1. Assign the task to a single DataNode. This DataNode shall follow the same hashing rule and put the records into the corresponding segments.
  2. Assign the task to the DataNode(s) that watch the DmlChannels of the target Collection. Each DataNode handles its own part of the data.

Considering efficiency and flexibility, we shall implement option 1; the sketch below illustrates the hashing step.
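
To make option 1 concrete, the following is a minimal sketch of the shard assignment a single DataNode could perform: hash each record's primary key, take it modulo the number of shards (DmlChannels), and buffer the record for the matching segment. The type and function names are illustrative, not the actual Milvus implementation, and the real hash function may differ.

Code Block
package bulkload

import (
	"encoding/binary"
	"hash/fnv"
)

// Row is a simplified bulk-loaded record; only the primary key matters here.
type Row struct {
	PK   int64
	Data []byte
}

// hashToShard maps a primary key to a shard index, mirroring the convention
// used for normal inserts (illustrative; the real hash may differ).
func hashToShard(pk int64, shardNum int) int {
	h := fnv.New32a()
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	h.Write(buf[:])
	return int(h.Sum32() % uint32(shardNum))
}

// assignRows groups bulk-loaded rows into per-shard buffers so that each
// buffer can later be written into segments of the matching DmlChannel.
func assignRows(rows []Row, shardNum int) map[int][]Row {
	buffers := make(map[int][]Row, shardNum)
	for _, r := range rows {
		shard := hashToShard(r.PK, shardNum)
		buffers[shard] = append(buffers[shard], r)
	}
	return buffers
}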


Result Segment Availability

By definition,

Bulk Load with Delete

Constraint: the segments generated by Bulk Load shall not be affected by delete operations issued before the whole load procedure finishes.

An extra attribute may be needed to mark the load-finish ts; all delta logs with timestamps before this ts shall be ignored for these segments.
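
A minimal sketch of that rule, assuming a hypothetical LoadFinishTs attribute on the segment and timestamped delta logs, is shown below; only delta logs at or after the load-finish ts are applied.

Code Block
package bulkload

// DeltaLog is a simplified batch of delete records stamped with the time the
// deletes were issued.
type DeltaLog struct {
	Timestamp uint64
	DeletePKs []int64
}

// Segment carries the hypothetical LoadFinishTs attribute marking when the
// bulk load that produced it finished.
type Segment struct {
	ID           int64
	LoadFinishTs uint64
	DeltaLogs    []DeltaLog
}

// applicableDeltaLogs keeps only the delta logs that should affect a
// bulk-loaded segment: anything issued before LoadFinishTs is ignored.
func applicableDeltaLogs(seg Segment) []DeltaLog {
	var kept []DeltaLog
	for _, dl := range seg.DeltaLogs {
		if dl.Timestamp >= seg.LoadFinishTs {
			kept = append(kept, dl)
		}
	}
	return kept
}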


Bulk Load and Query Cluster

After the load is done, the result segments need to be loaded into the Query Cluster if the target collection/partition is already loaded.

DataCoord has two ways to notify the Query Cluster that there are new segments to load:

  1. Flush new segment
  2. Hand-off

Since we want to bring the segments online all together, hand-off is the better candidate, with two adjustments (see the sketch after this list):

  1. Allow adding segments without removing existing ones.
  2. Bring the target segments online atomically.
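
The sketch below illustrates the adjusted hand-off under those two assumptions (message and type names are illustrative, not the existing hand-off API): the request can add several segments while removing none, and the query side flips all of them to visible inside one critical section, so a query sees either none or all of the bulk-loaded segments.

Code Block
package bulkload

import "sync"

// HandoffRequest is an illustrative hand-off message extended for bulk load:
// it may add several segments while removing none.
type HandoffRequest struct {
	AddedSegmentIDs   []int64
	RemovedSegmentIDs []int64 // stays empty for bulk load hand-off
}

// OnlineSet tracks which segment IDs are visible to queries.
type OnlineSet struct {
	mu      sync.Mutex
	visible map[int64]bool
}

// Apply performs the whole hand-off in one critical section, so the added
// segments come online atomically.
func (s *OnlineSet) Apply(req HandoffRequest) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.visible == nil {
		s.visible = make(map[int64]bool)
	}
	for _, id := range req.RemovedSegmentIDs {
		delete(s.visible, id)
	}
	for _, id := range req.AddedSegmentIDs {
		s.visible[id] = true
	}
}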



Test Plan