...

// internal/proto/milvus.proto
-message PlaceholderValue {
-  string tag = 1;
-  PlaceholderType type = 2;
-  // values is a 2d-array, every array contains a vector
-  repeated bytes values = 3;
-}

-message PlaceholderGroup {
-  repeated PlaceholderValue placeholders = 1;
-}

message SearchRequest {
  ...
- bytes placeholder_group = 6; // must
+ bytes vector_record = 6; // must
  ...
}

message Hits {
  ...
- repeated bytes row_data = 2;
+ bytes record_batch = 2;
  ...
}

// internal/proto/schema.proto
message SearchResultData {
  ...
- repeated FieldData fields_data = 3;
+ bytes record_batch = 3;
  ...
}
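The diff above replaces per-column payloads (repeated FieldData) and per-row payloads (repeated bytes row_data) with one opaque bytes field. The Go sketch below shows the idea using a hypothetical two-column Batch type: the sender packs every column into a single buffer, and the receiver rebuilds the columns from it. The byte layout is invented for illustration and is far simpler than the Arrow IPC format. A real implementation would presumably use an Arrow library's IPC writer and reader instead.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Batch is a hypothetical, much-simplified stand-in for an Arrow record:
// an int64 primary-key column plus a float32 vector column of width Dim.
type Batch struct {
	IDs     []int64
	Dim     int
	Vectors []float32 // len(IDs)*Dim values, one vector after another
}

// Marshal packs every column into one []byte, the way a single
// `bytes record_batch` (or `bytes vector_record`) field carries the whole
// batch. The layout (little-endian rows, dim, IDs, vectors) is invented for
// this sketch and is NOT the Arrow IPC format.
func (b Batch) Marshal() ([]byte, error) {
	var buf bytes.Buffer
	for _, v := range []any{int64(len(b.IDs)), int64(b.Dim), b.IDs, b.Vectors} {
		if err := binary.Write(&buf, binary.LittleEndian, v); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// Unmarshal is the receiving side: it rebuilds all columns from the bytes.
func Unmarshal(data []byte) (Batch, error) {
	r := bytes.NewReader(data)
	var rows, dim int64
	if err := binary.Read(r, binary.LittleEndian, &rows); err != nil {
		return Batch{}, err
	}
	if err := binary.Read(r, binary.LittleEndian, &dim); err != nil {
		return Batch{}, err
	}
	// Reject headers that do not match the remaining payload size.
	if rows < 0 || dim < 0 || int64(r.Len()) != rows*8+rows*dim*4 {
		return Batch{}, fmt.Errorf("corrupt batch: rows=%d dim=%d", rows, dim)
	}
	b := Batch{IDs: make([]int64, rows), Dim: int(dim), Vectors: make([]float32, rows*dim)}
	for _, col := range []any{b.IDs, b.Vectors} {
		if err := binary.Read(r, binary.LittleEndian, col); err != nil {
			return Batch{}, err
		}
	}
	return b, nil
}

func main() {
	in := Batch{IDs: []int64{1, 2}, Dim: 2, Vectors: []float32{0.1, 0.2, 0.3, 0.4}}
	data, err := in.Marshal()
	if err != nil {
		panic(err)
	}
	out, err := Unmarshal(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), out.IDs, out.Vectors)
}
```

The receiver never sees individual protobuf sub-messages. It gets one buffer and must know the batch layout (for Arrow, the schema embedded in the IPC stream) to decode it.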

...

Task 1
  • Current behavior: The SDK (Python/Go/JS) sends InsertRequest with repeated schema.FieldData fields_data = 5;
  • Proposal: Update all SDKs (Python/Go/JS) to write the inserted data into an Arrow record and serialize the Arrow record to bytes. They then send InsertRequest with those Arrow bytes in bytes record_batch = 5;

Task 2
  • Current behavior: The proxy receives InsertRequest. If the autoID field is empty, it creates a field with generated IDs and inserts this field into fields_data.
  • Proposal: The proxy receives InsertRequest and decodes the Arrow record from the record_batch bytes. If the autoID field is empty, it re-creates the Arrow record with the generated IDs filled in.

Task 3
  • Current behavior: Convert the column-based data to row-based data.
  • Proposal: No longer needed.

Task 4
  • Current behavior: Based on hash_key, save row_data into internal.InsertRequest, encapsulate it into InsertMsg, then send it to Pulsar.
  • Proposal: Based on hash_key, save row_data (obtained via Arrow array.NewSlice) into internal.InsertRequest, encapsulate it into InsertMsg, then send it to Pulsar.

Task 5
  • Current behavior: The DataNode receives InsertMsg and converts row_data back to column-based data. It then saves the data to MinIO in Parquet format.
  • Proposal: The DataNode receives InsertMsg ..., assembles the data slices into an Arrow table, then saves it to MinIO in Parquet format.
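Proxy-side steps 2 and 4 of the insert flow can be sketched in plain Go under several assumptions:
  • a hypothetical Columns type stands in for a decoded Arrow record;
  • a startID parameter stands in for the real ID allocator;
  • FNV-1a serves as the hash_key function.
The sketch also assumes each row is cut out as a one-row sub-slice, which shares memory with its parent column the same way array.NewSlice shares Arrow buffers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// Columns is a hypothetical columnar insert batch standing in for a decoded
// Arrow record: a primary-key column and one float32 field column.
type Columns struct {
	IDs    []int64
	Scores []float32
}

// fillAutoID mirrors proposal step 2: when the autoID column is empty it
// returns a new batch with generated IDs and the original field column.
// startID is a placeholder for the proxy's real ID allocator (assumption).
func fillAutoID(c Columns, startID int64) Columns {
	if len(c.IDs) != 0 {
		return c
	}
	ids := make([]int64, len(c.Scores))
	for i := range ids {
		ids[i] = startID + int64(i)
	}
	return Columns{IDs: ids, Scores: c.Scores}
}

// hashKey maps a primary key to a channel. FNV-1a is an illustrative
// choice, not necessarily the hash Milvus uses for hash_key.
func hashKey(id int64, numChannels int) int {
	var b [8]byte
	binary.LittleEndian.PutUint64(b[:], uint64(id))
	h := fnv.New32a()
	h.Write(b[:]) // writes to a hash.Hash never return an error
	return int(h.Sum32() % uint32(numChannels))
}

// partition mirrors proposal step 4: every row becomes a one-row view
// (c.IDs[i:i+1], c.Scores[i:i+1]) grouped by channel. Like Arrow's
// array.NewSlice, a Go sub-slice shares the parent's memory, so no column
// data is copied or converted to a row format.
func partition(c Columns, numChannels int) [][]Columns {
	groups := make([][]Columns, numChannels)
	for i, id := range c.IDs {
		ch := hashKey(id, numChannels)
		groups[ch] = append(groups[ch], Columns{IDs: c.IDs[i : i+1], Scores: c.Scores[i : i+1]})
	}
	return groups
}

func main() {
	batch := fillAutoID(Columns{Scores: []float32{0.5, 0.7, 0.9, 1.1}}, 100)
	for ch, rows := range partition(batch, 2) {
		fmt.Printf("channel %d: %d row(s)\n", ch, len(rows))
	}
}
```

Because the per-row views alias the decoded batch, that batch must stay unmodified until every InsertMsg built from it has been serialized.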

Search Data Flow

Task 1
  • Current behavior: segcore gets the SearchResult, reduces it, then fills row_data_. It re-organizes the SearchResult and saves it to MarshaledHits (C++). MarshaledHits (C++) is then serialized and copied in Go.
  • Proposal: segcore gets the SearchResult, reduces it, then fills row_data_. It re-organizes the SearchResult and saves it in Arrow format (C++). The Arrow-format search result is then serialized and copied in Go.

Task 2
  • Current behavior: MarshaledHits is decoded and converted to column-based data, which is saved in schema.SearchResultData.
  • Proposal: Save the serialized Arrow-format search result into schema.SearchResultData.

Task 3
  • Current behavior: SearchResultData is serialized, saved into internal.SearchResults, encapsulated into SearchResultMsg, then sent to Pulsar.
  • Proposal: No change.

Task 4
  • Current behavior: The proxy receives SearchResultMsg, decodes SearchResultData, then reduces the search results.
  • Proposal: The proxy receives SearchResultMsg, decodes SearchResultData, then reduces the Arrow-format search results.

Task 5
  • Current behavior: Send SearchResultData with fields_data to the SDK.
  • Proposal: Send SearchResultData with the Arrow bytes to the SDK.
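In step 4 of the search flow, the proxy reduces the partial results it receives, and that reduce can be sketched as a k-way merge. This is a minimal sketch under several assumptions:
  • a hypothetical Hit type replaces the Arrow-format result;
  • each node's list is already sorted by descending score (larger is better, as for inner product);
  • duplicate IDs are dropped, which is a choice of this sketch rather than something the proposal specifies.

```go
package main

import "fmt"

// Hit is a hypothetical (id, score) pair from one node's partial result.
type Hit struct {
	ID    int64
	Score float32
}

// reduceTopK merges per-node lists, each sorted by descending score, into
// the global top-k. Each round it takes the best head among all lists
// (k-way merge) and skips IDs it has already emitted.
func reduceTopK(lists [][]Hit, k int) []Hit {
	pos := make([]int, len(lists)) // next unread index in each list
	seen := make(map[int64]bool)
	out := make([]Hit, 0, k)
	for len(out) < k {
		best := -1
		for i, l := range lists {
			if pos[i] < len(l) && (best < 0 || l[pos[i]].Score > lists[best][pos[best]].Score) {
				best = i
			}
		}
		if best < 0 {
			break // every list is exhausted
		}
		h := lists[best][pos[best]]
		pos[best]++
		if !seen[h.ID] {
			seen[h.ID] = true
			out = append(out, h)
		}
	}
	return out
}

func main() {
	nodeA := []Hit{{1, 0.9}, {3, 0.5}}
	nodeB := []Hit{{1, 0.9}, {2, 0.8}, {4, 0.1}}
	fmt.Println(reduceTopK([][]Hit{nodeA, nodeB}, 3))
}
```

A linear scan over the list heads is enough for a handful of nodes. With many inputs, a heap (container/heap) keeps each step at O(log n).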

Test Plan (required)

Pass all CI flows

...