...

Code Block
// internal/proto/milvus.proto
message QueryResults {
  ...
- repeated schema.FieldData fields_data = 2;
+ bytes record_batch = 2;
}

Design Details (required)

After this MEP, Milvus will no longer be compatible with earlier Milvus 2.0.0-rcX releases, because:

  • the proto format between the SDK and the proxy changes
  • the binlog file format changes

We divide this MEP into two stages. All compatibility-breaking changes will be completed in Stage 1, before Milvus 2.0.0; the remaining internal changes can be left for later.

...

| Task | Current Behavior | Proposal |
| --- | --- | --- |
| 1 | The SDKs (Python/Go/JS) send an InsertRequest with fields_data | Update all SDKs (Python/Go/JS) to write the inserted data into an Arrow record, then send an InsertRequest with the Arrow bytes |
| 2 | The proxy receives the InsertRequest; if the autoID field is empty, it creates a field with generated IDs, then inserts this field into fields_data | The proxy receives the InsertRequest and decodes the Arrow record from the record_batch bytes; if the autoID field is empty, it re-creates the Arrow record filled with generated IDs |
| 3 | Change column-based data to row-based data | No longer needed |
| 4 | Based on hash_key, save row_data into internal.InsertRequest, encapsulate it into an InsertMsg, then send it to Pulsar | Based on hash_key, save row_data (obtained via Arrow array.NewSlice) into internal.InsertRequest, encapsulate it into an InsertMsg, then send it to Pulsar |
| 5 | The Datanode receives the InsertMsg, converts row_data back to column-based data, then saves it to MinIO in Parquet format | The Datanode receives the InsertMsg, assembles the data slices into an Arrow table, then saves it to MinIO in Parquet format |
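
The proxy-side steps in Tasks 2 to 4 of the proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain Python dict of lists stands in for an Arrow record, list indexing stands in for array.NewSlice, and the names fill_auto_ids, slice_by_channel, start_id and num_channels, as well as the modulo hash, are hypothetical and not taken from the Milvus code base.

```python
from typing import Dict, List

# Column name -> values. A stand-in for an Arrow record (assumption).
Columns = Dict[str, list]


def fill_auto_ids(columns: Columns, id_field: str, start_id: int) -> Columns:
    """Re-create the record with generated IDs when the autoID column is empty."""
    if columns.get(id_field):
        return columns  # IDs were supplied by the client, nothing to do
    num_rows = max((len(values) for values in columns.values()), default=0)
    rebuilt = dict(columns)  # new record; the other columns are shared, not copied
    rebuilt[id_field] = list(range(start_id, start_id + num_rows))
    return rebuilt


def slice_by_channel(columns: Columns, id_field: str, num_channels: int) -> Dict[int, Columns]:
    """Group row indices by hash bucket, then take those rows from every column.

    The data stays column-based throughout, so no column-to-row pivot is needed.
    """
    buckets: Dict[int, List[int]] = {}
    for row, key in enumerate(columns[id_field]):
        # Hypothetical hash: the real hash_key computation is not shown in the source.
        buckets.setdefault(key % num_channels, []).append(row)
    return {
        channel: {name: [values[r] for r in rows] for name, values in columns.items()}
        for channel, rows in buckets.items()
    }


batch = {
    "id": [],  # empty autoID column
    "age": [21, 35, 42, 58],
    "vec": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]],
}
with_ids = fill_auto_ids(batch, "id", start_id=100)
per_channel = slice_by_channel(with_ids, "id", num_channels=2)
```

Each entry of per_channel plays the role of the slice that would be placed into one internal.InsertRequest. In the proposal itself the slices are Arrow arrays, so the Datanode can assemble them into an Arrow table without converting rows back into columns.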

Search Data Flow

| Task | Current Behavior | Proposal |
| --- | --- | --- |
| 1 | segcore gets the SearchResult, reduces it, then fills row_data_; the SearchResult is re-organized and saved to MarshaledHits (C++); MarshaledHits (C++) is serialized and copied into Go | segcore gets the SearchResult, reduces it, then fills row_data_; the SearchResult is re-organized and saved in Arrow format (C++); the Arrow-format search result is serialized and copied into Go |
| 2 | MarshaledHits is decoded and converted to column-based data saved in schema.SearchResultData | Save the serialized Arrow-format search result into schema.SearchResultData |
| 3 | SearchResultData is serialized, saved into internal.SearchResults, encapsulated into a SearchResultMsg, then sent to Pulsar | No change |
| 4 | The proxy receives the SearchResultMsg, decodes the SearchResultData, then reduces | The proxy receives the SearchResultMsg, decodes the SearchResultData, then reduces the Arrow-format search result |
| 5 | Send SearchResultData with fields_data to the SDK | Send SearchResultData with Arrow bytes to the SDK |
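
The reduce step in Task 4 could look roughly like the sketch below. It makes several assumptions that the source does not state: each partial result covers a single query, holds parallel ids and distances columns already sorted by ascending distance, and a smaller distance means a better hit. The name reduce_topk and the dict layout are hypothetical, and plain lists stand in for Arrow arrays.

```python
import heapq
import itertools
from typing import Dict, List

# One partial search result: parallel columns of IDs and distances,
# sorted by ascending distance (assumption: smaller is better).
Partial = Dict[str, list]


def reduce_topk(partials: List[Partial], top_k: int) -> Partial:
    """Merge sorted partial results and keep the top_k closest hits."""
    streams = [zip(p["distances"], p["ids"]) for p in partials]
    merged = heapq.merge(*streams)  # lazy k-way merge of already-sorted streams
    best = list(itertools.islice(merged, top_k))
    return {
        "ids": [hit_id for _, hit_id in best],
        "distances": [distance for distance, _ in best],
    }


node_a = {"ids": [7, 3, 9], "distances": [0.10, 0.40, 0.90]}
node_b = {"ids": [5, 1], "distances": [0.20, 0.30]}
result = reduce_topk([node_a, node_b], top_k=3)
```

Because both the inputs and the output stay column-based, the merged columns can be handed on to the SDK without an intermediate row-based representation. A real reduce would also need to handle multiple queries per request and duplicate IDs, which this sketch leaves out.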

Test Plan (required)

Pass all CI flows

...