...
    // internal/proto/milvus.proto
    -message PlaceholderValue {
    -  string tag = 1;
    -  PlaceholderType type = 2;
    -  // values is a 2d-array, every array contains a vector
    -  repeated bytes values = 3;
    -}
    -message PlaceholderGroup {
    -  repeated PlaceholderValue placeholders = 1;
    -}
     message SearchRequest {
       ...
    -  bytes placeholder_group = 6; // must
    +  bytes vector_record = 6; // must
       ...
     }
     message Hits {
       ...
    -  repeated bytes row_data = 2;
    +  bytes record_batch = 2;
       ...
     }

    // internal/proto/schema.proto
     message SearchResultData {
       ...
    -  repeated FieldData fields_data = 3;
    +  bytes record_batch = 3;
       ...
     }
...
Task | Current Behavior | Proposal |
---|---|---|
1 | SDKs (Python/Go/JS) send InsertRequest with repeated schema.FieldData fields_data = 5; | Update all SDKs (Python/Go/JS) to write inserted data into an Arrow record, serialize the Arrow record to bytes, then send InsertRequest with bytes record_batch = 5; |
2 | Proxy receives InsertRequest; if the autoID field is empty, it creates a field with generated IDs and inserts this field into fields_data | Proxy receives InsertRequest and decodes the Arrow record from the record_batch bytes; if the autoID field is empty, it re-creates the Arrow record filled with generated IDs |
3 | Convert column-based data to row-based data | No longer needed |
4 | Based on hash_key, save row_data into internal.InsertRequest, encapsulate it into InsertMsg, then send to Pulsar | Based on hash_key, save row_data (obtained via Arrow array.NewSlice) into internal.InsertRequest, encapsulate it into InsertMsg, then send to Pulsar |
5 | Datanode receives InsertMsg, converts row_data to column-based data again, then saves to MinIO in Parquet format | Datanode receives InsertMsg, ..., assembles the data slices into an Arrow table, then saves to MinIO in Parquet format |
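The Proxy-side steps of the proposal (tasks 2 to 4) can be sketched without any row-based conversion. The snippet below is only an illustration under stated assumptions: a plain dict of lists stands in for an Arrow record, `fill_auto_id` and `partition_by_hash` are hypothetical helper names, the primary keys are integers, and the hash is a simple modulo over the number of channels. A real implementation would operate on Arrow arrays and slices instead.

```python
from typing import Dict, List

# Column name -> values. A stand-in for an Arrow record (assumption).
Columns = Dict[str, list]


def fill_auto_id(batch: Columns, id_field: str, start_id: int) -> Columns:
    """Return a batch whose ID column is populated when the client left it empty."""
    num_rows = max((len(col) for col in batch.values()), default=0)
    if batch.get(id_field):
        return batch  # IDs were supplied by the client, nothing to generate
    filled = dict(batch)  # re-create the batch instead of mutating the input
    filled[id_field] = list(range(start_id, start_id + num_rows))
    return filled


def partition_by_hash(batch: Columns, key_field: str, num_channels: int) -> Dict[int, Columns]:
    """Split a columnar batch into per-channel columnar batches."""
    # Decide the target channel of every row first (hypothetical hash: key modulo channels).
    rows_per_channel: Dict[int, List[int]] = {}
    for row, key in enumerate(batch[key_field]):
        rows_per_channel.setdefault(key % num_channels, []).append(row)
    # Gather each column once per channel, so the data stays column-based throughout.
    return {
        channel: {name: [col[i] for i in rows] for name, col in batch.items()}
        for channel, rows in rows_per_channel.items()
    }


if __name__ == "__main__":
    request = {"id": [], "vec": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}
    filled = fill_auto_id(request, "id", start_id=100)
    for channel, part in sorted(partition_by_hash(filled, "id", 2).items()):
        print(channel, part)
```

Each per-channel result keeps the same column layout as the input, which is the property that makes the former column-to-row conversion (task 3) unnecessary.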
Search Data Flow
Task | Current Behavior | Proposal |
---|---|---|
1 | segcore gets SearchResult, reduces, then fills row_data_; re-organizes SearchResult and saves it to MarshaledHits (C++); MarshaledHits (C++) is serialized and copied in Go | segcore gets SearchResult, reduces, then fills row_data_; re-organizes SearchResult and saves it in Arrow format (C++); the Arrow format search result is serialized and copied in Go |
2 | MarshaledHits is decoded and converted to column-based data saved in schema.SearchResultData | Save the serialized Arrow format search result into schema.SearchResultData |
3 | SearchResultData is serialized, saved into internal.SearchResults, encapsulated into SearchResultMsg, then sent to Pulsar | No change |
4 | Proxy receives SearchResultMsg, decodes SearchResultData, then reduces | Proxy receives SearchResultMsg, decodes SearchResultData, then reduces the Arrow format search result |
5 | Send SearchResultData with fields_data to SDK | Send SearchResultData with Arrow bytes to SDK |
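The Proxy-side reduce in task 4 can be illustrated on column-based results. This is a minimal sketch, not the actual Milvus reduce: dicts of lists stand in for decoded Arrow search results, `reduce_top_k` and the column names `id` and `distance` are hypothetical, only a single query vector is handled, a smaller distance is assumed to be better, and duplicate primary keys are not removed.

```python
import heapq
from typing import Dict, List

# Column name -> values. A stand-in for a decoded Arrow search result (assumption).
Columns = Dict[str, list]


def reduce_top_k(partials: List[Columns], top_k: int) -> Columns:
    """Merge columnar partial results and keep the top_k smallest distances."""
    # Pair each distance with its primary key only for ranking purposes.
    candidates = [
        (dist, pk)
        for part in partials
        for pk, dist in zip(part["id"], part["distance"])
    ]
    best = heapq.nsmallest(top_k, candidates)  # sorted by ascending distance
    # Emit the merged result in the same column-based layout as the inputs.
    return {"id": [pk for _, pk in best], "distance": [dist for dist, _ in best]}


if __name__ == "__main__":
    node_a = {"id": [1, 2], "distance": [0.1, 0.4]}
    node_b = {"id": [3, 4], "distance": [0.2, 0.3]}
    print(reduce_top_k([node_a, node_b], top_k=3))
```

Because both the inputs and the output are column-based, the reduced result could be re-encoded and sent to the SDK without converting through fields_data.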
Test Plan (required)
Pass all CI flows
...