...
Code Block |
---|
// internal/proto/milvus.proto message QueryResults { ... - repeated schema.FieldData fields_data = 2; + bytes record_batch = 2; } |
Design Details (required)
After this MEP, Milvus will no longer be compatible with earlier Milvus 2.0.0-rcX releases, because:
- the proto format between the SDK and the proxy changes
- the binlog file format changes
We divide this MEP into two stages. All compatibility-breaking changes will be completed in Stage 1, before Milvus 2.0.0; the remaining internal changes can be done later.
...
Task | Current Behavior | Proposal |
---|---|---|
1 | SDKs (Python/Go/JS) send an InsertRequest with fields_data | Update all SDKs (Python/Go/JS) to write the inserted data into an Arrow record, then send an InsertRequest with the Arrow bytes |
2 | Proxy receives the InsertRequest; if the autoID field is empty, it creates a field with generated IDs and inserts this field into fields_data | Proxy receives the InsertRequest and decodes the Arrow record from the record_batch bytes; if the autoID field is empty, it re-creates the Arrow record filled with generated IDs |
3 | Convert column-based data to row-based data | No longer needed |
4 | Based on hash_key, save row_data into internal.InsertRequest, encapsulate it into an InsertMsg, then send it to Pulsar | Based on hash_key, save row_data (obtained via Arrow array.NewSlice) into internal.InsertRequest, encapsulate it into an InsertMsg, then send it to Pulsar |
5 | Datanode receives the InsertMsg, converts row_data back to column-based data, then saves it to MinIO in Parquet format | Datanode receives the InsertMsg, assembles the data slices into an Arrow table, then saves it to MinIO in Parquet format |
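The proposed proxy-side steps (tasks 2 and 4) can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain Python dict of lists stands in for an Arrow record batch, and `fill_auto_ids`, `split_by_hash`, and the `Columns` alias are hypothetical names, not Milvus or Arrow APIs. In the actual proposal the per-channel slices would be obtained through Arrow's `array.NewSlice`, as noted in the table.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for an Arrow record batch: column name -> list of values.
Columns = Dict[str, List]


def fill_auto_ids(batch: Columns, id_field: str, next_id: int) -> Columns:
    """Return a batch whose id_field is populated when it was missing or empty."""
    if batch.get(id_field):
        return batch  # the caller supplied IDs; nothing to generate
    num_rows = max((len(col) for col in batch.values()), default=0)
    filled = dict(batch)  # shallow copy: the other columns are shared, not copied
    filled[id_field] = list(range(next_id, next_id + num_rows))
    return filled


def split_by_hash(
    batch: Columns, hash_keys: Sequence[int], num_channels: int
) -> Dict[int, Columns]:
    """Group rows by hash_key % num_channels while keeping each group column-based."""
    rows_per_channel: Dict[int, List[int]] = {}
    for row, key in enumerate(hash_keys):
        rows_per_channel.setdefault(key % num_channels, []).append(row)
    return {
        channel: {name: [col[r] for r in rows] for name, col in batch.items()}
        for channel, rows in rows_per_channel.items()
    }
```

The point of the sketch is that the data stays column-based from the request through to the per-channel messages, so the column-to-row conversion of task 3 is never needed.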
Search Data Flow
Task | Current Behavior | Proposal |
---|---|---|
1 | segcore gets the SearchResult, reduces it, then fills row_data_; re-organizes the SearchResult and saves it to MarshaledHits (C++); MarshaledHits (C++) is serialized and copied in Go | segcore gets the SearchResult, reduces it, then fills row_data_; re-organizes the SearchResult and saves it in Arrow format (C++); the Arrow-format search result is serialized and copied in Go |
2 | MarshaledHits is decoded and converted to column-based data saved in schema.SearchResultData | Save the serialized Arrow-format search result into schema.SearchResultData |
3 | SearchResultData is serialized, saved into internal.SearchResults, encapsulated into a SearchResultMsg, then sent to Pulsar | No change |
4 | Proxy receives the SearchResultMsg, decodes SearchResultData, then reduces | Proxy receives the SearchResultMsg, decodes SearchResultData, then reduces the Arrow-format search result |
5 | Send SearchResultData with fields_data to the SDK | Send SearchResultData with Arrow bytes to the SDK |
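The proxy-side reduce in task 4 can be pictured as a top-k merge over column-based partial results. The sketch below is an assumption-laden illustration, not the Milvus implementation: plain dicts of lists stand in for decoded Arrow results, `reduce_topk` and the `distance` column name are hypothetical, and a smaller score is assumed to be better (as with an L2 distance).

```python
import heapq
from typing import Dict, List

# Hypothetical stand-in for a decoded Arrow search result: column name -> values.
Columns = Dict[str, List]


def reduce_topk(
    partials: List[Columns], top_k: int, score_field: str = "distance"
) -> Columns:
    """Merge partial results into the top_k rows with the smallest score."""
    if not partials:
        return {}
    # Each candidate is (score, partial index, row index); ties fall back to
    # the indices, so the merge order is deterministic.
    candidates = (
        (score, p, r)
        for p, part in enumerate(partials)
        for r, score in enumerate(part[score_field])
    )
    best = heapq.nsmallest(top_k, candidates)
    # Gather every column for the winning rows, so the output stays column-based.
    return {
        name: [partials[p][name][r] for _, p, r in best] for name in partials[0]
    }
```

For example, merging `{"id": [1, 2], "distance": [0.1, 0.4]}` with `{"id": [3, 4], "distance": [0.2, 0.3]}` at `top_k=3` keeps the rows with IDs 1, 3, and 4, in that order.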
Test Plan (required)
Pass all CI flows
...