...
- Update the Storage module to use GoArrow to write Parquet from Arrow and to read Arrow from Parquet directly, and remove C++ Arrow.
- Remove all internal row-based data structures, including "RowData" in internalpb.InsertRequest, "row_data" in milvuspb.Hits, and "row_data_" in C++ SearchResult.
- Optimize the search result flow.
Insert Data Flow
Task | Current Behavior | Proposal |
---|---|---|
1 | SDKs (Python/Go/JS) send InsertRequest with repeated schema.FieldData fields_data = 5; | Update all SDKs (Python/Go/JS) to write the inserted data into an Arrow record, serialize the Arrow record to bytes, then send InsertRequest with bytes record_batch = 5; |
2 | Proxy receives InsertRequest; if the autoID field is empty, it creates a field with generated IDs and inserts this field into fields_data | Proxy receives InsertRequest and decodes the Arrow record from the record_batch bytes; if the autoID field is empty, it re-creates the Arrow record filled with generated IDs |
3 | Convert column-based data to row-based data | No longer needed |
4 | Based on hash_key, save row_data into internal.InsertRequest, encapsulate it into InsertMsg, then send to Pulsar | Based on hash_key, save row_data (get via Arrow array.NewSlice) into internal.InsertRequest, encapsulate it into InsertMsg, then send to Pulsar |
5 | Datanode receive InsertMsg, |
Search Data Flow
Test Plan(required)
...