...

  1. Update the Storage module to use GoArrow to write Parquet directly from Arrow and read Arrow directly from Parquet, removing the C++ Arrow dependency.
  2. Remove all internal row-based data structures, including "RowData" in internalpb.InsertRequest, "row_data" in milvuspb.Hits, and "row_data_" in the C++ SearchResult.
  3. Optimize the search result flow.


Insert Data Flow

  1. Current behavior: the SDK (Python/Go/JS) sends an InsertRequest with

       repeated schema.FieldData fields_data = 5;

     Proposal: update all SDKs (Python/Go/JS) to write the inserted data into an Arrow record, serialize the Arrow record to bytes, and then send an InsertRequest with

       bytes record_batch = 5;

  2. Current behavior: the Proxy receives the InsertRequest. If the autoID field is empty, it creates a field with generated IDs and inserts that field into fields_data.

     Proposal: the Proxy receives the InsertRequest and decodes the Arrow record from the record_batch bytes. If the autoID field is empty, it re-creates the Arrow record filled with generated IDs.

  3. Current behavior: column-based data is converted to row-based data.

     Proposal: this conversion is no longer needed; Arrow array.NewSlice() can be used to get row-based data.

  4. Current behavior: based on hash_key, row_data is saved into internal.InsertRequest, which is encapsulated into an InsertMsg and sent to Pulsar.

     Proposal: based on hash_key, the row-based data (obtained via Arrow array.NewSlice) is saved into internal.InsertRequest, which is encapsulated into an InsertMsg and sent to Pulsar.

  5. Current behavior: DataNode receives InsertMsg,
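The Proxy-side part of the proposed insert flow (fill the autoID column, then route rows by hash_key using slices instead of a column-to-row conversion) can be sketched as below. This is a minimal, self-contained illustration under explicit assumptions: `batch` and `column` are hypothetical stand-ins for an Arrow record and array (not the Arrow Go API), `slice` only mimics the zero-copy semantics of Arrow `array.NewSlice`, and the FNV-1a hash and the `startID` allocator value are illustrative choices, not necessarily what Milvus uses.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// column is a hypothetical, simplified stand-in for an Arrow array of int64
// values. It is NOT the Arrow Go API.
type column struct {
	name   string
	values []int64
}

// batch is a hypothetical stand-in for an Arrow record: equal-length columns.
type batch struct {
	columns []column
}

// numRows returns the row count shared by all columns.
func (b batch) numRows() int {
	if len(b.columns) == 0 {
		return 0
	}
	return len(b.columns[0].values)
}

// slice returns rows [i, j) of every column. Like Arrow's array.NewSlice, it
// re-uses the parent's backing storage instead of copying values.
func (b batch) slice(i, j int) batch {
	out := batch{columns: make([]column, len(b.columns))}
	for k, c := range b.columns {
		out.columns[k] = column{name: c.name, values: c.values[i:j]}
	}
	return out
}

// fillAutoID re-creates the batch with a generated primary-key column in
// front. startID stands in for a real ID allocator (hypothetical).
func fillAutoID(b batch, pkName string, startID int64) batch {
	ids := make([]int64, b.numRows())
	for i := range ids {
		ids[i] = startID + int64(i)
	}
	cols := append([]column{{name: pkName, values: ids}}, b.columns...)
	return batch{columns: cols}
}

// channelOf maps a primary key to a channel. FNV-1a is an illustrative hash,
// not necessarily the one Milvus uses for hash_key.
func channelOf(pk int64, numChannels int) int {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	h := fnv.New32a()
	h.Write(buf[:])
	return int(h.Sum32() % uint32(numChannels))
}

// routeRows groups one-row slices by channel; each group would become the
// payload of one InsertMsg. No row-based copy of the data is built.
func routeRows(b batch, pkCol, numChannels int) map[int][]batch {
	out := make(map[int][]batch)
	for i := 0; i < b.numRows(); i++ {
		ch := channelOf(b.columns[pkCol].values[i], numChannels)
		out[ch] = append(out[ch], b.slice(i, i+1))
	}
	return out
}

func main() {
	// The client left the autoID primary-key column empty, so only "age" is sent.
	in := batch{columns: []column{{name: "age", values: []int64{31, 42, 27, 55}}}}
	rec := fillAutoID(in, "pk", 1000)
	for ch, rows := range routeRows(rec, 0, 2) {
		fmt.Printf("channel %d: %d row(s)\n", ch, len(rows))
	}
}
```

In this sketch each one-row slice shares the parent's backing arrays, so routing never materializes a row-based copy of the data; grouping each channel's rows into a single record per InsertMsg is left out for brevity.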


Search Data Flow

Test Plan (required)

...