...

PROBLEM: There seems to be no advantage compared with the current implementation.

...

Some limitations in the use of Arrow:

  1. Arrow data can only be serialized and deserialized in units of RecordBatch

...

2. RecordBatch does not support copying data row by row

3. The RecordBatch must be re-created at the receiving end of Pulsar

The same problems arise in the query data flow:

1. The query results obtained by segcore need to be reduced twice: once when the querynode merges the SearchResults of multiple segments, and again when the proxy merges the query results of multiple querynodes. If the query results are in RecordBatch format, the reduce is inconvenient because data cannot be copied row by row

2. The querynode needs to send the SearchResult to the proxy through Pulsar. After receiving the data, the proxy needs to rebuild the RecordBatch, which defeats the original zero-copy design intent of Arrow

So I don't think Arrow is suitable for the application scenario of Milvus
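
The two-stage reduce described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Hit` and `reduce_topk` are hypothetical names, not Milvus or segcore APIs, each partial result is assumed to be already sorted by distance, and a smaller distance is assumed to mean a better match.

```python
import heapq
from itertools import islice
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One row of a search result (hypothetical shape, not Milvus's SearchResult)."""
    distance: float  # assumption: smaller distance means a better match
    primary_key: int


def reduce_topk(partials: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Merge several distance-sorted partial results and keep the best k rows."""
    # heapq.merge consumes the inputs one row at a time; this row-wise
    # access is what a whole-batch container makes inconvenient.
    return list(islice(heapq.merge(*partials), k))


# Stage 1: each querynode merges the results of its own segments.
node_a = reduce_topk([[Hit(0.1, 1), Hit(0.7, 2)], [Hit(0.3, 3), Hit(0.9, 4)]], k=2)
node_b = reduce_topk([[Hit(0.2, 5), Hit(0.8, 6)]], k=2)

# Stage 2: the proxy merges the per-querynode results.
final = reduce_topk([node_a, node_b], k=2)
```

Both stages pick individual rows out of several inputs and interleave them into a new result, so a format that can only be moved a whole batch at a time needs an extra gather step at each stage.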

An example to illustrate the problems we would encounter if we used Arrow

...

  1. by unit of RecordBatch
  2. Cannot copy out row data from RecordBatch
  3. RecordBatch must be regenerated after being sent via Pulsar


Arrow is suitable for data analysis scenarios, where data is sealed and read-only.

In Milvus, we need to split and concatenate data, so Arrow is not a good choice for Milvus.
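
To make the cost of row-wise split and concatenate on a columnar layout concrete, here is a minimal sketch. It deliberately uses a plain dict of lists as a stand-in for a columnar batch rather than the Arrow library, and `split_rows`, `concat`, and the modulo routing rule are hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Column name -> column values; a stand-in for a columnar batch (not Arrow itself).
Batch = Dict[str, list]


def split_rows(batch: Batch, num_shards: int, key: str) -> List[Batch]:
    """Split a columnar batch by row, routing each row by key % num_shards."""
    shards: List[Batch] = [{name: [] for name in batch} for _ in range(num_shards)]
    for row, key_value in enumerate(batch[key]):
        target = shards[key_value % num_shards]
        # Moving a single row costs one copy per column.
        for name, column in batch.items():
            target[name].append(column[row])
    return shards


def concat(batches: List[Batch]) -> Batch:
    """Concatenate batches that share the same columns, in the given order."""
    return {name: [v for b in batches for v in b[name]] for name in batches[0]}


batch = {"id": [1, 2, 3, 4], "vector": [[0.1], [0.2], [0.3], [0.4]]}
shards = split_rows(batch, num_shards=2, key="id")
merged = concat(shards)
```

Every row that changes destination has to be gathered from each column separately, and every destination has to build a fresh batch, which is the opposite of the sealed, read-only usage that a columnar format is designed for.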

Splitting by row is based on 2 reasons:

...

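1. Arrow data can only be serialized and deserialized in units of RecordBatch

2. RecordBatch does not support copying data row by row

3. The RecordBatch must be re-created at the receiving end of Pulsar

The same problems arise in the query data flow:

1. The query results obtained by segcore need to be reduced twice: once when the querynode merges the SearchResults of multiple segments, and again when the proxy merges the query results of multiple querynodes. If the query results are in RecordBatch format, the reduce is inconvenient because data cannot be copied row by row

2. The querynode needs to send the SearchResult to the proxy through Pulsar. After receiving the data, the proxy needs to rebuild the RecordBatch, which defeats the original zero-copy design intent of Arrow

So I think Arrow is not suitable for an application scenario like Milvus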


Design Details (required)

We divide this MEP into 2 stages. All compatibility changes will be completed in Stage 1, before Milvus 2.0.0; other internal changes can be left for later.

...