...

There is no compatibility issue.

Test Plan (required)

For Knowhere

  1. Add new unit tests
  2. Add benchmark using range search dataset

There is no public dataset for range search, so I have created range search datasets based on sift1M and glove200.

You can find them on the NAS:

test/milvus/ann_hdf5/sift-128-euclidean-range.hdf5

test/milvus/ann_hdf5/glove-200-angular-range.hdf5

test/milvus/ann_hdf5/binary/sift-4096-hamming-range.hdf5


For Milvus

  1. Add unit tests in segcore
  2. Use sift1M/glove200
  3. Reuse the search test cases to test range search
  4. Use the sift1M dataset to test range search; we expect to get results identical to those of search
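The "identical result" expectation in the last item can be illustrated with a brute-force sketch. This is not Knowhere or Milvus code: `topk_search`, `range_search`, and the inclusive radius boundary are assumptions made for illustration only, and a real test would run against the sift1M data rather than random vectors. The idea is that if the radius is set to the distance of the k-th nearest neighbor, the first k range search hits should match the top-k search hits.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def topk_search(base, query, k):
    """Hypothetical brute-force top-k: the k nearest (distance, id) pairs."""
    scored = sorted((l2(vec, query), idx) for idx, vec in enumerate(base))
    return scored[:k]


def range_search(base, query, radius):
    """Hypothetical brute-force range search.

    Returns every (distance, id) pair with distance <= radius, nearest first.
    The inclusive boundary is an assumption of this sketch.
    """
    hits = []
    for idx, vec in enumerate(base):
        dist = l2(vec, query)
        if dist <= radius:
            hits.append((dist, idx))
    return sorted(hits)


def check_consistency(base, query, k):
    """Compare top-k with range search using the k-th distance as radius."""
    top = topk_search(base, query, k)
    radius = top[-1][0]  # distance of the k-th nearest neighbor
    in_range = range_search(base, query, radius)
    return in_range[:k] == top


if __name__ == "__main__":
    random.seed(0)
    base = [[random.random() for _ in range(8)] for _ in range(200)]
    query = [random.random() for _ in range(8)]
    print(check_consistency(base, query, 10))
```

With exact (brute-force) distances the comparison holds by construction; with an approximate index the two result sets may differ slightly, so a real test might compare recall instead of strict equality.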

...

Implementing the previous proposal would be much more complicated than implementing the current proposal.

Cons:

  1. Knowhere would need a new API, QueryByRange(), and a new parameter legality check API, CheckRangeSearch()
  2. Because the output format of range search is not unified with that of search, we would need to add a new structure, SubRangeSearchResult, for range search
  3. Because range search returns all results (not just the top k), we would need to implement a Merge operation for chunk, segment, query node and proxy
  4. Every Merge operation makes the result set larger, so neither system memory usage nor the gRPC bandwidth between server and client can be estimated
  5. TOPK is a required parameter in the SDK but is not used by range search, which would confuse users
  6. Memory explosion caused by Merge
  7. Many API modifications caused by the unused topk parameter: TOPK is also a required parameter in many search-related APIs, so in this proposal we would have to change all of these APIs to work without TOPK
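The difference between merging top-k results and merging range search results (items 3, 4 and 6) can be sketched as follows. This is a minimal illustration, not the Milvus reduce code: `merge_topk` and `merge_range` are hypothetical names, and each partial result is assumed to be a list of (distance, id) pairs already sorted by distance.

```python
import heapq
from itertools import islice


def merge_topk(partials, k):
    """Merge sorted partial results and keep only the k best.

    The output never holds more than k entries, no matter how many
    chunks or segments contribute.
    """
    return list(islice(heapq.merge(*partials), k))


def merge_range(partials):
    """Merge sorted partial results and keep everything.

    The output size is the sum of all input sizes, so it grows at every
    merge level (chunk, segment, query node, proxy in the list above).
    """
    return list(heapq.merge(*partials))


if __name__ == "__main__":
    # Three hypothetical segments, each returning sorted (distance, id) hits.
    segments = [
        [(0.1, 7), (0.4, 2), (0.9, 5)],
        [(0.2, 11), (0.3, 14)],
        [(0.05, 21), (0.6, 23), (0.7, 29), (0.8, 30)],
    ]
    print(len(merge_topk(segments, 3)))   # bounded by k
    print(len(merge_range(segments)))     # sum of all segment result sizes
```

In the top-k case memory and transfer size can be bounded up front by k; in the range case they depend on how many vectors fall inside the radius, which is not known before the query runs.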

The current proposal is better than the previous one in all respects.

...