Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Released: 

Summary(required)

At present, the behavior of By now, doing "search" in Milvus is to will return TOPK most similar results for each vector be being searched.

This MEP is about how to realize a functionality named "range search" functionality. User specifies a range scope -- including radius low bound and radius high bound, Milvus will do does "range search", and return also returns TOPK sorted results with distance falling in this range scope.

  • both radius low bound and radius high bound MUST be set
  • range search will return TOPK results with
    • for L2: low_bound <= distance < high_bound
    • for IP: low_bound < distance <= high_bound

...

  • get results with distance falling in a range scope
  • get arbitrary TOPK search results more than 16384

...

We reuse the SDK interface Search() to do "range search". Add Only add 2 more parameters "radius_low_bound" and "radius_high_bound" into params.

...

Code Block
languagepy
  default_index = {"index_type": "HNSW", "params":{"M": 48, "efConstruction": 500}, "metric_type": "L2"}
  collection.create_index("float_vector", default_index)
  collection.load()  
  search_params = {"metric_type": "L2", "params": {"ef": 32, "radius_low_bound": 1.0, "radius_high_bound": 2.0}}
  res = collection.search(vectors[:nq], "float_vector", search_params, limit, "int64 >= 0")


  • Add 3 new interfaces in knowhere
Code Block
languagecpp
  // range search parameter legacy check
  virtual bool
  CheckRangeSearch(Config& cfg, const IndexType type, const IndexMode mode);
  
  // range search
  virtual DatasetPtr
  QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset);

  // brute force range search
  static DatasetPtr
  BruteForce::RangeSearch(const DatasetPtr base_dataset, const DatasetPtr query_dataset, const Config& config, const faiss::BitsetView bitset);

Design Details(required)

In Knowhere

Range search related new APIs CheckRangeSearch(), QueryByRange() and BruteForce::RangeSearch() are available since knowhere-v1.3.0.

Range search completely reuses the call stack from SDK to segcore.

...

  • seg_offsets_: with length "nq * topk", -1 is filled in when no enough result data
  • distances_: with lengh "nq * topk", data is undefined when no enough result data

...



Compatibility, Deprecation, and Migration Plan(optional)

...

  1. Add new unittest
  2. Add benchmark using range search dataset

There is no public data set for range search. I have created range search data set based on sift1M and glove200.

You can find them in NAS:

test/milvus/ann_hdf5/sift-128-euclidean-range.hdf5

test/milvus/ann_hdf5/glove-200-angular-range.hdf5

test/milvus/ann_hdf5/binary/sift-4096-hamming-range.hdf5


For Milvus

  1. add unittest in segcore
  2. use sift1M/glove200 dataset to test range search, we expect to get identical result as search

...