You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 33 Next »

Current state: ["Under Discussion"]

ISSUE: #17599

PRs: 

Keywords: search, range search, radius, low bound, high bound

Released: 

Summary(required)

At present, the behavior of "search" in Milvus is to return TOPK most similar results for each vector be searched.

The purpose of this MEP is to realize "range search" function. User specifies a "range" (including radius low bound, or radius high bound, or both), Milvus does "range search" to get TOPK results with distance falling in this "range".

  • only set radius low bound, return TOPK results with "low_bound <= distance"
  • only set radius high bound, return TOPK results with "distance < high_bound". (These TOPK results returned by range search will be identical with the TOPK results returned by search.)
  • set both radius low bound and high bound, return TOPK results with "low_bound <= distance < high_bound"

Motivation(required)

User requests Milvus to realize a function to get all results with distance falling in a "range". With this capability, user can get arbitrary TOPK search results, while currently search can mostly return 16384 results.

Public Interfaces(optional)

  • No interface change in Milvus and all SDKs

We reuse the SDK interface Search() to do "range search". We only need add 2 more parameters "radius_low_bound" and "radius_high_bound" into params.

When "radius_low_bound" or "radius_high_bound" is specified, Milvus does range search; otherwise, Milvus does search.

As shown in the following figure, set "radius_low_bound": 1.0, "radius_high_bound": 2.0 in search_ params.params.

  default_index = {"index_type": "HNSW", "params":{"M": 48, "efConstruction": 500}, "metric_type": "L2"}
  collection.create_index("float_vector", default_index)
  collection.load()  
  search_params = {"metric_type": "L2", "params": {"ef": 32, "radius_low_bound": 1.0, "radius_high_bound": 2.0}}
  res = collection.search(vectors[:nq], "float_vector", search_params, limit, "int64 >= 0")


  • No interface change in knowhere

Knowhere reuses interface Query() for range search.

When "radius_low_bound" or "radius_high_bound" is specified in config, knowhere does range search; otherwise, knowhere does search.

  virtual DatasetPtr
  Query(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset);

Design Details(required)

Range search completely reuses the call stack from SDK to segcore, then to knowhere.

Range search only needs to put the parameter "radius_low_bound" and "radius_high_bound" into the search parameter "param" on the SDK side.


In Knowhere, range search reuses interface Query() with search. When "radius_low_bound" or "radius_high_bound" is set, range search is called; otherwise, search is called. 

The output of range search is exactly as same as the result of search.

  • seg_offsets_: with length "nq * topk", -1 is filled in when no enough result data
  • distances_: whth lengh "nq * topk", float_max (for "L2") or float_min (for "IP") is filled in when no enough result data

Compatibility, Deprecation, and Migration Plan(optional)

There is no interface change, so there is no compatibility issue.

Test Plan(required)

We can reuse search testcases to test range search.

  1. do search, get result1
  2. set "radius_high_bound"
  3. do search again, get range search result2
  4. if "radius_high_bound" is too small, result2 should be a subset of result1; otherwise, result2 should be identical with result1


There is no public data set for range search. I have created one range search data set based on sift1M.

You can find them in NAS:

test/milvus/ann_hdf5/sift-128-euclidean-range.hdf5

test/milvus/ann_hdf5/glove-200-angular-range.hdf5

test/milvus/ann_hdf5/binary/sift-4096-hamming-range.hdf5

Rejected Alternatives(optional)

If there are alternative ways of accomplishing the same thing, what were they? The purpose of this section is to motivate why the design is the way it is and not some other ways.


References(optional)

Milvus Search Flow


Segcore Search Flow




  • No labels