You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 36 Next »

Current state: ["Under Discussion"]

ISSUE: #17599

PRs: 

Keywords: search, range search, radius, low bound, high bound

Released: 

Summary(required)

At present, the behavior of "search" in Milvus is to return TOPK most similar results for each vector be searched.

The purpose of this MEP is to realize "range search" functionality. User specifies a range scope -- including radius low bound and radius high bound, Milvus does "range search" to get TOPK results with distance falling in this range.

  • only set radius low bound, return TOPK results with "low_bound <= distance"
  • only set radius high bound, return TOPK results with "distance < high_bound". (These TOPK results returned by range search will be identical with the TOPK results returned by search.)
  • set both radius low bound and high bound, return TOPK results with "low_bound <= distance < high_bound"

Motivation(required)

User requests Milvus to realize a function to get all results with distance falling in a "range". With this capability, user can get arbitrary TOPK search results, while currently search can mostly return 16384 results.

Public Interfaces(optional)

  • No interface change in Milvus and all SDKs

We reuse the SDK interface Search() to do "range search". We only need add 2 more parameters "radius_low_bound" and "radius_high_bound" into params.

When "radius_low_bound" or "radius_high_bound" is specified, Milvus does range search; otherwise, Milvus does search.

As shown in the following figure, set "radius_low_bound": 1.0, "radius_high_bound": 2.0 in search_ params.params.

  default_index = {"index_type": "HNSW", "params":{"M": 48, "efConstruction": 500}, "metric_type": "L2"}
  collection.create_index("float_vector", default_index)
  collection.load()  
  search_params = {"metric_type": "L2", "params": {"ef": 32, "radius_low_bound": 1.0, "radius_high_bound": 2.0}}
  res = collection.search(vectors[:nq], "float_vector", search_params, limit, "int64 >= 0")


  • No interface change in knowhere

Knowhere reuses interface Query() for range search.

When "radius_low_bound" or "radius_high_bound" is specified in config, knowhere does range search; otherwise, knowhere does search.

  virtual DatasetPtr
  Query(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset);

Design Details(required)

Range search completely reuses the call stack from SDK to segcore, then to knowhere.

Range search only needs to put the parameter "radius_low_bound" and "radius_high_bound" into the search parameter "param" on the SDK side.


In Knowhere, range search reuses interface Query() with search. When "radius_low_bound" or "radius_high_bound" is set, range search is called; otherwise, search is called. 

The output of range search is exactly as same as the result of search.

  • seg_offsets_: with length "nq * topk", -1 is filled in when no enough result data
  • distances_: whth lengh "nq * topk", float_max (for "L2") or float_min (for "IP") is filled in when no enough result data

Compatibility, Deprecation, and Migration Plan(optional)

There is no interface change, so there is no compatibility issue.

Test Plan(required)

We can reuse search testcases to test range search.

  1. do search, get result1
  2. set "radius_high_bound"
  3. do search again, get range search result2
  4. if "radius_high_bound" is too small, result2 should be a subset of result1; otherwise, result2 should be identical with result1


There is no public data set for range search. I have created one range search data set based on sift1M.

You can find them in NAS:

test/milvus/ann_hdf5/sift-128-euclidean-range.hdf5

test/milvus/ann_hdf5/glove-200-angular-range.hdf5

test/milvus/ann_hdf5/binary/sift-4096-hamming-range.hdf5

Rejected Alternatives(optional)

The previous proposal of this MEP is let range search return all results with distances better than a "radius".

The project implementation of the previous proposal will be much more complicated than current proposal.

Cons:

  1. Knowhere will add new API QueryByRange() and parameter legacy check API CheckRangeSearch()
  2. Because the output format of range search is not unified with search, we need add new structure SubRangeSearchResult for range search
  3. Because range search returns all result (not topk), we need implement Merge operation for chunk, segment, query node and proxy
  4. every Merge operation will make the result size more bigger, system memory usage cannot be estimated, the GRPC bandwidth between server and client cannot be estimated
  5. TOPK is a required parameter in SDK, but not used by range search, it will make user confused
  6. TOPK is also a required parameter in many search related APIs, in this proposal, we have to change all these APIs to support without TOPK

Current proposal is better than the previous one in all respects.

References(optional)

Milvus Search Flow


Segcore Search Flow




  • No labels