You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 41 Next »

Current state: ["Under Discussion"]

ISSUE: #17599

PRs: 

Keywords: search, range search, radius, low bound, high bound

Released: 

Summary(required)

At present, the behavior of "search" in Milvus is to return TOPK most similar results for each vector be searched.

This MEP is about how to realize "range search" functionality. User specifies a range scope -- including radius low bound and radius high bound, Milvus will do "range search", and return TOPK sorted results with distance falling in this range.

  • both radius low bound and radius high bound MUST be set
  • range search will return TOPK results with
    • for L2: low_bound <= distance < high_bound
    • for IP: low_bound < distance <= high_bound

Motivation(required)

Many users request Milvus to realize a "range search" functionality. With this capability, user can

  • get results with distance falling in a range scope
  • get arbitrary TOPK search results more than 16384

Public Interfaces(optional)

  • No interface change in Milvus and all SDKs

We reuse the SDK interface Search() to do "range search". Add 2 more parameters "radius_low_bound" and "radius_high_bound" into params.

When "radius_low_bound" and "radius_high_bound" are specified, Milvus does range search; otherwise, Milvus does search.

As shown in the following figure, set "radius_low_bound": 1.0, "radius_high_bound": 2.0 in search_ params.params.

  default_index = {"index_type": "HNSW", "params":{"M": 48, "efConstruction": 500}, "metric_type": "L2"}
  collection.create_index("float_vector", default_index)
  collection.load()  
  search_params = {"metric_type": "L2", "params": {"ef": 32, "radius_low_bound": 1.0, "radius_high_bound": 2.0}}
  res = collection.search(vectors[:nq], "float_vector", search_params, limit, "int64 >= 0")


  • Add 3 new interfaces in knowhere
  // range search parameter legacy check
  virtual bool
  CheckRangeSearch(Config& cfg, const IndexType type, const IndexMode mode);
  
  // range search
  virtual DatasetPtr
  QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset);

  // brute force range search
  static DatasetPtr
  BruteForce::RangeSearch(const DatasetPtr base_dataset, const DatasetPtr query_dataset, const Config& config, const faiss::BitsetView bitset);

Design Details(required)

Range search completely reuses the call stack from SDK to segcore.



Segcore search flow will be updated as this flow chart, range search related change is marked RED.

In API query::SearchOnSealedIndex() and BruteForceSearch(), when "radius_low_bound" and "radius_high_bound" are set, range search is called; otherwise, search is called. 

For API query::SearchOnSealedIndex(), the result returned by knowhere::QueryByRange() will be sorted and filtered, only results with distance falling in the range scope will be returned.

The same as API BruteForceSearch(), the result returned by knowhere::BruteForce::RangeSearch() will be sorted and filtered, only results with distance falling in the range scope will be returned.

The output of range search is exactly as same as the result of search.

  • seg_offsets_: with length "nq * topk", -1 is filled in when no enough result data
  • distances_: with lengh "nq * topk", data is undefined when no enough result data


Range search related new APIs CheckRangeSearch(), QueryByRange() and BruteForce::RangeSearch() are available since knowhere-v1.3.0.



Compatibility, Deprecation, and Migration Plan(optional)

There is no compatibility issue.

Test Plan(required)


  1. reuse search testcases to test range search
  2. use sift1M dataset to test range search, we expect to get identical result as search

Rejected Alternatives(optional)

The previous proposal of this MEP is let range search return all results with distances better than a "radius".

The project implementation of the previous proposal will be much more complicated than current proposal.

Cons:

  1. Knowhere will add new API QueryByRange() and parameter legacy check API CheckRangeSearch()
  2. Because the output format of range search is not unified with search, we need add new structure SubRangeSearchResult for range search
  3. Because range search returns all result (not topk), we need implement Merge operation for chunk, segment, query node and proxy
  4. Every Merge operation will make the result size more bigger, system memory usage cannot be estimated, the GRPC bandwidth between server and client cannot be estimated
  5. TOPK is a required parameter in SDK, but not used by range search, it will make user confused
  6. TOPK is also a required parameter in many search related APIs, in this proposal, we have to change all these APIs to support without TOPK

Current proposal is better than the previous one in all respects.

References(optional)









  • No labels