Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Released: 

Summary(required)

By now, doing the behavior of  "search" in Milvus will return is returning TOPK most similar sorted results for each queried vector being searched.

This MEP is about to realize a functionality named "range search". User specifies a range scope -- including radius low bound and radius high bound, Milvus does "range search", and also returns TOPK sorted results with distance falling in this range scope.

  • both radius low bound and radius high bound MUST be setrange search will return TOPK results with
    for IP:
  • falling in range scope means
Metric typeBehavior
IPradius_low_bound < distance <= radius_high_bound

...

L2

...

and other metric types

...

radius_low_bound <= distance < radius_high_bound

Motivation(required)

Many users request a the "range search" functionality. With this capability, user can

...

If call range search API with unsupported index types or unsupported metric types, an exception will be thrown out.

In Knowhere, only one parameter two new parameters "radius" is _low_bound" and "radius_high_bound" are passed in via config, and range search will return all un-sorted results with distance "better" than this radius.

...

falling in this range scope.

Metric typeBehavior
IPradius_low_bound < distance <= radius_high_bound
L2 or other metric types

...

radius_low_bound <= distance < radius_high_bound

We add 3 new APIs CheckRangeSearch(), QueryByRange() and BruteForce::RangeSearch() to support range search, these APIs are already available since knowhere-v1.3.0.

  1. CheckRangeSearch()

This API is used to do range search parameter legacy check. It will throw exception when parameter legacy check fail.

...

2. QueryByRange()

This API does range search for index, it returns all unsorted results with distance "better than radius" (for IP: > radius; for others: < radius)falling in the specified range scope.

PROTO
virtual DatasetPtr
QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset)

INPUT

Dataset {
    knowhere::meta::TENSOR: -   // query data
    knowhere::meta::ROWS: -      // rows of queries
    knowhere::meta::DIM: -          // dimension
}

Config {

    knowhere::meta::RADIUS_LOW_BOUND: -   

    knowhere::meta::RADIUS_HIGH_BOUND: -      // radius for range search

}

OUTPUT

Dataset {
    knowhere::meta::IDS: -                // result IDs with length LIMS[nq]
    knowhere::meta::DISTANCE: -  // result DISTANCES with length LIMS[nq]
    knowhere::meta::LIMS: -            // result offset prefix sum with length nq + 1
}

...

The range search result for each query vector is: IDS[lims[n], lims[n+1]) and DISTANCE[lims[n], lims[n+1])


The memory used for IDS, DISTANCE and LIMS are allocated in Knowhere, they will be auto-freed when Dataset deconstruction.

...

PROTO
static DatasetPtr
RangeSearch(const DatasetPtr base_dataset,
const DatasetPtr query_dataset,
const Config& config,
const faiss::BitsetView bitset);

INPUT

Dataset {
    knowhere::meta::TENSOR: -   // base data
    knowhere::meta::ROWS: -      // rows of base data
    knowhere::meta::DIM: -          // dimension
}

Dataset {
    knowhere::meta::TENSOR: -   // query data
    knowhere::meta::ROWS: -      // rows of queries
    knowhere::meta::DIM: -          // dimension
}

Config {

    knowhere::meta::DIM::RADIUS_LOW_BOUND: -          // dimension
}

Config {

    knowhere::meta::RADIUS_HIGH_BOUND: -   // radius for range search

}

OUTPUT

Dataset {
    knowhere::meta::IDS: -                // result IDs with length LIMS[nq]
    knowhere::meta::DISTANCE: -  // result DISTANCES with length LIMS[nq]
    knowhere::meta::LIMS: -            // result offset prefix sum with length nq + 1
}

...

Segcore search flow will be updated as this flow chart, range search related change is marked RED.

In Segcore , it uses radius parameter's existence to decide to run range search or search. When to decide whether to run search, or to run range search.

  • when both "radius_low_bound" and "radius_high_bound" are set, range search is called; 
  • when only one of "radius_low_bound" and "radius_high_bound"

...

  • is set,

...

  • exception is

...

  • thrown out;
  • otherwise, search is called. 

For API query::SearchOnSealedIndex() and BruteForceSearch(), they do like following:

  1. pass in correct radius parameter to knowhere (For IP: parameters (radius_low_bound ; for others: / radius_high_bound) to knowhere 
  2. get all unsorted range search result from knowhere
  3. for each NQ's results, do sort, then filter using another radius parameterheap-sort
  4. return re-generate result Dataset with TOPK results

...