Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

There is one proposal that not to add new interface QueryByRange(), but reusing interface Query() for range search to minimize Knowhere's public interface.

Considering the implementation in Knowhere, we don't accept prefer not to adopt this proposal, because not all types of index support range search.

By now, Knowhere totally can support 13 types of index, but only 8 of them can support range search (the cell filled with BLUE):

...

If add new interface QueryByRange(),  we can add following code a virtual QueryByRange() into "VecIndex.h". If QueryByRange() is called for an index type which does not support range search, a knowhere exception will be thrown out.

Code Block
languagecpp
virtual DatasetPtr
QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset) {
    KNOWHERE_THROW_MSG("QueryByRange not supported yet");
}

If But if we reuse interface Query() for range search, then the implementation of Query() for all index types will be changed like this:

Code Block
languagecpp
virtual DatasetPtr
Query(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset) {
    if (radius exist in config) {
        // do range search
    } else {
        // do query
    }
}

This is not a good code design for knowhereKnowhere.

Design Details(required)

...

There is another MEP 34 -- IDMAP/BinaryIDMAP Enhancement describing this proposal.

查询结果的处理

Query 返回的查询结果有2种,一种为 SubSearchResult,用于存放 segment 中每个 chunk 的查询结果。

需要添加成员变量 "is_range_search_" 和 "lims_",以及成员函数 "merge_range()" 和 "merge_range_impl()"。

Result Handling

There are two kinds of query results returned by Query(). One is SubSearchResult, which is used to store the query results of each chunk in the segment. The other is SearchResult, which is used to store the final result of one segment search.

The main difference of result handling between Query() and QueryByRange() is "merge". For Query(), after merge two "nq * topk" results, we still get a "nq * topk" result. For QueryByRange(), when merge A with "n0" results with B with "n1" results, we get "n0 + n1" results.

So we need write new merge method for both SubSearchResult and SearchResult.

Code Block
languagecpp
//=================================================================================================
class SubSearchResult {
    ... ...
Code Block
languagecpp
class SubSearchResult {
    ... ...
    static SubSearchResult
    merge(const SubSearchResult& left, const SubSearchResult& right);

    void
    merge(const SubSearchResult& sub_result);

    static SubSearchResult
    merge_range(const SubSearchResult& left, const SubSearchResult& right);		// <==== new added

    void
    merge_range(const SubSearchResult& sub_result);								// <==== new added

 private:
    template <bool is_desc>
    void
    merge_impl(const SubSearchResult& sub_result);

    void
    merge_range_impl(const SubSearchResult& sub_result);						// <==== new added

 private:
    bool is_range_search_;				// <==== new added
    int64_t num_queries_;
    int64_t topk_;
    int64_t round_decimal_;
    MetricType metric_type_;... ...
    std::vector<int64_t> seg_offsets_;
    std::vector<float> distances_;
    std::vector<size_t> lims_;			// <==== new added

另一种为 SearchResult,用于存放该 segment 的查询结果。

需要添加成员变量 "is_range_search_" 和 "lims_"。

Code Block
languagecpp


//=================================================================================================
struct SearchResult {
    ... ...
 public:
    int64_t total_nq_;
    int64_t unity_topK_;
    void* segment_;

    // first fill data during search, and then update data after reducing search results
	bool is_range_search_;				// <==== new added... ...
    std::vector<float> distances_;
    std::vector<int64_t> seg_offsets_;
    std::vector<size_t> lims_;			// <==== new added
    ... ...
};

...