...
There is one proposal that not to add new interface QueryByRange(), but reusing interface Query() for range search to minimize Knowhere's public interface.
Considering the implementation in Knowhere, we don't accept prefer not to adopt this proposal, because not all types of index support range search.
By now, Knowhere totally can support 13 types of index, but only 8 of them can support range search (the cell filled with BLUE):
...
If add new interface QueryByRange(), we can add following code a virtual QueryByRange() into "VecIndex.h". If QueryByRange() is called for an index type which does not support range search, a knowhere exception will be thrown out.
Code Block | ||
---|---|---|
| ||
virtual DatasetPtr QueryByRange(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset) { KNOWHERE_THROW_MSG("QueryByRange not supported yet"); } |
If But if we reuse interface Query() for range search, then the implementation of Query() for all index types will be changed like this:
Code Block | ||
---|---|---|
| ||
virtual DatasetPtr Query(const DatasetPtr& dataset, const Config& config, const faiss::BitsetView bitset) { if (radius exist in config) { // do range search } else { // do query } } |
This is not a good code design for knowhereKnowhere.
Design Details(required)
...
There is another MEP 34 -- IDMAP/BinaryIDMAP Enhancement describing this proposal.
查询结果的处理
Query 返回的查询结果有2种,一种为 SubSearchResult,用于存放 segment 中每个 chunk 的查询结果。
需要添加成员变量 "is_range_search_" 和 "lims_",以及成员函数 "merge_range()" 和 "merge_range_impl()"。
Result Handling
There are two kinds of query results returned by Query(). One is SubSearchResult, which is used to store the query results of each chunk in the segment. The other is SearchResult, which is used to store the final result of one segment search.
The main difference of result handling between Query() and QueryByRange() is "merge". For Query(), after merge two "nq * topk" results, we still get a "nq * topk" result. For QueryByRange(), when merge A with "n0" results with B with "n1" results, we get "n0 + n1" results.
So we need write new merge method for both SubSearchResult and SearchResult.
Code Block | ||
---|---|---|
| ||
//=================================================================================================
class SubSearchResult {
... ... | ||
Code Block | ||
| ||
class SubSearchResult { ... ... static SubSearchResult merge(const SubSearchResult& left, const SubSearchResult& right); void merge(const SubSearchResult& sub_result); static SubSearchResult merge_range(const SubSearchResult& left, const SubSearchResult& right); // <==== new added void merge_range(const SubSearchResult& sub_result); // <==== new added private: template <bool is_desc> void merge_impl(const SubSearchResult& sub_result); void merge_range_impl(const SubSearchResult& sub_result); // <==== new added private: bool is_range_search_; // <==== new added int64_t num_queries_; int64_t topk_; int64_t round_decimal_; MetricType metric_type_;... ... std::vector<int64_t> seg_offsets_; std::vector<float> distances_; std::vector<size_t> lims_; // <==== new added |
另一种为 SearchResult,用于存放该 segment 的查询结果。
需要添加成员变量 "is_range_search_" 和 "lims_"。
Code Block | ||
---|---|---|
| ||
//================================================================================================= struct SearchResult { ... ... public: int64_t total_nq_; int64_t unity_topK_; void* segment_; // first fill data during search, and then update data after reducing search results bool is_range_search_; // <==== new added... ... std::vector<float> distances_; std::vector<int64_t> seg_offsets_; std::vector<size_t> lims_; // <==== new added ... ... }; |
...