Current state: ["Under Discussion"]
ISSUE: #17754
PRs:
Keywords: IDMAP, BinaryIDMAP, brute force search, chunk
Released: Milvus-v2.2.0
In this MEP, we put forward an IDMAP/BinaryIDMAP Enhancement proposal that let knowhere index type IDMAP/BinaryIDMAP to hold an external vector data pointer instead of adding real vector data in.
This Enhanced IDMAP/BinaryIDMAP can be used for growing segment searching to improve code reuse and reduce code maintenance effort.
Generally no one will create IDMAP/BinaryIDMAP index type for sealed segment, because it does not bring any search performance improvement but consumes identical size of memory and disk.
The only reasonable use scenario for IDMAP/BinaryIDMAP is for growing segment. But if create an IDMAP/BinaryIDMAP index in a normal way, it will consume lots of resources, because it will involve index node (to create index file), data node (to save index file to S3) and rootcoord / indexcoord / datacoord (to coordinate all these operations).
So Milvus uses following 2 strategies for growing segment searching:
The advantage of this solution is resource saving, except query node, no other nodes will be involved in; while the shortcoming is code duplication.
See following "Search Flow" chart, `FloatSearchBruteForce` and `BinarySearchBruteForce` are copied from knowhere::IDMAP/BinaryIDMAP's interface Query() and modified a little. This will introduce more code maintenance effort. And when realize new feature on IDMAP/BinaryIDMAP in Knowhere, such as range search, we have to also copy these codes implementation to Milvus.
If we enhance IDMAP/BinaryIDMAP, not to add real vector data in, but only hold an external vector data pointer in the index, we can use knowhere::IDMAP/BinaryIDMAP's interface Query() directly without any costs. User need guarantee that the memory is contiguous and safe.
In this way:
Reuse index IDMAP, add new interface in both knowhere and faiss.
Advantages: Little code change
Cons: Need add new interfaces in both Faiss and Knowhere
Add new index type IDMAP_EX, add new interface in faiss.
Advantages: No new interface in Knowhere
Cons: Need add new index type in Knowhere, most part of code of this index is duplicate with IDMAP
Add a new wrapper API in knowhere to call faiss brute force search for all metric types
Advantages: No code change in faiss, only one new interface in Knowhere
Cons: This change dis-obey knowhere's design concept. By now all operations in knowhere is for an index, but this API is for all metric types, not for an index.
Since both Proposal 1 and Proposal 2 need add new interface in Faiss, and Milvus team think this will break Faiss's encapsulation, we adopt Proposal 3.
Add new interface only in Knowhere
/** knowhere wrapper API to call faiss brute force search for all metric types * * @param metric_type * @param xb training vecors, size nb * dim * @param xq query vecors, size nq * dim * @param dim * @param nb rows of training vectors * @param nq rows of query vectors * @param topk * @param labels output, memory allocated and freed by caller * @param distances output, memory allocated and freed by coller * @param bitset */ void BruteForceSearch( const knowhere::MetricType& metric_type, const void* xb, const void* xq, const int64_t dim, const int64_t nb, const int64_t nq, const int64_t topk, int64_t* labels, float* distances, const faiss::BitsetView bitset); /** knowhere wrapper API to call faiss brute force range search for all metric types * * @param metric_type * @param xb training vecors, size nb * dim * @param xq query vecors, size nq * dim * @param dim * @param nb rows of training vectors * @param nq rows of query vectors * @param radius range search radius * @param labels output, memory allocated inside and freed by caller * @param distances output, memory allocated inside and freed by coller * @param lims output, memory allocated inside and freed by coller * @param bitset */ void BruteForceRangeSearch( const knowhere::MetricType& metric_type, const void* xb, const void* xq, const int64_t dim, const int64_t nb, const int64_t nq, const float radius, int64_t*& labels, float*& distances, size_t*& lims, const faiss::BitsetView bitset); |
BruteForceSearch() is wrapper API to call Faiss brute force search for all metric types. The output parameters "labels" and "distances" are pointers to the memory allocated by caller. Caller will also free these memory after use.
BruteForceRangeSearch() is wrapper API to call Faiss brute force range search for all metric types. The output parameters "labels", "distances" and "lims" are reference of pointers to the memory allocated in Faiss. This is because the length of range search result is unknown, caller cannot allocate memory in advance. Caller need free these memory after use.
For growing segment in segcore, all brute force search related code can be removed, call knowhere's BruteForceSearch() and BruteForceRangeSearch() instead.
New testcases are added in Knowhere to test "BruteForceSearch()" and "BruteForceRangeSearch()".
No extra testcases need be added in Milvus because current growing segment search testcases can cover this change.
Search result and performance will be identical with before.
Briefly list all references