Current state: ["Under Discussion"]
ISSUE: #17754
PRs:
Keywords: IDMAP, BinaryIDMAP, brute force search, chunk
Released: Milvus-v2.2.0
In this MEP, we propose an enhancement to the Knowhere index types IDMAP and BinaryIDMAP that lets them hold a pointer to external vector data instead of copying the vector data into the index.
The enhanced IDMAP/BinaryIDMAP can then be used to search growing segments, which improves code reuse and reduces maintenance effort.
In general, no one creates an IDMAP/BinaryIDMAP index for a sealed segment, because it brings no search performance improvement while consuming the same amount of memory and disk space as the raw vector data.
The only reasonable use case for IDMAP/BinaryIDMAP is the growing segment. But creating an IDMAP/BinaryIDMAP index in the normal way consumes a lot of resources, because it involves the index node (to create the index file), the data node (to save the index file to S3), and rootcoord / indexcoord / datacoord (to coordinate all these operations).
So Milvus uses the following 2 strategies for growing segment searching:
The advantage of this solution is that it saves resources: apart from the query node, no other nodes are involved. The drawback is code duplication.
As the "Search Flow" chart shows, `FloatSearchBruteForce` and `BinarySearchBruteForce` are copied from knowhere::IDMAP/BinaryIDMAP's interface Query() with small modifications. This increases code maintenance effort. And whenever a new feature, such as range search, is implemented for IDMAP/BinaryIDMAP in Knowhere, the implementation has to be copied into Milvus as well.
If we enhance IDMAP/BinaryIDMAP so that it does not copy the vector data in but only holds a pointer to external vector data, we can call knowhere::IDMAP/BinaryIDMAP's interface Query() directly, without duplicating the search code or the vector data. The caller must guarantee that the memory is contiguous and remains valid for as long as the index uses it.
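The idea of an index that borrows the vector data rather than owning it can be illustrated with a minimal sketch. The class `ExternalFlatIndex` and its methods `AttachExternal()` and `Query()` are hypothetical names chosen for illustration; they are not the Knowhere or Faiss API, and the squared L2 metric is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IDMAP-style flat index (not the Knowhere/Faiss API).
// It stores only a pointer to caller-owned data, so attaching data copies nothing.
class ExternalFlatIndex {
 public:
    explicit ExternalFlatIndex(std::size_t dim) : dim_(dim) {}

    // Record a pointer to contiguous, row-major vectors owned by the caller.
    // The caller must keep the memory alive and unmoved while the index is used.
    void AttachExternal(const float* data, std::size_t rows) {
        assert(data != nullptr || rows == 0);
        data_ = data;
        rows_ = rows;
    }

    // Brute-force top-k search by squared L2 distance.
    // Returns (distance, row offset) pairs sorted by ascending distance.
    std::vector<std::pair<float, std::int64_t>> Query(const float* query,
                                                      std::size_t k) const {
        std::vector<std::pair<float, std::int64_t>> hits;
        hits.reserve(rows_);
        for (std::size_t i = 0; i < rows_; ++i) {
            const float* row = data_ + i * dim_;
            float dist = 0.0f;
            for (std::size_t d = 0; d < dim_; ++d) {
                const float diff = row[d] - query[d];
                dist += diff * diff;
            }
            hits.emplace_back(dist, static_cast<std::int64_t>(i));
        }
        const std::size_t top = std::min(k, hits.size());
        std::partial_sort(hits.begin(), hits.begin() + top, hits.end());
        hits.resize(top);
        return hits;
    }

    const float* data() const { return data_; }
    std::size_t size() const { return rows_; }

 private:
    std::size_t dim_;
    const float* data_ = nullptr;  // borrowed, never freed by the index
    std::size_t rows_ = 0;
};
```

In this sketch, a growing segment would attach the buffer that already holds its vectors and call `Query()` on it; memory use stays at one copy of the data, at the price of the lifetime guarantee stated above.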
In this way:
Pros: Only a small code change is needed
Cons: New interfaces must be added in both Faiss and Knowhere
Knowhere needs to add unit tests for the new interface `AddExWithoutIds()`.
No extra test cases need to be added in Milvus, because the current growing segment search test cases already cover this change.
Search results and performance will be identical to the current behavior.
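One way to check that search results stay identical is to run the same query against a copy-based index and a pointer-based index built from the same buffer. The sketch below is only an illustration of that kind of equivalence check: `NearestRow`, `CopyingIndex`, and `BorrowingIndex` are hypothetical names, not Knowhere types, and the signature of `AddExWithoutIds()` is not assumed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: offset of the row closest to `query` by squared L2
// distance, or -1 when there are no rows.
std::int64_t NearestRow(const float* data, std::size_t rows, std::size_t dim,
                        const float* query) {
    std::int64_t best = -1;
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t i = 0; i < rows; ++i) {
        float dist = 0.0f;
        for (std::size_t d = 0; d < dim; ++d) {
            const float diff = data[i * dim + d] - query[d];
            dist += diff * diff;
        }
        if (dist < best_dist) {
            best_dist = dist;
            best = static_cast<std::int64_t>(i);
        }
    }
    return best;
}

// Copy-based path: the index owns a duplicate of the vectors.
struct CopyingIndex {
    explicit CopyingIndex(std::size_t d) : dim(d) {}
    void Add(const float* data, std::size_t rows) {
        storage.insert(storage.end(), data, data + rows * dim);
    }
    std::int64_t Search(const float* query) const {
        return NearestRow(storage.data(), storage.size() / dim, dim, query);
    }
    std::size_t dim;
    std::vector<float> storage;
};

// Pointer-based path: the index only borrows the caller's buffer.
struct BorrowingIndex {
    explicit BorrowingIndex(std::size_t d) : dim(d) {}
    void AddExternal(const float* data, std::size_t n) {
        external = data;
        rows = n;
    }
    std::int64_t Search(const float* query) const {
        return NearestRow(external, rows, dim, query);
    }
    std::size_t dim;
    const float* external = nullptr;
    std::size_t rows = 0;
};
```

Because both paths run the same distance loop over the same values, any difference in the returned offsets would point to a bug in how the external pointer is wired in, not in the search itself.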