Current state: ["Under Discussion"]

ISSUE: #17754

PRs: 

Keywords: IDMAP, BinaryIDMAP, brute force search, chunk

Released: Milvus-v2.2.0

Summary(required)

In this MEP, we propose an enhancement that lets the Knowhere index types IDMAP and BinaryIDMAP hold a pointer to external vector data instead of copying the vector data into the index.

The enhanced IDMAP/BinaryIDMAP can then be used for growing segment search, which improves code reuse and reduces code maintenance effort.

Motivation(required)

Generally, no one creates an IDMAP/BinaryIDMAP index for a sealed segment: it brings no search performance improvement, yet it consumes the same amount of memory and disk space as the raw vector data.

The only reasonable use case for IDMAP/BinaryIDMAP is the growing segment. But creating an IDMAP/BinaryIDMAP index in the normal way consumes a lot of resources, because it involves the index node (to create the index file), the data node (to save the index file to S3), and rootcoord / indexcoord / datacoord (to coordinate all these operations).

So Milvus uses the following 2 strategies for growing segment search:

  • small-batch index for fully filled chunks (this functionality is disabled for some particular reason)
  • brute-force search for partially filled chunks and for fully filled chunks without an index (copied from knowhere IDMAP)

The advantage of this solution is resource saving: no node other than the query node is involved. The shortcoming is code duplication.

As the "Search Flow" chart shows, `FloatSearchBruteForce` and `BinarySearchBruteForce` are copied from the Query() interface of knowhere::IDMAP/BinaryIDMAP with small modifications. This increases code maintenance effort: whenever a new feature such as range search is implemented for IDMAP/BinaryIDMAP in Knowhere, the implementation also has to be copied to Milvus.

If we enhance IDMAP/BinaryIDMAP so that the index does not copy the vector data in but only holds a pointer to external vector data, we can call the Query() interface of knowhere::IDMAP/BinaryIDMAP directly, at no extra cost. The user must guarantee that the memory is contiguous and safe to access for as long as the index uses it.

In this way:

  • no CPU time, memory, or disk consumption when creating the index
  • resource saving: only the query node is involved
  • no code duplication for growing segment search
  • unified search results for sealed segments and growing segments

Proposal 1 (only take IDMAP as an example)

  1. Faiss adds a new field "codes_ex" and a new interface "add_ex" to the structure IndexFlat. In IndexFlat, "codes" and "codes_ex" are mutually exclusive; the user cannot set both of them.
  2. Knowhere adds a new interface `AddExWithoutIds()` for IDMAP.
  3. In Milvus, rewrite the APIs "FloatSearchBruteForce()" and "BinarySearchBruteForce()" so that they use the enhanced IDMAP to search instead of calling Faiss interfaces.
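
The mutual exclusion described in step 1 could look roughly like the following standalone C++ sketch. It is not the actual Faiss `IndexFlat`: the struct `FlatIndexSketch`, its `nearest()` helper, the `float` storage type, and the choice to throw `std::logic_error` are all assumptions made for illustration. Only the names `codes`, `codes_ex`, and `add_ex` are taken from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical flat index that either owns a copy of the vectors (codes)
// or borrows an external buffer (codes_ex), but never both.
struct FlatIndexSketch {
    std::size_t dim;
    std::size_t ntotal = 0;
    std::vector<float> codes;         // owned copy, filled by add()
    const float* codes_ex = nullptr;  // borrowed pointer, set by add_ex()

    explicit FlatIndexSketch(std::size_t d) : dim(d) {}

    // Classic path: copy n vectors into the index.
    void add(std::size_t n, const float* x) {
        if (codes_ex != nullptr) {
            throw std::logic_error("codes_ex is set; cannot also fill codes");
        }
        codes.insert(codes.end(), x, x + n * dim);
        ntotal += n;
    }

    // Enhanced path: remember the caller's buffer without copying it.
    // The caller must keep x contiguous and alive while the index is used.
    void add_ex(std::size_t n, const float* x) {
        if (!codes.empty()) {
            throw std::logic_error("codes is set; cannot also set codes_ex");
        }
        codes_ex = x;
        ntotal = n;
    }

    // Both storage modes feed the same search code through this accessor.
    const float* data() const {
        return codes_ex != nullptr ? codes_ex : codes.data();
    }

    // Row offset of the nearest vector by squared L2 distance, or -1 if empty.
    std::int64_t nearest(const float* query) const {
        std::int64_t best = -1;
        float best_dist = std::numeric_limits<float>::max();
        const float* base = data();
        for (std::size_t i = 0; i < ntotal; ++i) {
            float dist = 0.0f;
            for (std::size_t j = 0; j < dim; ++j) {
                const float diff = base[i * dim + j] - query[j];
                dist += diff * diff;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = static_cast<std::int64_t>(i);
            }
        }
        return best;
    }
};
```

Because the search loop reads through `data()` in both modes, a copied index and a borrowing index built from the same buffer return the same result, which is the property that would let growing and sealed segments share one search path.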

Advantages: only a small code change is needed

Cons: new interfaces need to be added in both Faiss and Knowhere

Proposal 2 (only take IDMAP as an example)


Public Interfaces(optional)



Design Details(required)


Compatibility, Deprecation, and Migration Plan(optional)

  • This MEP will be transparent to users and will not introduce any compatibility issues.

Test Plan(required)

Knowhere needs to add some unit tests for the new interface `AddExWithoutIds()`.

No extra test cases need to be added in Milvus, because the current growing segment search test cases already cover this change.

Search results and performance will be identical to before.

References(optional)

