Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Current state: ["Under DiscussionDeclined"]

ISSUE: #17754

PRs: #240 #243 #18277

Keywords: IDMAP, BinaryIDMAP, brute force search, chunk

...

See following "Search Flow" chart, `FloatSearchBruteForce` and `BinarySearchBruteForce` are copied from knowhere::IDMAP/BinaryIDMAP's interface Query() and modified a little. This will introduce more code maintenance effort. And when realize new feature on IDMAP/BinaryIDMAP in Knowhere, such as range search, we have to also copy these codes implementation to Milvus.

One good solution is to If we enhance IDMAP/BinaryIDMAP, not to add real vector data in, but only hold an external vector data pointer in the index, we can use knowhere::IDMAP/BinaryIDMAP's interface Query() directly without any costs. User need guarantee that the memory is contiguous and safe.

...

  • no CPU time, memory and disk consumption when creating index
  • resource saving, only query node is involved in
  • no code duplication for growing segment search
  • unified search result for sealed segment and growing segment

Public Interfaces(optional)

...

Proposal 1 (only take IDMAP as an example)

Reuse index IDMAP, add new interface in both knowhere and faiss.

  1. Faiss adds new field "codes_ex" and new interface "add_ex()" for structure IndexFlat.

...

  1. In

...

  1. IndexFlat, "

...

  1. codes" and "

...

  1. codes_ex" are mutual exclusive, user cannot set

...

  1. both of them.
  2. Knowhere adds a new interface `AddExWithoutIds()` for IDMAP.
  3. In Milvus, re-write API "FloatSearchBruteForce()", let it use enhanced IDMAP to search.

Advantages: Little code change

Cons: Need add new interfaces in both Faiss and Knowhere

Proposal 2 (only take IDMAP as an example)

Add new index type IDMAP_EX, add new interface in faiss.

  1. Faiss adds new field "codes_ex" and new interface "add_ex()" for structure IndexFlat. In IndexFlat, "codes" and "codes_ex" are

...

  1. mutual exclusive, user cannot set both of them.
  2. Knowhere adds a new index type IDMAP_EX, which use API "add_ex()" to insert vector data.
  3. In Milvus, re-write API "FloatSearchBruteForce()", let it use new IDMAP_EX to search.

Advantages: No new interface in Knowhere

Cons: Need add new index type in Knowhere, most part of code of this index is duplicate with IDMAP

Proposal 3 (only take IDMAP as an example)

Add a new wrapper API in knowhere to call faiss brute force search for all metric types

Advantages: No code change in faiss, only one new interface in Knowhere

Cons: This change dis-obey knowhere's design concept. By now all operations in knowhere is for an index, but this API is for all metric types, not for an index.


Since both Proposal 1 and Proposal 2 need add new interface in Faiss, and Milvus team think this will break Faiss's encapsulation, we adopt Proposal 3.

Public Interfaces(optional)

Add new interface only in Knowhere

Code Block
languagecpp
//============================================================================
struct IndexBinaryFlat : IndexBinaryclass BruteForce {
 public:
   /// databasestatic vectors,DatasetPtr
 size ntotal * dSearch(const / 8
    std::vector<uint8_t> xb;
DatasetPtr base_dataset,
    /// external database vectors, size ntotal * dconst / 8DatasetPtr query_dataset,
    uint8_t* xb_ex = nullptr;								// <==== new added
 const Config& config,
 ... ...
}

void IndexBinaryFlat::add_ex(idx_t n, const uint8_t* x) {
    xb_ex = (uint8_t*)x;
    ntotal = n;
}

//============================================================================
struct IndexFlatCodes : Index {
    ... ...
    /// encoded dataset, size ntotal * code_size
    std::vector<uint8_t> codes;

    // external encoded dataset , size ntotal * code_size
    uint8_t* codes_ex = nullptr;							// <==== new added
    ... ...
}

void IndexFlatCodes::add_ex(idx_t n, const float* x) {
    FAISS_THROW_IF_NOT(is_trained);
    FAISS_THROW_IF_NOT(codes.empty());
    codes_ex = (uint8_t*)x;
    ntotal = n;
}

Knowhere needs add a new interface `AddExWithoutIds()` for both IDMAP and BinaryIDMAP.

Code Block
languagecpp
    // set external data pointer instead really add data
    void
    AddExWithoutIds(const DatasetPtr&, const Config&);

When Knowhere detect that "codes_ex" is used in current IDMAP index or "xb_ex" is used in current BinaryIDMAP index, serialization is banned.

For Milvus, API "FloatSearchBruteForce()" and "BinarySearchBruteForce()" will be re-written, they can use the enhanced IDMAP/BinaryIDMAP to search instead of calling Faiss interfaces.

Design Details(required)

In Milvus, when growing segment need create an enhanced IDMAP index, it can do in this way:

Code Block
languagecpp
    auto idmap_index = std::make_shared<knowhere::IDMAP>();
    idmap_index->Train(train_dataset, conf);
    idmap_index->AddExWithoutIds(train_dataset, conf);				// <==== call ""AddExWithoutIds"
    auto result = idmap_index->Query(query_dataset, conf, bitset);
const faiss::BitsetView bitset);

    static DatasetPtr
    RangeSearch(const DatasetPtr base_dataset,
                const DatasetPtr query_dataset,
                const Config& config,
                const faiss::BitsetView bitset);
};

BruteForce::Search() is wrapper API to call Faiss brute force search for all metric types.

BruteForce::RangeSearch() is wrapper API to call Faiss brute force range search for all metric types. 

Design Details(required)

For growing segment in segcore, all brute force search related code can be removed, call knowhere's BruteForceSearch() and BruteForceRangeSearch() instead.This enhanced IDMAP index cannot be serialized, and will be auto destroyed without any cost. 

Compatibility, Deprecation, and Migration Plan(optional)

  • This MEP will be transparent for users, and will not introduce any compatibility issue.

Test Plan(required)

...

New testcases are added in Knowhere to test "BruteForceSearch()" and "BruteForceRangeSearch()".

  • in knowhere/unittest/test_bruteforce.cpp

No extra testcases need be added in Milvus because current growing segment search testcases can cover this change.

...