...

PRs: TBD
Keywords: embedded, python, deep learning

Released: Milvus 2.1

Motivation

According to the official Milvus documentation, there are many ways to install and start Milvus on your machine, including:

...

```python
>>> import rocksdb
>>> db = rocksdb.DB("test.db", rocksdb.Options(create_if_missing=True))
>>> db.put(b'a', b'data')
>>> print(db.get(b'a'))
b'data'
```

Summary

This MEP introduces embedded Milvus.
With embedded Milvus, all you need is a clean environment with Python installed. You can then simply run:

...

The code snippet above is self-explanatory.
You don't need a pre-installed Milvus server, nor do you need to keep a separate Milvus process running.
Embedded Milvus starts and exits whenever you want, and all data and logs persist across runs.
We believe embedded Milvus makes Milvus a real "DB for AI", as it makes Milvus extremely easy to use for data scientists, AI engineers, and others.

Detailed Design

Milvus as a Library

Up to version 2.0, Milvus was a typical Go project published as a Go binary. Milvus also has a typical Go code layout, as shown below:

...

```
milvus
├── cmd          // Entry point of the Milvus binary (the main() function).
├── internal     // Milvus code.
├── pkg
│   ├── embedded // Where the embedded Milvus code lives.
│   ├── ...
├── configs
├── deployments
├── docs
├── scripts
├── ...
```

Running Go From Python

The goal is to call Go code from Python. One option is cgo: we could export Go functions (using //export directives) into a C shared library, libmilvus.so, and have Python load this library:

...
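On the Python side, a library built this way can be loaded with the standard ctypes module. The sketch below assumes a hypothetical exported Go function that takes no arguments and returns a C int; the names libmilvus.so and StartMilvus, and the Go signature shown in the comments, are illustrative assumptions rather than Milvus's actual exports. Passing None as the library path loads the symbols already linked into the current process on POSIX systems, which lets the helper be tried out with libc's getpid.

```python
import ctypes
from typing import Optional

# Hypothetical Go side (package main, with an empty main()), built with:
#   go build -buildmode=c-shared -o libmilvus.so
#
#   import "C"
#
#   //export StartMilvus
#   func StartMilvus() C.int { /* start Milvus, return a status code */ }


def call_exported(lib_path: Optional[str], symbol: str) -> int:
    """Load a C shared library and call a no-argument function returning int.

    lib_path: path to the shared library, e.g. "./libmilvus.so" (hypothetical).
              None loads the symbols of the current process (POSIX only).
    symbol:   name of the exported function, e.g. "StartMilvus" (hypothetical).
    """
    lib = ctypes.CDLL(lib_path)
    func = getattr(lib, symbol)   # raises AttributeError if the symbol is missing
    func.argtypes = []            # the function takes no arguments
    func.restype = ctypes.c_int   # and returns a C int
    return func()


# Stand-in usage: libc's getpid() also takes no arguments and returns an int.
print(call_exported(None, "getpid"))
```

ctypes cannot check declared types against the Go signature, so argtypes and restype must match the exported function exactly in a real integration.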


The Milvus service usually takes tens of seconds to fully start (which we will optimize later). It is therefore a good idea to run Milvus in a background thread that stays ready to answer the user's calls.
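A minimal sketch of this pattern with Python's standard threading module follows. BackgroundServer and fake_milvus are hypothetical names; fake_milvus only simulates a slow startup and stands in for whatever call actually runs Milvus, which this sketch does not show. A threading.Event lets the caller wait, with a timeout, until the service reports that it is ready.

```python
import threading
import time
from typing import Callable


class BackgroundServer:
    """Run a blocking `serve(ready)` function in a daemon thread."""

    def __init__(self, serve: Callable[[threading.Event], None]) -> None:
        self._ready = threading.Event()
        # daemon=True: this thread never keeps the interpreter alive at exit.
        self._thread = threading.Thread(
            target=serve, args=(self._ready,), daemon=True
        )

    def start(self, timeout: float) -> bool:
        """Start serving; return True if ready within `timeout` seconds."""
        self._thread.start()
        return self._ready.wait(timeout)

    @property
    def ready(self) -> bool:
        return self._ready.is_set()


def fake_milvus(ready: threading.Event) -> None:
    """Hypothetical placeholder for the call that actually runs Milvus."""
    time.sleep(0.1)      # stands in for a startup that may take many seconds
    ready.set()          # signal that requests can now be answered
    while True:          # a real server would block here serving requests
        time.sleep(1)


server = BackgroundServer(fake_milvus)
if server.start(timeout=60.0):
    print("Milvus is ready")
```

Because the thread is a daemon, it does not prevent the interpreter from exiting when the user's script ends.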

External Dependencies

Milvus standalone used to depend on two external services, MinIO and etcd. With the Milvus 2.1 release, both dependencies can be removed through two options: (1) replacing MinIO with local disk storage, and (2) replacing the etcd server with embedded etcd.
To disable MinIO and use local disk storage: [TBD]
To enable embedded etcd: turn on the etcd.use.embed option in the milvus.yaml file.
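Assuming milvus.yaml has been parsed into nested Python dictionaries (for example with PyYAML's yaml.safe_load, and written back with yaml.safe_dump), a dotted option name such as etcd.use.embed maps onto one nesting level per component. The helper below is a hypothetical sketch of that mapping, not part of Milvus, and the sample config contains only the keys needed for the illustration.

```python
from typing import Any, Dict


def set_option(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set a nested option such as "etcd.use.embed" in a parsed config dict.

    Each dot-separated component is one level of YAML nesting, so
    "etcd.use.embed" corresponds to:
        etcd:
          use:
            embed: true
    Missing intermediate levels are created as empty dicts.
    """
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


config = {"etcd": {"use": {"embed": False}}}
set_option(config, "etcd.use.embed", True)
print(config)
```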

Logging

Logging should be suppressed while embedded Milvus runs; otherwise your program's console can easily be flooded with logs. We propose that no logs, regardless of level, be printed to the console.
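One way to keep native log output off the console from the Python side is sketched below, assuming the embedded library writes its logs to the process's standard output and error streams. Code loaded through cgo writes to file descriptors 1 and 2 directly, bypassing Python's sys.stdout, so contextlib.redirect_stdout would not catch it; redirecting the descriptors themselves does. console_to_file is a hypothetical helper, and it appends the output to a file rather than discarding it so that logs are still kept.

```python
import contextlib
import os
import sys
from typing import Iterator


@contextlib.contextmanager
def console_to_file(log_path: str) -> Iterator[None]:
    """Send everything written to file descriptors 1 and 2 into log_path."""
    with open(log_path, "ab") as log_file:
        # Flush Python-level buffers so earlier prints still reach the console.
        sys.stdout.flush()
        sys.stderr.flush()
        saved_out, saved_err = os.dup(1), os.dup(2)   # keep the real console
        try:
            os.dup2(log_file.fileno(), 1)   # stdout now writes to the log file
            os.dup2(log_file.fileno(), 2)   # stderr now writes to the log file
            yield
        finally:
            # Flush anything Python buffered while redirected into the log file.
            sys.stdout.flush()
            sys.stderr.flush()
            os.dup2(saved_out, 1)           # restore the console
            os.dup2(saved_err, 2)
            os.close(saved_out)
            os.close(saved_err)
```

Anything written to the console inside a `with console_to_file("embedded_milvus.log"):` block, including output from native code, ends up in that file; the console is restored when the block exits, even on an exception.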

Working with Pymilvus

Pymilvus is an essential part of the Milvus ecosystem and remains the most popular Milvus SDK. Embedded Milvus works with pymilvus in the following ways:

...

  1. The embedded Milvus package could be as large as ~100 MB, while the pymilvus package is ~30 MB. Bundling embedded Milvus into pymilvus would make the package too large.
  2. We could make embedded Milvus and pymilvus completely independent. However, this would make version management (across Milvus, embedded Milvus, and pymilvus) very complicated. Also, requiring two pip install operations is not a good user experience.

Comparing Embedded Milvus and Pymilvus

Recall that with the pymilvus SDK, you need to:

...

  1. Install Pymilvus with "pip install pymilvus".
  2. Import embedded Milvus with:

    ```python
    from ... import milvus  # Milvus starts on import.
    ```


  3. Start playing with Milvus. Again, enjoy your time.
  4. Call exit() if you are in Python interactive mode; otherwise, your Python script should finish gracefully on its own.

Best Practice: Embedded Milvus versus Pymilvus

Embedded Milvus and pymilvus are built for different scenarios. You may consider using embedded Milvus when:

...

  • You intend to use embedded Milvus in a production environment (please don't).
  • You have strict performance requirements (embedded Milvus does not offer the best performance).

Test Plans

  • Embedded Milvus should have its own unit tests.
  • We suggest running some, if not all, of the Milvus standalone tests against embedded Milvus.

Future Work

  • Milvus was not originally designed as a library. Embedded Milvus runs in a background daemon thread, and clients still communicate with it through gRPC calls. Ideally, these gRPC calls would be replaced with direct function calls.