Current state: ["Under Discussion"]
ISSUE: https://github.com/milvus-io/milvus/issues/15711
PRs: TBD
Keywords: embedded, python, deep learning
Released: Milvus 2.1
According to the official Milvus documentation, there are many ways to install and start Milvus on your machine.
We have noticed that the Milvus community includes a large number of data scientists and AI engineers. Most of them work with Python on a MacBook, and many of them find all of these installation methods still too complicated. They also do not want to maintain a Milvus process (in any form) on their operating systems.
The Milvus community introduced the pymilvus Python SDK in the past, but it still requires that a Milvus instance (in whichever form) is already running on your machine.
For a very long time the community has been asking for a Milvus variant that is extremely easy to use, just like pysqlite or pyrocksdb. Take pyrocksdb as an example:
    >>> import rocksdb
    >>> db = rocksdb.DB("test.db", rocksdb.Options(create_if_missing=True))
    >>> db.put(b'a', b'data')
    >>> print(db.get(b'a'))
    b'data'
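For comparison, Python's built-in sqlite3 module offers the same in-process experience. The sketch below only illustrates the embedded pattern the community is asking for. The file location and the kv table are made up for the example.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "test.db")

# First session: the database file is created on connect, no server needed.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")
conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("a", b"data"))
conn.commit()
conn.close()

# Second session: reopening the same file shows that the data persisted.
conn = sqlite3.connect(db_path)
value = conn.execute("SELECT v FROM kv WHERE k = ?", ("a",)).fetchone()[0]
conn.close()
print(value)
```

There is no server to install, start, or stop, and the data outlives the process.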
We introduce embedded Milvus in this MEP.
With embedded Milvus, you just need a clean environment with Python installed. You can then just do:
    $ pip install pymilvus  # Install pymilvus.
    $ python3
    >>> from ... import milvus  # Milvus is brought up here.
    >>> hello_milvus = Collection(…)
    >>> insert_result = hello_milvus.insert(...)
    >>> hello_milvus.create_index(…)
    >>> hello_milvus.load(...)
    >>> search_result = hello_milvus.search(…)
    hit: (distance: 0.0, id: 2998), random field: -11.0
    hit: (distance: 0.11455299705266953, id: 1581), random field: -18.0
    hit: (distance: 0.1232629269361496, id: 2647), random field: -13.0
    hit: (distance: 0.0, id: 2999), random field: -11.0
    hit: (distance: 0.10560893267393112, id: 2430), random field: -18.0
    hit: (distance: 0.13938161730766296, id: 377), random field: -14.0
    search latency = 0.2796s
    >>> exit()  # Milvus is shut down, but all data lives on.
The code snippet above is largely self-explanatory.
You don't need any Milvus server pre-installed. And of course you don't need to keep any Milvus process running in the meantime.
The embedded Milvus starts and exits whenever you wish it to, but all data and logs persist.
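The lifecycle described here (start on import, shut down when the interpreter exits) can be sketched with the standard atexit module. This is a minimal sketch, not the actual implementation: EmbeddedServer and its methods are hypothetical stand-ins for whatever the embedded package ends up exposing.

```python
import atexit


class EmbeddedServer:
    """Hypothetical stand-in for the embedded Milvus runtime."""

    def __init__(self):
        self.running = False

    def start(self):
        # A real implementation would bring up Milvus here.
        self.running = True

    def stop(self):
        # A real implementation would shut Milvus down here.
        # Data and logs stay on disk, so nothing is deleted.
        self.running = False


# Module-level code runs once, on the first import of the module.
server = EmbeddedServer()
server.start()

# Registered functions run when the interpreter exits normally,
# including after exit() in interactive mode.
atexit.register(server.stop)
```

Placing the start call at module level is what lets a plain import statement bring the service up without any further user action.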
We believe this embedded version makes Milvus a real "DB for AI", as it makes Milvus extremely easy to use for data scientists, AI engineers, and similar users.
Up to version 2.0, Milvus has been a typical Go project published as a Go binary. Milvus also has a typical Go code structure, as shown below:
    milvus
    ├── cmd           // Entrance of Milvus binary. (the main() function)
    ├── internal      // Milvus code.
    ├── configs
    ├── deployments
    ├── docs
    ├── scripts
    ├── ...
The first thing an embedded Milvus needs is to publish Milvus as a library.
Following the Go code structure convention, we just need to add another layer to Milvus, named pkg, where we can export Milvus as a library:
    milvus
    ├── cmd           // Entrance of Milvus binary. (the main() function)
    ├── internal      // Milvus code.
    ├── pkg
    │   ├── embedded  // Where embedded Milvus code lies.
    │   ├── ...
    ├── configs
    ├── deployments
    ├── docs
    ├── scripts
    ├── ...
The goal is to use Go code from Python. One option is cgo: we could export Go functions (with //export) as a C shared library, libmilvus.so, and have Python import this library:
    milvus: build-cpp print-build-info
        @echo "Building Milvus ..."
        @mkdir -p $(INSTALL_PATH) && go env -w CGO_ENABLED="1" && GO111MODULE=on $(GO) build \
            -ldflags="-X 'main.BuildTags=$(BUILD_TAGS)' -X 'main.BuildTime=$(BUILD_TIME)' -X 'main.GitCommit=$(GIT_COMMIT)' -X 'main.GoVersion=$(GO_VERSION)'" \
            -buildmode=c-shared -o $(INSTALL_PATH)/libmilvus.so $(PWD)/pkg/embedded/embeddedmilvus.go 1>/dev/null
And in Python we could:
    import ctypes

    libmilvus = ctypes.cdll.LoadLibrary('./bin/libmilvus.so')
    embedded_milvus = libmilvus.embedded_milvus  # Look up the exported function.
    embedded_milvus()  # Milvus main() equivalent.
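A slightly more defensive loader might look like the sketch below. It assumes the exported entry point is named embedded_milvus, takes no arguments, and returns nothing. None of this is confirmed by the proposal, and the helper name load_embedded_library is hypothetical.

```python
import ctypes
import os


def load_embedded_library(path="./bin/libmilvus.so"):
    """Load the shared library, failing early with a clear error if it is absent."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"shared library not found: {path}")
    lib = ctypes.CDLL(path)
    # Assumption: the exported entry point takes no arguments and returns nothing.
    lib.embedded_milvus.argtypes = []
    lib.embedded_milvus.restype = None
    return lib
```

Declaring argtypes and restype up front keeps ctypes from silently applying its default int return type to the call.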
The Milvus service usually takes tens of seconds to fully start (which we will optimize later). It is therefore a good idea to keep Milvus running in a background thread that always stands ready to answer the user's calls.
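One way to sketch the background-thread idea is with threading.Event as a readiness signal. Here run_server is a hypothetical stand-in for the blocking Milvus entry point, and the short sleep merely simulates startup time.

```python
import threading
import time

ready = threading.Event()


def run_server():
    """Hypothetical stand-in for the blocking Milvus entry point."""
    time.sleep(0.1)  # Simulates slow startup work.
    ready.set()      # Signal that calls can now be served.
    # A real server would keep serving here; the stand-in simply returns.


# daemon=True lets the interpreter exit without joining the thread.
worker = threading.Thread(target=run_server, name="embedded-milvus", daemon=True)
worker.start()

# Callers block until the server is ready, with a timeout as a safety net.
started = ready.wait(timeout=5.0)
print("ready" if started else "startup timed out")
```

Waiting on the event rather than sleeping for a fixed time means callers proceed as soon as startup completes, however long it takes.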
The Milvus standalone version used to depend on two external services, namely MinIO and Etcd. With the Milvus 2.1 release we will remove both dependencies by providing options to (1) replace MinIO with local disk storage and (2) replace the Etcd server with embedded Etcd.
To disable MinIO and use local disk storage: [TBD]
To enable embedded Etcd: toggle the etcd.use.embed option ON in the milvus.yaml file.
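The dotted name etcd.use.embed reads as a path into the nested configuration. The sketch below shows that mapping on a plain dictionary. It assumes milvus.yaml nests the option as etcd, then use, then embed. The set_dotted helper is hypothetical, and reading or writing the actual YAML file would require a YAML library that is not shown here.

```python
def set_dotted(config, dotted_key, value):
    """Set a nested value in a dict using a dotted path such as 'etcd.use.embed'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate levels on demand.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Assumption: milvus.yaml nests the option as etcd -> use -> embed.
config = {"etcd": {"use": {"embed": False}}}
set_dotted(config, "etcd.use.embed", True)
print(config)
```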
Logging should be suppressed during the embedded Milvus run; otherwise your program can easily get flooded with logs. We propose that no logs, regardless of level, be printed to the console.
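Milvus itself logs from Go code, so the sketch below only illustrates the same policy on the Python side: every record, whatever its level, goes to a file and none reaches the console. The logger name and log path are hypothetical.

```python
import logging
import os
import tempfile

# Hypothetical log location; the proposal does not fix a path.
log_path = os.path.join(tempfile.mkdtemp(), "embedded_milvus.log")

logger = logging.getLogger("embedded_milvus")  # Hypothetical logger name.
logger.setLevel(logging.DEBUG)
logger.propagate = False  # Keep records away from root handlers (the console).

file_handler = logging.FileHandler(log_path)
file_handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(file_handler)

logger.info("server started")  # Written to the file only.
logger.error("example error")  # Errors also stay off the console.
file_handler.flush()
```

Because the logs still persist on disk, users can inspect them after the fact without having them interleaved with their own program output.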
Pymilvus is an essential part of Milvus and still the most popular SDK. Embedded Milvus works with Pymilvus in the following ways:
Embedded Milvus is built as a separate package, milvus, and will be published to the Python Package Index (PyPI).
We believe that constructing embedded Milvus as a separate PyPI package has these advantages:
pip install operations is not a good user experience.
Recall that with the pymilvus SDK, you will need to start a Milvus instance (for example with docker-compose up), connect to it with connections.connect("default", host="localhost", port="19530"), and eventually stop the instance with docker-compose down.
However, with embedded Milvus, all you need to do is:
Import embedded Milvus with:
    from ... import milvus  # Milvus will start.
Call exit() if you are in Python interactive mode; otherwise your Python script should already finish gracefully.
Embedded Milvus and pymilvus are really built for different scenarios. You may consider using embedded Milvus when:
It is suggested that you should NOT use embedded Milvus if:
Embedded Milvus is not designed for deployment in production environments. However, we guarantee that everything you do in embedded Milvus can "run everywhere".
Anything you have done in embedded Milvus, including all the scripts and code you have written, can be migrated unchanged and run on other Milvus instances in any form, including Milvus standalone, Milvus cluster, and cloud-native Milvus.