Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Keywords: arrow

Released: Milvus 2.0

Summary

...

Milvus 2.0 is a cloud-native and multi-language vector database, we use gRPC and pulsar to communicate among SDK and components.

In consideration of the data size, especially when inserting or returning search results, Milvus takes considerable efforts to do serialization and deserialization.

In the field of big data, Apache Arrow has been a factor standard to solve this kind of problems.


Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast.

It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.

The project is developing a multi-language collection of libraries for solving systems problems related to in-memory analytical data processing. This includes such topics as:

  • Zero-copy shared memory and RPC-based data movement

  • Reading and writing file formats (like CSV, Apache ORC, and Apache Parquet)

  • In-memory analytics and query processing

Motivation(required)

From data inspect, Milvus includes 2 data flows mainly:

...