Current state: Under Discussion

ISSUE: #6299

PRs: #6570

Keywords: Query / Search / Vector

Released: Milvus 2.0rc3


Summary

Let `search` and `query` operations return raw vector data in output fields while keeping memory consumption minimal.

Motivation

In Milvus 2.0rc1, operations such as search and query cannot return raw vector data in output fields. This restriction comes from memory consumption: a high-dimensional vector field can occupy hundreds of times more memory than a scalar field (for example, a 768-dimensional float32 vector takes 3,072 bytes, 384 times the 8 bytes of an int64 value). So in general, load_collection and load_partition load only the raw data of scalar fields into memory. The raw data of a vector field is loaded into memory only in 3 cases:

  1. streaming segment
  2. vector field's index type is FLAT
  3. vector field's index has not been created
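The three cases above amount to a simple predicate. The following is a minimal sketch of it, not code from Milvus: `segmentKind`, `vectorFieldState`, and `rawDataInMemory` are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// segmentKind and vectorFieldState are hypothetical types for illustration;
// they are not part of the Milvus code base.
type segmentKind int

const (
	streamingSegment segmentKind = iota
	sealedSegment
)

// vectorFieldState describes one vector field in one segment.
type vectorFieldState struct {
	kind      segmentKind
	hasIndex  bool   // false if no index has been created yet
	indexType string // for example "FLAT"; ignored when hasIndex is false
}

// rawDataInMemory reports whether any of the three cases applies.
func rawDataInMemory(s vectorFieldState) bool {
	switch {
	case s.kind == streamingSegment: // case 1: streaming segment
		return true
	case s.hasIndex && s.indexType == "FLAT": // case 2: FLAT index
		return true
	case !s.hasIndex: // case 3: no index created
		return true
	default:
		return false
	}
}

func main() {
	indexed := vectorFieldState{kind: sealedSegment, hasIndex: true, indexType: "IVF_FLAT"}
	flat := vectorFieldState{kind: sealedSegment, hasIndex: true, indexType: "FLAT"}
	fmt.Println(rawDataInMemory(indexed), rawDataInMemory(flat))
}
```

In every other situation the raw vectors are not resident in memory, which is the gap this proposal addresses.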

Design Details

```go
type VectorFieldInfo struct {
    mu              sync.RWMutex
    fieldID         UniqueID
    fieldBinlog     *datapb.FieldBinlog
    rowDataInMemory bool
    rawData         map[string]storage.FieldData  // map[binlogPath]FieldData
}

type Segment struct {
    ... ...
    vectorFieldInfos map[UniqueID]*VectorFieldInfo
}

// Load the vector field's data from info.fieldBinlog and save the raw data into info.rawData.
func (loader *segmentLoader) loadSegmentVectorFieldData(info *VectorFieldInfo) error

// For vector output fields, load raw data from fieldBinlog if needed,
// get the raw vector data via result.Offset from *VectorFieldInfo, then
// fill the raw vector data into result.
func (q *queryCollection) fillVectorFieldsData(segment *Segment, result *segcorepb.RetrieveResults) error
```
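The flow described in the comments above (load on demand, then look up by offset) might be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: `vectorFieldInfo`, `loadFunc`, `ensureLoaded`, and `vectorAt` are hypothetical, each binlog is modeled as a flat `[]float32`, and offsets are assumed to count rows across the binlogs in order.

```go
package main

import (
	"fmt"
	"sync"
)

// vectorFieldInfo is a simplified, hypothetical stand-in for VectorFieldInfo.
type vectorFieldInfo struct {
	mu          sync.RWMutex
	dim         int                  // values per vector, assumed > 0
	binlogPaths []string             // stands in for fieldBinlog
	rawData     map[string][]float32 // map[binlogPath]rows, dim values per row
}

// loadFunc is a hypothetical loader that reads one binlog into memory.
type loadFunc func(path string) ([]float32, error)

// ensureLoaded loads every binlog that is not yet cached in info.rawData.
func (info *vectorFieldInfo) ensureLoaded(load loadFunc) error {
	info.mu.Lock()
	defer info.mu.Unlock()
	if info.rawData == nil {
		info.rawData = make(map[string][]float32)
	}
	for _, p := range info.binlogPaths {
		if _, ok := info.rawData[p]; ok {
			continue // already in memory
		}
		rows, err := load(p)
		if err != nil {
			return fmt.Errorf("load binlog %q: %w", p, err)
		}
		info.rawData[p] = rows
	}
	return nil
}

// vectorAt returns the vector at a row offset, walking the binlogs in order
// and skipping the rows held by each one (an assumed offset convention).
func (info *vectorFieldInfo) vectorAt(offset int64) ([]float32, error) {
	info.mu.RLock()
	defer info.mu.RUnlock()
	if offset < 0 {
		return nil, fmt.Errorf("negative offset %d", offset)
	}
	remaining := offset
	for _, p := range info.binlogPaths {
		rows := int64(len(info.rawData[p]) / info.dim)
		if remaining < rows {
			start := remaining * int64(info.dim)
			return info.rawData[p][start : start+int64(info.dim)], nil
		}
		remaining -= rows
	}
	return nil, fmt.Errorf("offset %d out of range", offset)
}

func main() {
	fake := map[string][]float32{"binlog/0": {1, 2, 3, 4}, "binlog/1": {5, 6, 7, 8}}
	load := func(path string) ([]float32, error) { return fake[path], nil }

	info := &vectorFieldInfo{dim: 2, binlogPaths: []string{"binlog/0", "binlog/1"}}
	if err := info.ensureLoaded(load); err != nil {
		panic(err)
	}
	for _, off := range []int64{0, 3} {
		vec, err := info.vectorAt(off)
		if err != nil {
			panic(err)
		}
		fmt.Println(off, vec)
	}
}
```

Caching by binlog path means a second query against the same segment pays no further load cost, at the price of keeping those binlogs resident.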



Original vector data storage public interface and struct

Public Interfaces

```go
type FileManager interface {
    GetFile(path string) (string, error)
    PutFile(path string, content []byte) error
    Exist(path string) bool
    ReadFile(path string) []byte
}
```

A VectorFileManager implements FileManager interface.

```go
type VectorFileManager struct {
    localFileManager  FileManager  // manages files on local storage
    remoteFileManager FileManager  // manages files on cloud or remote server storage
    insertCodec       *InsertCodec
}
```

localFileManager manages files on local storage and can be implemented with Go's os package.
remoteFileManager manages files on cloud storage or a remote storage server; for now it is implemented with the MinIO client.

Once the offset of a vector is known, the original vector data can be read from the local vector data file.



Test Plan

Run query and search (with a vector field in the output fields) under every combination of the following scenarios, and check that the results are correct.

  1. float vector or binary vector
  2. with or without an index
  3. every index type
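One way to enumerate these combinations is a small scenario generator, sketched below. The `scenario` type and `buildScenarios` function are hypothetical, and the index type names in `main` are only an illustrative subset, not the full list the test plan would cover.

```go
package main

import "fmt"

// scenario is one test case; the type is hypothetical.
type scenario struct {
	operation  string // "query" or "search"
	vectorType string // "float" or "binary"
	indexType  string // empty string means no index has been created
}

// buildScenarios pairs every operation and vector type with the no-index
// case plus each index type listed for that vector type.
func buildScenarios(operations, vectorTypes []string, indexTypes map[string][]string) []scenario {
	var out []scenario
	for _, op := range operations {
		for _, vt := range vectorTypes {
			out = append(out, scenario{operation: op, vectorType: vt})
			for _, it := range indexTypes[vt] {
				out = append(out, scenario{operation: op, vectorType: vt, indexType: it})
			}
		}
	}
	return out
}

func main() {
	// Illustrative subset of index types per vector type.
	indexTypes := map[string][]string{
		"float":  {"FLAT", "IVF_FLAT", "HNSW"},
		"binary": {"BIN_FLAT", "BIN_IVF_FLAT"},
	}
	all := buildScenarios([]string{"query", "search"}, []string{"float", "binary"}, indexTypes)
	for _, s := range all {
		fmt.Printf("%+v\n", s)
	}
}
```

Each generated scenario would then drive one test that inserts known vectors, runs the operation with the vector field in the output fields, and compares the returned vectors with the inserted ones.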