Current state: Accepted
ISSUE: #6299
Keywords: Query / Search / Vector
Released: Milvus 2.0rc3
The goal of this project is to let query return vector fields in its output while using as little memory as possible.
In Milvus 2.0rc1, operations such as query cannot return vector fields in their output. The proxy checks the output fields of each query request and returns an error if any of them
is a float vector or binary vector field. The restriction exists to limit memory consumption:
a high-dimensional vector field can occupy hundreds of times more memory than a scalar field. For this reason, load_collection and load_partition generally
load only the raw data of scalar fields into memory. The raw data of vector fields is loaded into memory only in 3 cases:
If a vector field's raw data has already been loaded into memory, query already supports returning that field in its output.
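To make the memory argument concrete, the sketch below compares the raw size of one vector value with one int64 scalar. It assumes 4-byte float32 components for float vectors and one bit per dimension for binary vectors; the helper names are hypothetical and not part of Milvus.

```go
package main

const (
	int64Bytes   = 8 // size of one int64 scalar value
	float32Bytes = 4 // size of one float vector component (assumed float32)
)

// floatVectorBytes returns the raw size in bytes of one float vector row.
func floatVectorBytes(dim int) int {
	return dim * float32Bytes
}

// binaryVectorBytes returns the raw size in bytes of one binary vector row,
// assuming one bit per dimension and a dimension that is a multiple of 8.
func binaryVectorBytes(dim int) int {
	return dim / 8
}

// floatVectorToInt64Ratio reports how many int64 values fit into the space
// taken by a single float vector of the given dimension.
func floatVectorToInt64Ratio(dim int) int {
	return floatVectorBytes(dim) / int64Bytes
}
```

For example, a 512-dimensional float vector takes 2048 bytes per row, 256 times the 8 bytes of an int64 value.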

    type VectorFieldInfo struct {
        mu              sync.RWMutex
        fieldID         UniqueID
        fieldBinlog     *datapb.FieldBinlog
        rowDataInMemory bool
        rawData         map[string]storage.FieldData // map[binlogPath]FieldData
    }

    type Segment struct {
        ... ...
        vectorFieldInfos map[UniqueID]*VectorFieldInfo
    }

    // load vector field's data from info.fieldBinlog, save the raw data into info.rawData
    func (loader *segmentLoader) loadSegmentVectorFieldData(info *VectorFieldInfo) error

    // For vector output fields, load raw data from fieldBinlog if needed,
    // get vector raw data via result.Offset from *VectorfieldInfo, then
    // fill vector raw data into result
    func (q *queryCollection) fillVectorFieldsData(segment *Segment, result *segcorepb.RetrieveResults) error
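The load-on-demand flow described for fillVectorFieldsData can be sketched as follows. This is a minimal illustration and not the Milvus implementation: vectorFieldCache is a hypothetical, simplified stand-in for VectorFieldInfo that keeps the raw data as one flat row-major float32 slice instead of a map keyed by binlog path, and the load callback is an assumed placeholder for reading the binlog.

```go
package main

import (
	"fmt"
	"sync"
)

// vectorFieldCache is a hypothetical, simplified stand-in for VectorFieldInfo.
type vectorFieldCache struct {
	mu     sync.RWMutex
	dim    int
	loaded bool
	raw    []float32                 // row-major raw data, len = rows * dim
	load   func() ([]float32, error) // assumed placeholder for the binlog reader
}

// ensureLoaded loads the raw data at most once, on first use.
func (c *vectorFieldCache) ensureLoaded() error {
	c.mu.RLock()
	loaded := c.loaded
	c.mu.RUnlock()
	if loaded {
		return nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if c.loaded { // another goroutine loaded the data in the meantime
		return nil
	}
	if c.dim <= 0 {
		return fmt.Errorf("invalid dimension %d", c.dim)
	}
	raw, err := c.load()
	if err != nil {
		return err
	}
	if len(raw)%c.dim != 0 {
		return fmt.Errorf("raw data length %d is not a multiple of dim %d", len(raw), c.dim)
	}
	c.raw = raw
	c.loaded = true
	return nil
}

// vectorsAt returns copies of the vectors stored at the given row offsets.
func (c *vectorFieldCache) vectorsAt(offsets []int64) ([][]float32, error) {
	if err := c.ensureLoaded(); err != nil {
		return nil, err
	}
	c.mu.RLock()
	defer c.mu.RUnlock()

	rows := int64(len(c.raw) / c.dim)
	out := make([][]float32, 0, len(offsets))
	for _, off := range offsets {
		if off < 0 || off >= rows {
			return nil, fmt.Errorf("offset %d out of range [0, %d)", off, rows)
		}
		start := int(off) * c.dim
		vec := make([]float32, c.dim)
		copy(vec, c.raw[start:start+c.dim])
		out = append(out, vec)
	}
	return out, nil
}
```

The double check under the write lock keeps concurrent queries from loading the same field twice, and scalar-only queries never trigger the load.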
Original vector data storage public interface and struct
Public Interfaces
```go
type FileManager interface {
	GetFile(path string) (string, error)
	PutFile(path string, content []byte) error
	Exist(path string) bool
	ReadFile(path string) []byte
}
```
VectorFileManager implements the FileManager interface.
```go
type VectorFileManager struct {
	localFileManager  FileManager
	remoteFileManager FileManager
	insertCodec       *InsertCodec
}
```
localFileManager manages files on local disk and can be implemented with the Go os package.
remoteFileManager manages files in cloud storage or on a remote storage server; for now it is implemented with the MinIO client.
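One way the local and remote managers could cooperate is a read-through cache: serve a file from local storage when it is present, otherwise copy it from remote storage first. The sketch below is an assumption about that interaction, not the actual VectorFileManager code. It assumes GetFile returns a local path, and both cachingFileManager and the in-memory memFileManager (which only stands in for disk or MinIO) are hypothetical.

```go
package main

import "fmt"

// FileManager is the interface from this document.
type FileManager interface {
	GetFile(path string) (string, error)
	PutFile(path string, content []byte) error
	Exist(path string) bool
	ReadFile(path string) []byte
}

// memFileManager is a hypothetical in-memory FileManager used for illustration.
type memFileManager struct {
	files map[string][]byte
}

func newMemFileManager() *memFileManager {
	return &memFileManager{files: map[string][]byte{}}
}

func (m *memFileManager) GetFile(path string) (string, error) {
	if !m.Exist(path) {
		return "", fmt.Errorf("file %q not found", path)
	}
	return path, nil
}

func (m *memFileManager) PutFile(path string, content []byte) error {
	m.files[path] = append([]byte(nil), content...) // store a private copy
	return nil
}

func (m *memFileManager) Exist(path string) bool {
	_, ok := m.files[path]
	return ok
}

func (m *memFileManager) ReadFile(path string) []byte {
	return m.files[path]
}

// cachingFileManager is a hypothetical read-through cache over two managers.
type cachingFileManager struct {
	local  FileManager
	remote FileManager
}

// GetFile returns the local path of the file, downloading it from remote
// storage first if it is not cached locally yet.
func (c *cachingFileManager) GetFile(path string) (string, error) {
	if c.local.Exist(path) {
		return c.local.GetFile(path)
	}
	if !c.remote.Exist(path) {
		return "", fmt.Errorf("file %q not found in remote storage", path)
	}
	if err := c.local.PutFile(path, c.remote.ReadFile(path)); err != nil {
		return "", err
	}
	return c.local.GetFile(path)
}
```

With this layout, a binlog is fetched from remote storage only the first time it is needed; later reads are served from the local copy.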
Once the offset of a vector is known, the original vector data can be read from the local vector data file.
Run query / search (with vector fields in the output fields) under all combinations of the following scenarios and check that the results are correct.