Current state: "Under Discussion"
PRs:
Keywords: delete
Released:
This document describes how to support delete in Milvus. Milvus provides a new delete API to delete entities from a collection.
In some scenarios, users want to delete entities from a collection so that they no longer appear in search results. Currently, users can only filter unwanted entities out of search results manually. We propose a new function that lets users delete entities from a collection.
The Delete API deletes entities from a collection; deleted entities no longer appear in the results of Query and Search requests.
`collection_name` is the name of the collection to delete entities from.
`expr` is an expression indicating which entities in the collection should be deleted. Only the `in` operator is supported in the Delete API. Expression documentation: https://github.com/milvus-io/milvus/blob/master/docs/design_docs/query_boolean_expr.md
`partition_name` is the name of the partition to delete entities from; `None` means all partitions.
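Because only the `in` operator is accepted, a delete expression is effectively a list of primary keys. The helper below is a minimal sketch of how a client might assemble such a string. `build_delete_expr` and the field name `id` are hypothetical and not part of the Milvus API, and integer primary keys are assumed.

```python
from typing import Iterable


def build_delete_expr(pk_field: str, pks: Iterable[int]) -> str:
    """Build an `in` expression such as 'id in [1, 2, 3]' (hypothetical helper)."""
    values = list(pks)
    if not values:
        raise ValueError("at least one primary key is required")
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"unsupported primary key type: {type(value).__name__}")
    return f"{pk_field} in [{', '.join(str(value) for value in values)}]"


expr = build_delete_expr("id", [1, 2, 3])  # "id in [1, 2, 3]"
```

The resulting string is what would be passed as `expr`.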
`Delete` returns after the delete request has been written into the insert channel, which means the request has been reliably saved and will be applied to subsequent search/query requests. The return value is of type MutationResult, which contains several properties; only `_primary_keys` is filled.
As with the Insert API, Milvus only guarantees the visibility of operations within one client. This means that, within the operation sequence "delete(), search()", the search result will not contain the deleted entities. Different clients may connect to different Proxies, and the time on different Proxies is not exactly the same. So even if you call delete on one client and then search on another, it is uncertain whether the search request returns the deleted entities.
Currently, Milvus does not deduplicate entities on insert, so a delete operation removes every entity that satisfies the expression.
Deleting a non-existent entity is not an error, so delete() will not raise an Error.
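The last two rules can be illustrated with a toy in-memory model. This is only a sketch of the described semantics, not the Milvus implementation; `ToyCollection` is a hypothetical name, and returning the number of removed rows is a choice made for the illustration rather than the behavior of MutationResult.

```python
from typing import Iterable, List, Tuple


class ToyCollection:
    """In-memory stand-in for a collection (illustrative only)."""

    def __init__(self) -> None:
        # Rows are (primary_key, payload). Duplicate primary keys are allowed
        # because nothing is deduplicated on insert.
        self._rows: List[Tuple[int, str]] = []

    def insert(self, pk: int, payload: str) -> None:
        self._rows.append((pk, payload))

    def delete(self, pks: Iterable[int]) -> int:
        """Remove every row whose primary key is listed; return rows removed."""
        targets = set(pks)
        before = len(self._rows)
        self._rows = [row for row in self._rows if row[0] not in targets]
        # Unknown primary keys simply match nothing; no error is raised.
        return before - len(self._rows)

    def primary_keys(self) -> List[int]:
        return [pk for pk, _ in self._rows]


collection = ToyCollection()
collection.insert(1, "first copy")
collection.insert(1, "second copy")
collection.insert(2, "other")
removed = collection.delete([1])   # both rows with primary key 1 are removed
missing = collection.delete([99])  # no such entity: nothing removed, no error
```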
    def delete(self, collection_name, expr, partition_name=None, timeout=None, **kwargs) -> MutationResult:
        """
        Delete entities with an expression condition,
        and return results showing which primary keys were deleted successfully.

        :param collection_name: Name of the collection to delete entities from
        :type  collection_name: str

        :param expr: The query expression
        :type  expr: str

        :param partition_name: Name of the partition that contains the entities
        :type  partition_name: str

        :param timeout: An optional duration of time in seconds to allow for the RPC. When timeout
                        is set to None, the client waits until the server responds or an error occurs
        :type  timeout: float

        :return: Results of the executed delete request.
        :rtype: MutationResult

        :raises:
            RpcError: If gRPC encounters an error
            ParamError: If parameters are invalid
            BaseException: If the result returned from the server is not ok
        """
In Milvus, Proxy maintains two Pulsar channels for each collection, the Insert channel and the Search channel:
DataNode consumes messages from the Insert channel only.
QueryNode consumes messages from both the Insert channel and the Search channel.
To support delete, DeleteMsg will also be sent into the Insert channel.
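A consequence of sharing one channel is that a consumer sees inserts and deletes in a single order. The sketch below models that with a plain Python list standing in for the channel. The message classes and `replay` are hypothetical simplifications, not the actual Milvus message types, and applying messages strictly in channel order is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple, Union


@dataclass(frozen=True)
class InsertMsg:
    primary_key: int


@dataclass(frozen=True)
class DeleteMsg:
    primary_keys: Tuple[int, ...]


Message = Union[InsertMsg, DeleteMsg]


def replay(channel: List[Message]) -> Set[int]:
    """Apply messages in channel order and return the visible primary keys."""
    visible: Set[int] = set()
    for msg in channel:
        if isinstance(msg, InsertMsg):
            visible.add(msg.primary_key)
        else:
            # A delete removes whatever matching keys are visible so far.
            visible.difference_update(msg.primary_keys)
    return visible


insert_channel: List[Message] = [InsertMsg(1), InsertMsg(2), DeleteMsg((1,))]
visible_keys = replay(insert_channel)  # {2}
```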
Since Milvus's storage is append-only, the `delete` function is implemented as a soft delete: a flag is set on an entity to indicate that it has been deleted.
This solution needs:
The algorithm library `Knowhere` already supports searching with a bitset that indicates whether an entity is deleted, so the discussion here focuses on how to store the deleted primary keys.
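The combination of append-only storage, a per-entity deleted flag, and a bitset-aware search can be sketched as follows. This is a toy model under stated assumptions: `AppendOnlySegment` is a hypothetical class, and the brute-force L2 search merely stands in for a `Knowhere` index.

```python
from typing import List, Sequence


class AppendOnlySegment:
    """Toy append-only segment with a deletion bitset (illustrative only)."""

    def __init__(self) -> None:
        self._pks: List[int] = []
        self._vectors: List[Sequence[float]] = []
        self._deleted: List[bool] = []  # one flag per row; True means deleted

    def __len__(self) -> int:
        return len(self._pks)

    def append(self, pk: int, vector: Sequence[float]) -> None:
        self._pks.append(pk)
        self._vectors.append(vector)
        self._deleted.append(False)

    def soft_delete(self, pk: int) -> int:
        """Flag every row with this primary key; return how many were flagged."""
        flagged = 0
        for row, row_pk in enumerate(self._pks):
            if row_pk == pk and not self._deleted[row]:
                self._deleted[row] = True
                flagged += 1
        return flagged

    def search(self, query: Sequence[float], top_k: int) -> List[int]:
        """Brute-force L2 search that skips rows flagged in the bitset."""
        scored = []
        for row, vector in enumerate(self._vectors):
            if self._deleted[row]:
                continue
            distance = sum((a - b) ** 2 for a, b in zip(query, vector))
            scored.append((distance, self._pks[row]))
        scored.sort()
        return [pk for _, pk in scored[:top_k]]
```

Note that `soft_delete` never removes a row: the stored data stays untouched and only the bitset changes, which is what keeps the storage append-only.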
Unaffected
SegmentFilter provides a method that returns the IDs of the segments in which a given primary key (PK) may exist. It is implemented with segment statistics and Bloom filters.
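A minimal sketch of such a filter is shown below. It assumes that the segment statistics are the minimum and maximum primary key of each segment, which the text does not specify; the class names, the Bloom filter sizing, and the hashing scheme are likewise illustrative choices.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = 0

    def _positions(self, key: int) -> List[int]:
        positions = []
        for seed in range(self._num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self._num_bits)
        return positions

    def add(self, key: int) -> None:
        for pos in self._positions(key):
            self._bits |= 1 << pos

    def might_contain(self, key: int) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self._bits >> pos) & 1 for pos in self._positions(key))


class SegmentFilterSketch:
    """Maps a primary key to the segments that may contain it."""

    def __init__(self) -> None:
        # segment_id -> (min_pk, max_pk, bloom filter); min/max is an assumption.
        self._segments: Dict[int, Tuple[int, int, BloomFilter]] = {}

    def add_segment(self, segment_id: int, pks: Iterable[int]) -> None:
        keys = list(pks)
        bloom = BloomFilter()
        for pk in keys:
            bloom.add(pk)
        self._segments[segment_id] = (min(keys), max(keys), bloom)

    def candidate_segments(self, pk: int) -> List[int]:
        return [
            segment_id
            for segment_id, (low, high, bloom) in self._segments.items()
            if low <= pk <= high and bloom.might_contain(pk)
        ]
```

A Bloom filter never reports a stored key as absent, so a segment that really holds the PK is always returned; false positives only cost an extra lookup in a segment that turns out not to contain it.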
DeltaLog is a persistent file that records the deleted primary keys and their delete timestamps. Each DeltaLog belongs to exactly one segment.
InvertedDeltaLog provides a method for quickly retrieving the deletions that meet a timestamp condition.
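One way to make timestamp-bounded lookups fast is to keep the deletion records sorted by timestamp and use binary search. The sketch below assumes that layout and assumes the timestamp condition is "deleted at or before a given timestamp"; the text specifies neither, and `DeltaLogSketch` is a hypothetical in-memory class rather than the actual file format.

```python
import bisect
from typing import List


class DeltaLogSketch:
    """Deletion records for one segment, kept sorted by timestamp."""

    def __init__(self, segment_id: int) -> None:
        self.segment_id = segment_id
        self._timestamps: List[int] = []  # sorted ascending
        self._pks: List[int] = []         # parallel to _timestamps

    def record(self, pk: int, timestamp: int) -> None:
        """Store a deletion, keeping both lists ordered by timestamp."""
        index = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(index, timestamp)
        self._pks.insert(index, pk)

    def deleted_up_to(self, timestamp: int) -> List[int]:
        """Return primary keys deleted at or before `timestamp`."""
        end = bisect.bisect_right(self._timestamps, timestamp)
        return self._pks[:end]


log = DeltaLogSketch(segment_id=1)
log.record(pk=7, timestamp=30)
log.record(pk=5, timestamp=10)
log.record(pk=9, timestamp=20)
deleted = log.deleted_up_to(20)  # [5, 9]
```

Locating the cutoff takes O(log n) comparisons, so a search request carrying a timestamp can find the prefix of applicable deletions without scanning the whole log.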
Search for a deleted entity and expect it not to be in the result set.
    client.insert()
    client.search()
    client.delete()
    client.search()