...

However, several issues need to be considered in the implementation:

  1. What happens if a query node involved in the handoff or load balance process suddenly goes down?
  2. If query coord suddenly goes down, how are the handoff and load balance tasks properly resumed after the restart?
  3. If the same sealed segment exists on different query nodes at the same time, does this affect the final query results?
  4. Different query nodes may not process query messages at the same speed. How can we ensure that different query nodes see the same global set of sealed segment IDs when processing the same query message? If this is not handled well, the query may time out.

...

Prerequisite: The cache meta of querycoord records which sealed segments and growing segments are loaded on each query node, and which dmchannels each query node watches. This meta is also persisted synchronously to etcd.

1. First, querycoord automatically generates handoff tasks and load balance tasks and writes them into etcd. A task is cleared from etcd only after it is completed, so that querycoord can accurately restore unfinished tasks after restarting.
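
The persist-then-clear pattern in step 1 can be sketched as follows. This is a minimal illustration, not querycoord's actual code: `FakeKV` is an in-memory stand-in for etcd, and the key prefix, the `Task` fields, and the function names are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict

TASK_PREFIX = "querycoord/tasks/"  # hypothetical key layout, not the real one


class FakeKV:
    """In-memory stand-in for etcd, for illustration only."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)

    def scan(self, prefix):
        return {k: v for k, v in self.data.items() if k.startswith(prefix)}


@dataclass
class Task:
    task_id: int
    kind: str        # "handoff" or "load_balance"
    segment_id: int


def submit(kv, task):
    # Persist the task before it is executed, so a crash cannot lose it.
    kv.put(f"{TASK_PREFIX}{task.task_id}", json.dumps(asdict(task)))


def complete(kv, task):
    # Clear the task only once it has finished.
    kv.delete(f"{TASK_PREFIX}{task.task_id}")


def recover(kv):
    # After a restart, every task still present in the store is unfinished.
    tasks = [Task(**json.loads(v)) for v in kv.scan(TASK_PREFIX).values()]
    return sorted(tasks, key=lambda t: t.task_id)


kv = FakeKV()
handoff = Task(task_id=1, kind="handoff", segment_id=8)
balance = Task(task_id=2, kind="load_balance", segment_id=9)
submit(kv, handoff)
submit(kv, balance)
complete(kv, handoff)
pending = recover(kv)  # only the load balance task remains
```

Because a task is deleted only after completion, anything `recover` returns is safe to re-run, provided the task itself is idempotent.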

...

4. After query coord successfully loads the sealed segments on the querynode, it updates the sealed segment list of each querynode in the cache meta. For example, after balancing sealed segment 8 from node 1 to node 2, the meta of node 1 changes from {S6, S7, S8} to {S6, S7}, and the meta of node 2 changes from {S5} to {S5, S8}. If a query node suddenly goes down, it is recovered directly from the meta records of the coord. While querycoord updates the meta, it sends the change info of the sealed segments to the querychannel. The proto of the sealed segments change info is as follows:

...
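
The meta update from the balancing example in step 4 (segment 8 moving from node 1 to node 2) can be sketched as below. This is only an illustration under stated assumptions: the meta is modeled as a plain dictionary from node ID to a set of segment IDs, and the returned `change_info` dictionary uses hypothetical field names that do not reproduce the actual proto.

```python
def balance_segment(meta, segment_id, src, dst):
    """Return updated per-node sealed segment sets plus a change record.

    `meta` maps node ID -> set of sealed segment IDs. The input is not
    modified; a new mapping is returned.
    """
    if segment_id not in meta.get(src, set()):
        raise ValueError(f"segment {segment_id} is not loaded on node {src}")

    # Copy so the caller can persist the new meta before swapping it in.
    new_meta = {node: set(segments) for node, segments in meta.items()}
    new_meta[src].discard(segment_id)
    new_meta.setdefault(dst, set()).add(segment_id)

    # Hypothetical shape of the message sent to the query channel.
    change_info = {
        "segment_id": segment_id,
        "offline_node": src,  # node that drops the segment
        "online_node": dst,   # node that now serves the segment
    }
    return new_meta, change_info


meta = {1: {6, 7, 8}, 2: {5}}
new_meta, change_info = balance_segment(meta, segment_id=8, src=1, dst=2)
```

Returning a new mapping instead of mutating in place keeps the old meta available until the update has been written to persistent storage, which matches the requirement that a failed node can be recovered from the coord's records.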