Current state: Under Discussion
ISSUE: #6493 Query Nodes that only load segments will get stuck when searching
Keywords: QueryNode / Search / Retrieve
Currently we have only one service time, which blocks queries (search and retrieve) on multiple channels and on historical. However, historical doesn't need to wait for a new tSafe, and a channel doesn't need to wait for tSafe from other channels. So we need to refactor the query (search and retrieve) workflow and decouple the dependencies between historical and channels.
This proposal is about
- The details on how to refactor query workflow.
- Test plan of new query workflow.
- Improve query performance: by decoupling the query dependencies between historical and channels, queries become more efficient.
- Fix the issue that queries get stuck: a bug fix for issue #6493.
Split the query workflow into multiple stages: InputStage, LoadBalanceStage, RequestHandlerStage, HistoricalStage, VChannelStage (there are one or more VChannelStages, depending on the number of VChannels in the current Collection), UnsolvedStage and ResultHandlerStage.
Stages are connected to each other by Go channels. Each stage runs one or more goroutines in the background to execute query tasks. The query workflow design is shown in the following figure:
Stages are independent of each other; that is, once a stage finishes its current query task, it can execute the next query task immediately, without depending on the status of other stages. The details of these stages are as follows:
- InputStage: Consumes from the queryChannel of msgStream, groups messages by message type (query, loadBalance), and sends them to LoadBalanceStage or RequestHandlerStage.
- LoadBalanceStage: Executes loadBalance tasks.
- RequestHandlerStage: Preprocesses the query task (such as parsing DSL, creating the query plan, etc.) and sends it to HistoricalStage and VChannelStage.
- HistoricalStage: Query in Historical, get the query result and send it to ResultHandlerStage.
- VChannelStage: Checks the query time. When the service time has not reached the query time, the current task is sent to UnsolvedStage, and the stage returns and executes the next query task immediately; when the service time has reached the query time, the stage executes the query task immediately and sends the query result to ResultHandlerStage.
- UnsolvedStage: A for-loop triggered by TSafe updates that continuously checks the query time of pending tasks, executes the tasks whose query time has been reached by the service time, and sends the results to ResultHandlerStage.
- ResultHandlerStage: Reduce query results and produce the results to query result channel of msgStream.
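The hand-off between VChannelStage and UnsolvedStage described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: `Timestamp`, `queryTask`, `queryResult`, `canServe`, `vChannelStage` and `unsolvedStage` are hypothetical names, the actual search is replaced by emitting a placeholder result, and tasks still pending when both inputs close are simply dropped.

```go
package main

// Timestamp is a hypothetical stand-in for the QueryNode timestamp type.
type Timestamp uint64

// queryTask is a hypothetical, simplified query task.
type queryTask struct {
	ID        int64
	QueryTime Timestamp
}

// queryResult is a placeholder for a real search/retrieve result.
type queryResult struct {
	TaskID int64
	Stage  string
}

// canServe reports whether the service time (tSafe) has reached the query time.
func canServe(tSafe Timestamp, t queryTask) bool {
	return t.QueryTime <= tSafe
}

// vChannelStage executes tasks that are already serviceable and forwards the
// rest to the unsolved stage, so it never waits on tSafe itself.
func vChannelStage(currentTSafe func() Timestamp, in <-chan queryTask,
	unsolved chan<- queryTask, out chan<- queryResult) {
	for t := range in {
		if canServe(currentTSafe(), t) {
			out <- queryResult{TaskID: t.ID, Stage: "vchannel"}
			continue
		}
		unsolved <- t
	}
	close(unsolved)
}

// unsolvedStage parks tasks until a tSafe update makes them serviceable.
// It returns once both input channels have been closed.
func unsolvedStage(in <-chan queryTask, tSafeUpdates <-chan Timestamp,
	out chan<- queryResult) {
	var tSafe Timestamp
	var pending []queryTask
	for in != nil || tSafeUpdates != nil {
		select {
		case t, ok := <-in:
			if !ok {
				in = nil // a nil channel is never selected again
				continue
			}
			if canServe(tSafe, t) {
				out <- queryResult{TaskID: t.ID, Stage: "unsolved"}
			} else {
				pending = append(pending, t)
			}
		case ts, ok := <-tSafeUpdates:
			if !ok {
				tSafeUpdates = nil
				continue
			}
			tSafe = ts
			// Re-check every parked task against the new service time.
			kept := pending[:0]
			for _, t := range pending {
				if canServe(tSafe, t) {
					out <- queryResult{TaskID: t.ID, Stage: "unsolved"}
				} else {
					kept = append(kept, t)
				}
			}
			pending = kept
		}
	}
}
```

In a real pipeline each stage would run in its own goroutine, with one `vChannelStage` per VChannel; the functions are written as plain blocking loops here so that they are easy to exercise in a unit test.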
Compatibility, Deprecation, and Migration Plan
- Unit tests for all stages, and a unit test of a simple query on queryCollection.
- Test of query on a QueryNode which only has historical data.
- All CI test cases about query.
Learn more about SEDA https://en.wikipedia.org/wiki/Staged_event-driven_architecture
An interesting topic is how you decide how many goroutines to use in each stage, like https://img-my.csdn.net/uploads/201211/08/1352369109_7636.png.
Another challenge is how to control memory consumption. Let's say one of the Historical stages is larger than the others; it is going to be searched more slowly than the other stages. Then you have to apply back pressure to the Request handler stage in order to limit memory usage in the Go channels at the result handler stage. Otherwise the result handler stage's input channel will be filled with vchannel stage results.
Can we send the vchannel's search result to the result channel immediately once the vchannel's search is done? After all, the number of vchannels and the number of growing segments are not very large.
By the way, I love the design; it gives us great flexibility to add preprocessing and postprocessing.
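One common way to get this kind of back pressure in Go is to bound the capacity of the channel between stages. The sketch below is an assumption about how that could look, not something the proposal specifies: `result`, `newBoundedQueue` and `tryEnqueue` are hypothetical names, and the capacity is an arbitrary illustrative value.

```go
package main

// result is a hypothetical placeholder for a per-stage search result.
type result struct {
	TaskID int64
}

// newBoundedQueue returns a channel with a fixed capacity. A plain send on it
// blocks once the buffer is full, which slows the upstream stage down.
func newBoundedQueue(capacity int) chan result {
	return make(chan result, capacity)
}

// tryEnqueue attempts a non-blocking send. It returns false when the queue is
// full, so the caller can stop admitting new requests instead of buffering
// an unbounded number of results in memory.
func tryEnqueue(q chan<- result, r result) bool {
	select {
	case q <- r:
		return true
	default:
		return false
	}
}
```

With a blocking send, a full queue stalls the sending stage, and the stall propagates upstream through the other bounded channels; with `tryEnqueue`, the caller decides explicitly what to do when the downstream stage cannot keep up.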
This design also makes it easier to handle watch channel and remove channel tasks.