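A CUDA task graph records a snapshot of a sequence of task invocations (kernel launches, memory copies, and so on) so that it can be replayed later. Relaunching an instantiated graph is much faster than submitting the same work through a stream every time, because most of the launch setup is done once, when the graph is instantiated.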
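
The capture-and-replay pattern described above can be sketched with stream capture from the CUDA runtime API. This is a minimal illustration, not a benchmark: the kernel `add_one`, the helper `run_graph_replay`, and the `CHECK` macro are hypothetical names made up for this example, and it assumes CUDA 11.4 or newer (for `cudaGraphInstantiateWithFlags`).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                         cudaGetErrorString(err_), __LINE__);         \
            std::exit(1);                                             \
        }                                                             \
    } while (0)

// Hypothetical kernel: adds 1 to every element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Captures two kernel launches into a graph, replays the graph `replays`
// times, and returns true if every element ends up equal to 2 * replays.
bool run_graph_replay(int n, int replays) {
    float* d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemset(d, 0, n * sizeof(float)));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Record the work: launches issued between Begin/EndCapture are not
    // executed, they are saved as nodes of the graph.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    add_one<<<blocks, threads, 0, stream>>>(d, n);
    add_one<<<blocks, threads, 0, stream>>>(d, n);
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once; this is where the setup cost is paid.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Replay: each launch submits the whole recorded sequence at once.
    for (int r = 0; r < replays; ++r) {
        CHECK(cudaGraphLaunch(exec, stream));
    }
    CHECK(cudaStreamSynchronize(stream));

    std::vector<float> h(n);
    CHECK(cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost));

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2.0f * replays) { ok = false; break; }
    }

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d));
    return ok;
}

int main() {
    bool ok = run_graph_replay(1 << 20, 10);
    std::printf("%s\n", ok ? "graph replay ok" : "graph replay mismatch");
    return ok ? 0 : 1;
}
```

Stream capture is used here because it turns existing stream-based code into a graph with few changes; the same graph could also be built node by node with the explicit graph API. The benefit tends to show up when the captured sequence contains many short kernels, where per-launch overhead dominates.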

Reference