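A reinforcement learning application of guided Monte Carlo Tree Search algorithm for beam orientation selection in radiation therapy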

Paper title

A reinforcement learning application of guided Monte Carlo Tree Search algorithm for beam orientation selection in radiation therapy

Authors

Sadeghnejad-Barkousaraie, Azar; Bohara, Gyanendra; Jiang, Steve; Nguyen, Dan

Abstract

Because the underlying combinatorial problem is large, current beam orientation optimization algorithms for radiotherapy, such as column generation (CG), are typically heuristic or greedy in nature, which leads to suboptimal solutions. We propose a reinforcement learning strategy using Monte Carlo Tree Search that is capable of finding a superior beam orientation set in less time than CG. We used a reinforcement learning structure in which a supervised learning network guides the Monte Carlo tree search (guided tree search, GTS) as it explores the decision space of the beam orientation selection problem. We previously trained a deep neural network (DNN) that takes in the patient anatomy, organ weights, and current beams, and then approximates beam fitness values, indicating the next best beam to add. This DNN is used to probabilistically guide the traversal of the branches of the Monte Carlo decision tree when adding a new beam to the plan. To test the feasibility of the algorithm, we solved for 5-beam plans using 13 test prostate cancer patients, distinct from the 57 training and validation patients originally used to train the DNN. To show the strength of GTS relative to other search methods, we also report the performance of three other search methods: a guided search, a uniform tree search, and a random search. GTS outperforms all other methods: on average it finds a solution better than CG's in 237 seconds, compared with the 360 seconds CG takes, and it finds a solution with a lower objective function value than any other method in less than 1000 seconds. Using GTS, we were able to maintain similar planning target volume (PTV) coverage, within 1% error, and to reduce the organ-at-risk (OAR) mean dose for the body, rectum, and left and right femoral heads, at the cost of a slight 1% increase in bladder mean dose.
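The idea of letting a policy probabilistically guide the traversal of a beam decision tree can be illustrated with a small sketch. This is not the authors' implementation: `toy_objective` and `toy_policy` are hypothetical stand-ins for the plan objective and the pretrained DNN, the 36 candidate angles are an assumption, and the visit-count damping is one simple way to encourage exploration rather than the rule used in the paper.

```python
import math
import random

N_BEAMS = 36   # assumption: candidate beams at 10-degree spacing
PLAN_SIZE = 5  # the abstract's experiments use 5-beam plans


def toy_objective(beams):
    """Hypothetical stand-in for the plan objective (lower is better).

    It is NOT a dose calculation; it only rewards evenly spread beams.
    """
    angles = sorted(b * 360 / N_BEAMS for b in beams)
    n = len(angles)
    gaps = [(angles[(i + 1) % n] - angles[i]) % 360 for i in range(n)]
    ideal = 360 / n
    return sum((g - ideal) ** 2 for g in gaps)


def toy_policy(beams):
    """Hypothetical stand-in for the DNN: a probability for each unused beam."""
    candidates = [b for b in range(N_BEAMS) if b not in beams]
    costs = [toy_objective(tuple(beams) + (b,)) for b in candidates]
    low = min(costs)
    scale = (max(costs) - low) or 1.0  # avoid dividing by zero when all costs tie
    scores = [math.exp(-3.0 * (c - low) / scale) for c in costs]
    total = sum(scores)
    return candidates, [s / total for s in scores]


def guided_tree_search(n_iterations, seed=0):
    """Sample root-to-leaf paths, choosing each branch with policy-based weights."""
    rng = random.Random(seed)
    visits = {}  # tree statistics: visit count per node (tuple of chosen beams)
    best_beams, best_cost = None, math.inf
    for _ in range(n_iterations):
        path = ()
        while len(path) < PLAN_SIZE:
            candidates, probs = toy_policy(path)
            # Policy prior, damped for branches that were already visited often.
            weights = [p / (1 + visits.get(path + (b,), 0))
                       for b, p in zip(candidates, probs)]
            path += (rng.choices(candidates, weights=weights, k=1)[0],)
            visits[path] = visits.get(path, 0) + 1
        cost = toy_objective(path)
        if cost < best_cost:
            best_beams, best_cost = sorted(path), cost
    return best_beams, best_cost


if __name__ == "__main__":
    beams, cost = guided_tree_search(200)
    print("best beam indices:", beams, "objective:", cost)
```

Replacing `toy_policy` with a uniform distribution over the unused beams would give something closer in spirit to an unguided tree search, which makes the role of the guiding network easier to see.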
