Paper Title
Coder Reviewer Reranking for Code Generation
Paper Authors
Paper Abstract
Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the likelihood of the instruction given the generated programs. We perform an extensive study across six datasets with eight models from three model families. Experimental results show that Coder-Reviewer reranking leads to consistent and significant improvement (up to 17% absolute accuracy gain) over reranking with the Coder model only. When combined with executability filtering, Coder-Reviewer reranking can often outperform the minimum Bayes risk method. Coder-Reviewer reranking is easy to implement by prompting, can generalize to different programming languages, and works well with off-the-shelf hyperparameters.
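The reranking rule described above can be sketched in a few lines: each sampled program is scored by combining the Coder likelihood log p(program | instruction) with the Reviewer likelihood log p(instruction | program), and the highest-scoring candidate is returned. This is a minimal illustration, not the paper's implementation; the log-probabilities below are made-up stand-ins for actual model scores.

```python
# Hypothetical sketch of Coder-Reviewer reranking.
# The numeric log-probabilities are illustrative placeholders,
# not outputs of any real language model.

def coder_reviewer_score(log_p_coder, log_p_reviewer):
    """Combined score: the sum of the two log-likelihoods,
    i.e. the log of p(program|instruction) * p(instruction|program)."""
    return log_p_coder + log_p_reviewer

# (program, log p(program|instruction), log p(instruction|program))
# for three hypothetical candidates sampled for "reverse the string s".
candidates = [
    ("return s",          -1.2, -9.5),  # degenerate: high Coder likelihood,
                                        # but the Reviewer cannot recover the instruction
    ("return s[::-1]",    -2.0, -1.1),  # correct: the Reviewer rates the instruction likely
    ("return ''.join(s)", -1.8, -7.4),
]

best = max(candidates, key=lambda c: coder_reviewer_score(c[1], c[2]))
print(best[0])  # the combined score penalizes the degenerate candidates
```

Reranking by the Coder score alone would pick the degenerate `return s` here (-1.2 vs. -2.0); adding the Reviewer term flips the decision, which mirrors the abstract's point about likelihood-only reranking preferring degenerate solutions.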