Home | Publications

Hong Yi Lin, Patanamon Thongtanunam

International Conference on Software Analysis, Evolution and Reengineering (SANER)

Code review is a crucial ingredient to quality software development, but requires a large amount of time and effort for developers. To optimise this manual process, recent research on automated code review seeks to leverage Neural Machine Translation (NMT) models to perform tasks such as automated code improvement. A recent work had pretrained the NMT model for automated code review in order to equip the model with general coding knowledge. However, their pretraining approach is generic to natural languages, which does not leverage the unique properties of coding languages. Therefore, we set out to explore two state-of-the-art pretrained NMT models (i.e., CodeT5 and GraphCodeBERT) that were designed to learn code structure. We studied the models' abilities to generate correct code improvement through an empirical evaluation based on five different datasets. Our results showed that in terms of generating correct code sequences, CodeT5, GraphCodeBERT, and the prior work achieved an average accuracy of 22%, 18%, and 10%, respectively. In terms of generating correct dataflow structures, they achieved an average accuracy of 33%, 30%, and 22%, respectively. The results suggested that the code structure focused approaches could outperform the generic pretraining approach. This work contributes towards enhancing automated code review techniques by understanding the effectiveness of code structure focused NMT models.