Yang Hong, Chakkrit Tantithamthavorn, Patanamon Thongtanunam
The International Conference on Software Analysis, Evolution and Reengineering (SANER)
Code review is an effective quality assurance practice, yet it can be time-consuming since reviewers have to carefully review all newly added lines in a patch. Our analysis shows that, at the median, patch authors waited 15-64 hours to receive initial feedback from reviewers, which accounts for 16%-26% of the whole review time of a patch. Importantly, we also found that large patches tend to receive initial feedback from reviewers more slowly than smaller patches. Hence, reviewers would benefit from an approach that reduces their effort by pinpointing the lines they should pay attention to. In this paper, we propose REVSPOT, a machine learning-based approach to predict problematic lines (i.e., lines that will receive a comment and lines that will be revised). Through a case study of three open-source projects (i.e., Openstack Nova, Openstack Ironic, and Qt Base), we find that REVSPOT can accurately predict lines that will receive comments and lines that will be revised, achieving a Top-10 Accuracy of 81% and 93%, respectively, which is 56% and 15% better than the baseline approach. Moreover, these correctly predicted problematic lines are related to logic defects, which could impact the functionality of the system. Based on these findings, REVSPOT could help reviewers reduce their reviewing effort by focusing on a smaller set of lines, thereby increasing code review speed and reviewers' productivity.
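The results above are reported as Top-10 Accuracy, which the abstract does not define. The sketch below shows how such a metric could be computed, assuming that Top-k Accuracy is the fraction of patches in which at least one of the k highest-ranked lines is truly problematic. This definition, the function name `top_k_accuracy`, and the input format (one ranked list of line numbers plus one set of ground-truth problematic line numbers per patch) are illustrative assumptions and do not describe REVSPOT's actual implementation.

```python
from typing import Sequence, Set


def top_k_accuracy(rankings: Sequence[Sequence[int]],
                   problematic: Sequence[Set[int]],
                   k: int = 10) -> float:
    """Hypothetical helper: fraction of patches whose top-k ranked lines
    contain at least one truly problematic line (assumed definition)."""
    if len(rankings) != len(problematic):
        raise ValueError("need one ranking and one ground-truth set per patch")
    if not rankings:
        return 0.0
    # A patch counts as a hit if any of its k highest-ranked lines is problematic.
    hits = sum(
        1
        for ranked, truth in zip(rankings, problematic)
        if any(line in truth for line in ranked[:k])
    )
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy data: ranked line numbers per patch (most suspicious first).
    rankings = [[12, 3, 7], [5, 9, 1], [2, 4, 6]]
    problematic = [{7}, {10}, {2, 6}]
    print(top_k_accuracy(rankings, problematic, k=2))  # only patch 3 is a hit
    print(top_k_accuracy(rankings, problematic, k=3))  # patches 1 and 3 are hits
```

With k = 2 only the third patch has a problematic line in its top two, giving 1/3. With k = 3 the first patch also becomes a hit, giving 2/3. This illustrates why a larger k yields a higher (more lenient) accuracy.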