publications | Patanamon (Pick) Thongtanunam

2025

TOSEM
Leveraging Reviewer Experience in Code Review Comment Generation

Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Michael W Godfrey, Chunhua Liu, and 1 more author

ACM Transactions on Software Engineering and Methodology, 2025

Abs Bib HTML PDF

Modern code review is a ubiquitous software quality assurance process aimed at identifying potential issues within newly written code. Despite its effectiveness, the process demands large amounts of effort from the human reviewers involved. To help alleviate this workload, researchers have trained deep learning models to imitate human reviewers in providing natural language code reviews. Formally, this task is known as code review comment generation. Prior work has demonstrated improvements in this task by leveraging machine learning techniques and neural models, such as transfer learning and the transformer architecture. However, the quality of the model generated reviews remain sub-optimal due to the quality of the open-source code review data used in model training. This is in part due to the data obtained from open-source projects where code reviews are conducted in a public forum, and reviewers possess varying levels of software development experience, potentially affecting the quality of their feedback. To accommodate for this variation, we propose a suite of experience-aware training methods that utilise the reviewers’ past authoring and reviewing experiences as signals for review quality. Specifically, we propose experience-aware loss functions (ELF), which use the reviewers’ authoring and reviewing ownership of a project as weights in the model’s loss function. Through this method, experienced reviewers’ code reviews yield larger influence over the model’s behaviour. Compared to the SOTA model, ELF was able to generate higher quality reviews in terms of accuracy, informativeness, and comment types generated. The key contribution of this work is the demonstration of how traditional software engineering concepts such as reviewer experience can be integrated into the design of AI-based automated code review models.
@article{LinTOSEM2025, title = {Leveraging Reviewer Experience in Code Review Comment Generation}, author = {Lin, Hong Yi and Thongtanunam, Patanamon and Treude, Christoph and Godfrey, Michael W and Liu, Chunhua and Charoenwet, Wachiraphan}, year = {2025}, journal = {ACM Transactions on Software Engineering and Methodology}, pages = {to appear}, doi = {}, }

ICSME

From Release to Adoption: Challenges in Reusing Pre-trained AI Models for Downstream Developers

Peerachai Banyongrakkul, Mansooreh Zahedi, Patanamon Thongtanunam, Christoph Treude, and Haoyu Gao

In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2025

Acceptance rate: 30% (45/146)

@inproceedings{banyongrakkulICSME2025,
  title = {From Release to Adoption: Challenges in Reusing Pre-trained AI Models for Downstream Developers},
  author = {Banyongrakkul, Peerachai and Zahedi, Mansooreh and Thongtanunam, Patanamon and Treude, Christoph and Gao, Haoyu},
  booktitle = {Proceedings of the IEEE International Conference on Software Maintenance and Evolution},
  pages = {to appear},
  doi = {},
  note = {Acceptance rate: 30% (45/146)},
  year = {2025}
}

SCAM

Exploring the Potential of Large Language Models in Fine-Grained Review Comment Classification

Linh Nguyen, Chunhua Liu, Hong Yi Lin, and Patanamon Thongtanunam

In Proceedings of tIEEE International Conference on Source Code Analysis & Manipulation, 2025

Bib HTML

@inproceedings{banyongrakkulICSME2026,
  title = {Exploring the Potential of Large Language Models
  in Fine-Grained Review Comment Classification},
  author = {Nguyen, Linh and Liu, Chunhua and Lin, Hong Yi and Thongtanunam, Patanamon},
  booktitle = {Proceedings of tIEEE International Conference on Source Code Analysis & Manipulation},
  pages = {to appear},
  doi = {},
  note = {},
  year = {2025}
}

ACL
CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models

Hong Yi Lin, Chunhua Liu, Haoyu Gao, Patanamon Thongtanunam, and Christoph Treude

In Findings of the Association for Computational Linguistics: ACL 2025, 2025

Abs Bib

State-of-the-art large language models (LLMs) have demonstrated impressive code generation capabilities but struggle with real-world software engineering tasks, such as revising source code to address code reviews, hindering their practical use. Code review comments are often implicit, ambiguous, and colloquial, requiring models to grasp both code and human intent. This challenge calls for evaluating large language models’ ability to bridge both technical and conversational contexts. While existing work has employed the automated code refinement (ACR) task to resolve these comments, current evaluation methods fall short, relying on text matching metrics that provide limited insight into model failures and remain susceptible to training data contamination.To address these limitations, we introduce a novel evaluation benchmark, \textbfCodeReviewQA that enables us to conduct fine-grained assessment of model capabilities and mitigate data contamination risks.In CodeReviewQA, we decompose the generation task of code refinement into \textbfthree essential reasoning steps: \textitchange type recognition (CTR), \textitchange localisation (CL), and \textitsolution identification (SI). Each step is reformulated as multiple-choice questions with varied difficulty levels, enabling precise assessment of model capabilities, while mitigating data contamination risks. Our comprehensive evaluation spans 72 recently released large language models on \textbf900 manually curated, high-quality examples across nine programming languages. Our results show that CodeReviewQA is able to expose specific model weaknesses in code review comprehension, disentangled from their generative automated code refinement results.
@inproceedings{linACL2025, title = {{C}ode{R}eview{QA}: The Code Review Comprehension Assessment for Large Language Models}, author = {Lin, Hong Yi and Liu, Chunhua and Gao, Haoyu and Thongtanunam, Patanamon and Treude, Christoph}, booktitle = {Findings of the Association for Computational Linguistics: ACL 2025}, year = {2025}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2025.findings-acl.476/}, doi = {10.18653/v1/2025.findings-acl.476}, pages = {9138--9166}, isbn = {979-8-89176-256-5} }
ICSE
Human-In-the-Loop Software Development Agents

Wannita Takerngsaksiri, Jirat Pasuksmit, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Ruixiong Zhang, and 5 more authors

In Proceedings of the ACM/IEEE International Conference on Software Engineering, 2025

Abs Bib HTML PDF

Recently, Large Language Models (LLMs)-based multi-agent paradigms for software engineering are introduced to automatically resolve software development tasks (e.g., from a given issue to source code). However, existing work is evaluated based on historical benchmark datasets, does not consider human feedback at each stage of the automated software development process, and has not been deployed in practice. In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development that allows software engineers to refine and guide LLMs when generating coding plans and source code for a given task. We design, implement, and deploy the HULA framework into Atlassian JIRA for internal uses. Through a multi-stage evaluation of the HULA framework, Atlassian software engineers perceive that HULA can minimize the overall development time and effort, especially in initiating a coding plan and writing code for straightforward tasks. On the other hand, challenges around code quality are raised to be solved in some cases. We draw lessons learned and discuss opportunities for future work, which will pave the way for the advancement of LLM-based agents in software development.
@inproceedings{TakerngsaksiriICSE2025, title = {Human-In-the-Loop Software Development Agents}, author = {Takerngsaksiri, Wannita and Pasuksmit, Jirat and Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit and Zhang, Ruixiong and Jiang, Fan and Li, Jing and Cook, Evan and Chen, Kun and Wu, Ming}, year = {2025}, booktitle = {Proceedings of the ACM/IEEE International Conference on Software Engineering}, pages = {to appear}, doi = {}, note = {}, }
MSR
Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation

Chunhua Liu, Hong Yi Lin, and Patanamon Thongtanunam

In Proceedings of the ACM/IEEE International Conference on Mining software repositories, 2025

Acceptance rate: 29% (44/161)

Abs Bib HTML PDF

Code review is an important practice in software development, yet it is time-consuming and requires substantial effort. While open-source datasets have been used to train neural models for automating code review tasks, including review comment generation, these datasets contain a significant amount of noisy comments (e.g., vague or non-actionable feedback) that persist despite cleaning methods using heuristics and machine learning approaches. Such remaining noise may lead models to generate low-quality review comments, yet removing them requires a complex semantic understanding of both code changes and natural language comments. In this paper, we investigate the impact of such noise on review comment generation and propose a novel approach using large language models (LLMs) to further clean these datasets. Based on an empirical study on a large-scale code review dataset, our LLM-based approach achieves 66-85% precision in detecting valid comments. Using the predicted valid comments to fine-tune the state-of-the-art code review models (cleaned models) can generate review comments that are 13.0% - 12.4% more similar to valid human-written comments than the original models. We also find that the cleaned models can generate more informative and relevant comments than the original models. Our findings underscore the critical impact of dataset quality on the performance of review comment generation. We advocate for further research into cleaning training data to enhance the practical utility and quality of automated code review.
@inproceedings{LiuMSR2025, title = {Too Noisy To Learn: Enhancing Data Quality for Code Review Comment Generation}, author = {Liu, Chunhua and Lin, Hong Yi and Thongtanunam, Patanamon}, year = {2025}, booktitle = {Proceedings of the ACM/IEEE International Conference on Mining software repositories}, pages = {to appear}, doi = {}, note = {Acceptance rate: 29% (44/161)}, }
MSR
Human-In-The-Loop Software Development Agents: Challenges and Future Directions

Jirat Pasuksmit, Wannita Takerngsaksiri, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Ruixiong Zhang, and 5 more authors

In Proceedings of the ACM/IEEE International Conference on Mining software repositories, 2025

Abs Bib HTML PDF

Multi-agent LLM-driven systems for software development are rapidly gaining traction, offering new opportunities to enhance productivity. At Atlassian, we deployed Human-in-the-Loop Software Development Agents to resolve Jira work items and evaluated the generated code quality using functional correctness testing and GPT-based similarity scoring. This paper highlights two major challenges: the high computational costs of unit testing and the variability in LLM-based evaluations. We also propose future research directions to improve evaluation frameworks for Human-In-The-Loop software development tools.
@inproceedings{PasuksmitMSR2025, title = {Human-In-The-Loop Software Development Agents: Challenges and Future Directions}, author = {Pasuksmit, Jirat and Takerngsaksiri, Wannita and Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit and Zhang, Ruixiong and Jiang, Fan and Li, Jing and Cook, Evan and Chen, Kun and Wu, Ming}, year = {2025}, booktitle = {Proceedings of the ACM/IEEE International Conference on Mining software repositories}, pages = {to appear}, doi = {}, note = {}, }
MSR
Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks

Kyi Shin Khant, Hong Yi Lin, and Patanamon Thongtanunam

In Proceedings of the ACM/IEEE International Conference on Mining software repositories, 2025

Acceptance rate: 29% (44/161)

Abs Bib HTML PDF

Learning-based techniques, especially advanced pre-trained models for code have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training approaches may not fully optimize model performance, as they typically involve learning from randomly shuffled training data. Recent work shows that Curriculum Learning (CL) can improve performance on code-related tasks through incremental learning based on the difficulty of synthetic code. Yet, the effectiveness of CL with conventional difficulty measures in SE tasks remains largely unexplored. In this study, we explore two conventional code metrics: code length and cyclomatic complexity to determine the difficulty levels. We investigate how the pre-trained code model (CodeT5) learns under CL, through the tasks of code clone detection and code summarization. Our empirical study on the CodeXGLUE benchmark showed contrasting results to prior studies, where the model exhibited signs of catastrophic forgetting and shortcut learning. Surprisingly, model performance saturates after only the first quartile of training, potentially indicating a limit in the model’s representation capacity and/or the task’s inherent difficulty. Future work should further explore various CL strategies with different code models across a wider range of SE tasks for a more holistic understanding.
@inproceedings{KhantMSR2025, title = {Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks}, author = {Khant, Kyi Shin and Lin, Hong Yi and Thongtanunam, Patanamon}, year = {2025}, booktitle = {Proceedings of the ACM/IEEE International Conference on Mining software repositories}, pages = {to appear}, doi = {}, note = {Acceptance rate: 29% (44/161)}, }

2024

ISSRE
Code Ownership: The Principles, Differences, and Their Associations with Software Quality

Patanamon Thongtanunam, and Chakkrit Tantithamthavorn

In Proceedings of the IEEE International Symposium on Software Reliability Engineering, 2024

Acceptance rate: 20% (42/206)

Abs Bib HTML PDF

Code ownership-an approximation of the degree of ownership of a software component-is one of the important software measures used in quality improvement plans. However, prior studies proposed different variants of code ownership approximations. Yet, little is known about the difference in code ownership approximations and their association with software quality. In this paper, we investigate the differences in the commonly used ownership approximations (i.e., commit-based and line-based) in terms of the set of developers, the approximated code ownership values, and the expertise level. Then, we analyze the association of each code ownership approximation with the defect-proneness. Through an empirical study of 25 releases that span real-world open-source software systems, we find that commit-based and line-based ownership approximations produce different sets of developers, different code ownership values, and different sets of major developers. In addition, we find that the commit-based approximation has a stronger association with software quality than the line-based approximation. Based on our analysis, we recommend line-based code ownership be used for accountability purposes (e.g., authorship attribution, intellectual property), while commit-based code ownership should be used for rapid bug-fixing and charting quality improvement plans.
@inproceedings{ThongtanunamISSRE2024, title = {Code Ownership: The Principles, Differences, and Their Associations with Software Quality}, author = {Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit}, year = {2024}, booktitle = {Proceedings of the IEEE International Symposium on Software Reliability Engineering}, pages = {to appear}, doi = {}, note = {Acceptance rate: 20% (42/206)}, }
IST
Don’t forget to change these functions! recommending co-changed functions in modern code review

Yang Hong, Chakkrit Tantithamthavorn, Patanamon Thongtanunam, and Aldeida Aleti

Information and Software Technology, 2024

Abs Bib HTML PDF

Context: Code review is effective and widely used, yet still time-consuming. Especially, in large-scale software systems, developers may forget to change other related functions that must be changed together (aka. co-changes). This may increase the number of review iterations and reviewing time, thus delaying the code review process. Based on our analysis of 66 projects from five open-source systems, we find that there are 16%–33% of code reviews where at least one function must be co-changed, but was not initially changed. Objectives: This study aims to propose an approach to recommend co-changed functions in the context of modern code review, which could reduce reviewing time and iterations and help developers identify functions that need to be changed together. Methods: We propose CoChangeFinder, a novel method that employs a Graph Neural Network (GNN) to recommend co-changed functions for newly submitted code changes. Then, we conduct a quantitative and qualitative evaluation of CoChangeFinder with 66 studied large-scale open-source software projects. Results: Our evaluation results show that our CoChangeFinder outperforms the state-of-the-art approach, achieving 3.44% to 40.45% for top-k accuracy, 2.00% to 26.07% for Recall@k, and 0.04 to 0.21 for mean average precision better than the baseline approach. In addition, our CoChangeFinder demonstrates the capacity to pinpoint the functions related to logic changes. Conclusion: Our CoChangeFinder outperforms the baseline approach (i.e., TARMAQ) in recommending co-changed functions during the code review process. Based on our findings, CoChangeFinder could help developers save their time and effort, reduce review iterations, and enhance the efficiency of the code review process.
@article{Hong2024IST, author = {Hong, Yang and Tantithamthavorn, Chakkrit and Thongtanunam, Patanamon and Aleti, Aldeida}, year = {2024}, journal = {Information and Software Technology}, pages = {to appear}, doi = {10.1016/j.infsof.2024.107547}, }
ISSTA
An Empirical Study of Static Analysis Tools for Secure Code Review

Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, and Christoph Treude

In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2024

Acceptance rate: 20% (143/694)

Abs Bib HTML PDF

Early identification of security issues in software development is vital to minimize their unanticipated impacts. Code review is a widely used manual analysis method that aims to uncover security issues along with other coding issues in software projects. While some studies suggest that automated static application security testing tools (SASTs) could enhance security issue identification, there is limited understanding of SAST’s practical effectiveness in supporting secure code review. Moreover, most SAST studies rely on synthetic or fully vulnerable versions of the subject program, which may not accurately represent real-world code changes in the code review process. To address this gap, we study C/C++ SASTs using a dataset of actual code changes that contributed to exploitable vulnerabilities. Beyond SAST’s effectiveness, we quantify potential benefits when changed functions are prioritized by SAST warnings. Our dataset comprises 319 real-world vulnerabilities from 815 vulnerability-contributing commits (VCCs) in 92 C and C++ projects. The result reveals that a single SAST can produce warnings in vulnerable functions of 52% of VCCs. Prioritizing changed functions with SAST warnings can improve accuracy (i.e., 12% of precision and 5.6% of recall) and reduce Initial False Alarm (lines of code in non-vulnerable functions inspected until the first vulnerable function) by 13%. Nevertheless, at least 76% of the warnings in vulnerable functions are irrelevant to the VCCs, and 22% of VCCs remain undetected due to limitations of SAST rules. Our findings highlight the benefits and the remaining gaps of SAST-supported secure code reviews and challenges that should be addressed in future work.
@inproceedings{CharoenwetSAT2024, title = {An Empirical Study of Static Analysis Tools for Secure Code Review}, author = {Charoenwet, Wachiraphan and Thongtanunam, Patanamon and Pham, Van-Thuan and Treude, Christoph}, year = {2024}, booktitle = {Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)}, pages = {to appear}, doi = {}, note = {Acceptance rate: 20% (143/694)}, }
ISSTA
VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction

Thanh-Dat Nguyen, Tung Do-Viet, Hung Nguyen-Duy, Tuan-Hai Luu, Hung Le, and 2 more authors

In Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA), 2024

Acceptance rate: 20% (143/694)

Abs Bib HTML PDF

Businesses need to query visually rich documents (VRDs) like receipts, medical records, and insurance forms to make decisions. Existing techniques for extracting entities from VRDs struggle with new layouts or require extensive pre-training data. We introduce VRDSynth, a program synthesis method to automatically extract entity relations from multilingual VRDs without pre-training data. To capture the complexity of VRD domain, we design a domain-specific language (DSL) to capture spatial and textual relations to describe the synthesized programs. Along with this, we also derive a new synthesis algorithm utilizing frequent spatial relations, search space pruning, and a combination of positive, negative, and exclusive programs to improve coverage. We evaluate VRDSynth on the FUNSD and XFUND benchmarks for semantic entity linking, consisting of 1,592 forms in 8 languages. VRDSynth outperforms state-of-the-art pre-trained models (LayoutXLM, InfoXLMBase, and XLMRobertaBase) in 5, 6, and 7 out of 8 languages, respectively, improving the F1 score by 42% over LayoutXLM in English. To test the extensibility of the model, we further improve VRDSynth with automated table recognition, creating VRDSynth(Table), and compare it with extended versions of the pre-trained models, InfoXLM(Large) and XLMRoberta(Large). VRDSynth(Table) outperforms these baselines in 4 out of 8 languages and in average F1 score. VRDSynth also significantly reduces memory footprint (1M and 380MB vs. 1.48GB and 3GB for LayoutXLM) while maintaining similar time efficiency.
@inproceedings{Nguyen2024, title = {VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction}, author = {Nguyen, Thanh-Dat and Do-Viet, Tung and Nguyen-Duy, Hung and Luu, Tuan-Hai and Le, Hung and Le, Bach and Thongtanunam, Patanamon}, year = {2024}, booktitle = {Proceedings of the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA)}, pages = {to appear}, doi = {}, note = {Acceptance rate: 20% (143/694)}, }
TOSEM
Automatically Recommend Code Updates: Are We There Yet?

Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Patanamon Thongtanunam, and Li Li

ACM Transactions on Software Engineering and Methodology, 2024

Abs Bib HTML PDF

In recent years, large pre-trained Language Models of Code (CodeLMs) have shown promising results on various software engineering tasks. One such task is automatic code update recommendation, which transforms outdated code snippets into their approved and revised counterparts. Although many CodeLM-based approaches have been proposed, claiming high accuracy, their effectiveness and reliability on real-world code update tasks remain questionable. In this paper, we present the first extensive evaluation of state-of-the-art CodeLMs for automatically recommending code updates. We assess their performance on two diverse datasets of paired updated methods, considering factors such as temporal evolution, project specificity, method size, and update complexity. Our results reveal that while CodeLMs exhibit higher performance in settings that ignore temporal information, they struggle in more realistic time-wise scenarios and generalize poorly to new projects. Furthermore, CodeLM performance decreases significantly for larger methods and more complex updates. Furthermore, we observe that many CodeLM-generated “updates” are actually null, especially in time-wise settings, and meaningful edits remain challenging. Our findings highlight the significant gap between the perceived and actual effectiveness of CodeLMs for real-world code update recommendation and emphasize the need for more research on improving their practicality, robustness, and generalizability.
@article{Lyu2024, title = {Automatically Recommend Code Updates: Are We There Yet?}, author = {Liu, Yue and Tantithamthavorn, Chakkrit and Liu, Yonghui and Thongtanunam, Patanamon and Li, Li}, year = {2024}, journal = {ACM Transactions on Software Engineering and Methodology}, pages = {to appear}, doi = {}, }
EMSE
Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses

Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, and Christoph Treude

Springer Journal of Empirical Software Engineering, 2024

Abs Bib HTML PDF

Identifying security issues early is encouraged to reduce the latent negative impacts on software systems. Code review is a widely-used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.
@article{Charoenwet2024, title = {Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses}, author = {Charoenwet, Wachiraphan and Thongtanunam, Patanamon and Pham, Van-Thuan and Treude, Christoph}, year = {2024}, journal = {Springer Journal of Empirical Software Engineering}, pages = {to appear}, doi = {}, }
TOSEM
Automatic Programming: Large Language Models and Beyond

Michael R Lyu, Baishakhi Ray, Abhik Roychoudhury, Shin Hwei Tan, and Patanamon Thongtanunam

ACM Transactions on Software Engineering and Methodology, 2024

Abs Bib HTML PDF

Automatic programming has seen increasing popularity due to the emergence of tools like GitHub Copilot which rely on Large Language Models (LLMs). At the same time, automatically generated code faces challenges during deployment due to concerns around quality and trust. In this article, we study automated coding in a general sense and study the concerns around code quality, security and related issues of programmer responsibility. These are key issues for organizations while deciding on the usage of automatically generated code. We discuss how advances in software engineering such as program repair and analysis can enable automatic programming. We conclude with a forward looking view, focusing on the programming environment of the near future, where programmers may need to switch to different roles to fully utilize the power of automatic programming. Automated repair of automatically generated programs from LLMs, can help produce higher assurance code from LLMs, along with evidence of assurance
@article{Lyu2025, title = {Automatic Programming: Large Language Models and Beyond}, author = {Lyu, Michael R and Ray, Baishakhi and Roychoudhury, Abhik and Tan, Shin Hwei and Thongtanunam, Patanamon}, year = {2024}, journal = {ACM Transactions on Software Engineering and Methodology}, pages = {to appear}, doi = {}, }
Comp. Survey
A Systematic Literature Review on Reasons and Approaches for Accurate Effort Estimations in Agile

Jirat Pasuksmit, Patanamon Thongtanunam, and Shanika Karunasekera

ACM Computing Surveys, 2024

Abs Bib HTML PDF

Background: Accurate effort estimation is crucial for planning in Agile iterative development. Agile estimation generally relies on consensus-based methods like planning poker, which require less time and information than other formal methods (e.g., COSMIC) but are prone to inaccuracies. Understanding the common reasons for inaccurate estimations and how proposed approaches can assist practitioners is essential. However, prior systematic literature reviews (SLR) only focus on the estimation practices (e.g., [26, 127]) and the effort estimation approaches (e.g., [6]). Aim: We aim to identify themes of reasons for inaccurate estimations and classify approaches to improve effort estimation. Method: We conducted an SLR and identified the key themes and a taxonomy. Results: The reasons for inaccurate estimation are related to information quality, team, estimation practice, project management, and business influences. The effort estimation approaches were the most investigated in the literature, while only a few aim to support the effort estimation process. Yet, few automated approaches are at risk of data leakage and indirect validation scenarios. Recommendations: Practitioners should enhance the quality of information for effort estimation, potentially by adopting an automated approach. Future research should aim to improve the information quality, while avoiding data leakage and indirect validation scenarios.
@article{Pasuksmit2024, title = {A Systematic Literature Review on Reasons and Approaches for Accurate Effort Estimations in Agile}, author = {Pasuksmit, Jirat and Thongtanunam, Patanamon and Karunasekera, Shanika}, year = {2024}, journal = {ACM Computing Surveys}, pages = {to appear}, doi = {}, }
FSE
Practitioners’ Challenges and Perceptions of CI Build Failure Predictions at Atlassian

Yang Hong, Chakkrit Tantithamthavorn, Jirat Pasuksmit, Patanamon Thongtanunam, Arik Friedman, and 2 more authors

In Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE), 2024

Abs Bib HTML PDF

Continuous Integration (CI) build failures could significantly impact the software development process and teams, such as delaying the release of new features and reducing developers’ productivity. In this work, we report on an empirical study that investigates CI build failures throughout product development at Atlassian. Our quantitative analysis found that the repository dimension is the key factor influencing CI build failures. In addition, our qualitative survey revealed that Atlassian developers perceive CI build failures as challenging issues in practice. Furthermore, we found that the CI build prediction can not only provide proactive insight into CI build failures but also facilitate the team’s decision-making. Our study sheds light on the challenges and expectations involved in integrating CI build prediction tools into the Bitbucket environment, providing valuable insights for enhancing CI processes.
@inproceedings{Hong2024, title = {Practitioners' Challenges and Perceptions of CI Build Failure Predictions at Atlassian}, author = {Hong, Yang and Tantithamthavorn, Chakkrit and Pasuksmit, Jirat and Thongtanunam, Patanamon and Friedman, Arik and Zhao, Xing and Krasikov, Anton}, year = {2024}, booktitle = {Proceedings of the ACM International Conference on the Foundations of Software Engineering (FSE)}, pages = {to appear}, doi = {}, }
MSR
Curated Email-Based Code Reviews Datasets

Mingzhao Liang, Wachiraphan Charoenwet, and Patanamon Thongtanunam

In Proceedings of the IEEE/ACM International Conference on Mining Software Repositories, 2024

Abs Bib HTML PDF

Code review is an important practice that improves the overall quality of a proposed patch (i.e. code changes). While much research focused on tool-based code reviews (e.g. a Gerrit code review tool, GitHub), many traditional open-source software (OSS) projects still conduct code reviews through emails. However, due to the nature of unstructured email-based data, it can be challenging to mine email-based code reviews, hindering researchers from delving into the code review practice of such long-standing OSS projects. Therefore , this paper presents large-scale datasets of email-based code reviews of 167 projects across three OSS communities (i.e. Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated the data by grouping a submitted patch and its revised versions and grouping email aliases. Our datasets include a total of 4.2M patches with 2.1M patch groups and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly delve into an email-based code review practice of large OSS projects without additional effort in data collection and curation.
@inproceedings{Liang2024, title = {Curated Email-Based Code Reviews Datasets}, author = {Liang, Mingzhao and Charoenwet, Wachiraphan and Thongtanunam, Patanamon}, year = {2024}, booktitle = {Proceedings of the IEEE/ACM International Conference on Mining Software Repositories}, pages = {to appear}, doi = {}, }
MSR
Improving Automated Code Reviews: Learning from Experience

Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, and Wachiraphan Charoenwet

In Proceedings of the IEEE/ACM International Conference on Mining Software Repositories, 2024

Abs Bib HTML PDF

Modern code review is a critical quality assurance process that is widely adopted in both industry and open source software environments. This process can help newcomers learn from the feedback of experienced reviewers; however, it often brings a large workload and stress to reviewers. To alleviate this burden, the field of automated code reviews aims to automate the process, teaching large language models to provide reviews on submitted code, just as a human would. A recent approach pre-trained and fine-tuned the code intelligent language model on a large-scale code review corpus. However, such techniques did not fully utilise quality reviews amongst the training data. Indeed, reviewers with a higher level of experience or familiarity with the code will likely provide deeper insights than the others. In this study, we set out to investigate whether higher-quality reviews can be generated from automated code review models that are trained based on an experience-aware oversampling technique. Through our quantitative and qualitative evaluation, we find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews generated by the current state-of-the-art model without introducing new data. The results suggest that a vast amount of high-quality reviews are underutilised with current training strategies. This work sheds light on resource-efficient ways to boost automated code review models.
@inproceedings{Lin2024, title = {Improving Automated Code Reviews: Learning from Experience}, author = {Lin, Hong Yi and Thongtanunam, Patanamon and Treude, Christoph and Charoenwet, Wachiraphan}, year = {2024}, booktitle = {Proceedings of the IEEE/ACM International Conference on Mining Software Repositories}, pages = {to appear}, doi = {}, }
MSR
Encoding Version History Context for Better Code Representation

Huy Nguyen, Christoph Treude, and Patanamon Thongtanunam

In Proceedings of the IEEE/ACM International Conference on Mining Software Repositories, 2024

Abs Bib HTML PDF

With the exponential growth of AI tools that generate source code, understanding software has become crucial. When developers comprehend a program, they may refer to additional contexts to look for information, e.g. program documentation or historical code versions. Therefore, we argue that encoding this additional contextual information could also benefit code representation for deep learning. Recent papers incorporate contextual data (e.g. call hierarchy) into vector representation to address program comprehension problems. This motivates further studies to explore additional contexts, such as version history, to enhance models’ understanding of programs. That is, insights from version history enable recognition of patterns in code evolution over time, recurring issues, and the effectiveness of past solutions. Our paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history to predict code clones and perform code classification. We experiment with two representative deep learning models, ASTNN and CodeBERT, to investigate whether combining additional contexts with different aggregations may benefit downstream activities. The experimental result affirms the positive impact of combining version history into source code representation in all scenarios; however, to ensure the technique performs consistently, we need to conduct a holistic investigation on a larger code base using different combinations of contexts, aggregation, and models. Therefore, we propose a research agenda aimed at exploring various aspects of encoding additional context to improve code representation and its optimal utilisation in specific situations.
@inproceedings{Nguyen2025, title = {Encoding Version History Context for Better Code Representation}, author = {Nguyen, Huy and Treude, Christoph and Thongtanunam, Patanamon}, year = {2024}, booktitle = {Proceedings of the IEEE/ACM International Conference on Mining Software Repositories}, pages = {to appear}, doi = {}, }

2023

IEEE Software
Augmented Agile: Human-Centered AI-Assisted Software Management

Rashina Hoda, Hoa Dam, Chakkrit Tantithamthavorn, Patanamon Thongtanunam, and Margaret-Anne Storey

IEEE Software, 2023

Abs Bib HTML PDF

Agile methods have served software engineering well for over two decades, improving responsiveness to change, empowering teams, and facilitating better communication among various project stakeholders. But is it enough to lead us through the next era where balancing business value with human values has become more relevant than ever, especially in an increasingly artificial intelligence (AI)-assisted, hybrid world? We do not think so, and, in this article, we present our vision of “augmented agile” where agile practices are augmented with new capabilities made possible by AI while incorporating human-centered values.
@article{hoda2023augmented, title = {Augmented Agile: Human-Centered AI-Assisted Software Management}, author = {Hoda, Rashina and Dam, Hoa and Tantithamthavorn, Chakkrit and Thongtanunam, Patanamon and Storey, Margaret-Anne}, year = {2023}, journal = {IEEE Software}, publisher = {IEEE}, volume = {40}, number = {4}, pages = {106--109}, doi = {10.1109/MS.2023.3268725}, }
ASE
Repeated Builds During Code Review: An Empirical Study of the OpenStack Community

Rungroj Maipradit, Dong Wang, Patanamon Thongtanunam, Raula Gaikovina Kula, Yasutaka Kamei, and 1 more author

In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, 2023

Acceptance rate: 21% (134/629)

Abs Bib

Code review is a popular practice where developers critique each others’ changes. Since automated builds can identify low-level issues (e.g., syntactic errors, regression bugs), it is not uncommon for software organizations to incorporate automated builds in the code review process. In such code review deployment scenarios, submitted change sets must be approved for integration by both peer code reviewers and automated build bots. Since automated builds may produce an unreliable signal of the status of a change set (e.g., due to ’flaky’ or non-deterministic execution behaviour), code review tools, such as Gerrit, allow developers to request a ’recheck’, which repeats the build process without updating the change set. We conjecture that an unconstrained recheck command will waste time and resources if it is not applied judiciously. To explore how the recheck command is applied in a practical setting, in this paper, we conduct an empirical study of 66,932 code reviews from the OpenStack community. We quantitatively analyze (i) how often build failures are rechecked; (ii) the extent to which invoking recheck changes build failure outcomes; and (iii) how much waste is generated by invoking recheck. We observe that (i) 55% of code reviews invoke the recheck command after a failing build is reported; (ii) invoking the recheck command only changes the outcome of a failing build in 42% of the cases; and (iii) invoking the recheck command increases review waiting time by an average of 2,200% and equates to 187.4 compute years of waste—enough compute resources to compete with the oldest land living animal on earth. Our observations indicate that the recheck command is frequently used after the builds fail, but does not achieve a high likelihood of build success. While recheck currently generates plenty of wasted computational resources and bloats waiting times, it also presents exciting future opportunities for researchers and tool builders to propose solutions that can reduce waste.
@inproceedings{Maipradit2023repeated, title = {Repeated Builds During Code Review: An Empirical Study of the OpenStack Community}, author = {Maipradit, Rungroj and Wang, Dong and Thongtanunam, Patanamon and Kula, Raula Gaikovina and Kamei, Yasutaka and McIntosh, Shane}, year = {2023}, booktitle = {Proceedings of the IEEE/ACM International Conference on Automated Software Engineering}, publisher = {IEEE}, pages = {to appear}, note = {Acceptance rate: 21% (134/629)}, }
SANER
Towards Automated Code Reviews: Does Learning Code Structure Help?

Hong Yi Lin, and Patanamon Thongtanunam

In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering, 2023

Acceptance rate: 40% (12/30)

Abs Bib HTML PDF

Code review is a crucial ingredient to quality software development, but requires a large amount of time and effort for developers. To optimise this manual process, recent research on automated code review seeks to leverage Neural Machine Translation (NMT) models to perform tasks such as automated code improvement. A recent work had pretrained the NMT model for automated code review in order to equip the model with general coding knowledge. However, their pretraining approach is generic to natural languages, which does not leverage the unique properties of coding languages. Therefore, we set out to explore two state-of-the-art pretrained NMT models (i.e., CodeT5 and GraphCodeBERT) that were designed to learn code structure. We studied the models’ abilities to generate correct code improvement through an empirical evaluation based on five different datasets. Our results showed that in terms of generating correct code sequences, CodeT5, GraphCodeBERT, and the prior work achieved an average accuracy of 22%, 18%, and 10%, respectively. In terms of generating correct dataflow structures, they achieved an average accuracy of 33%, 30%, and 22%, respectively. The results suggested that the code structure focused approaches could outperform the generic pretraining approach. This work contributes towards enhancing automated code review techniques by understanding the effectiveness of code structure focused NMT models.
@inproceedings{lin2023towards, title = {Towards Automated Code Reviews: Does Learning Code Structure Help?}, author = {Lin, Hong Yi and Thongtanunam, Patanamon}, year = {2023}, booktitle = {Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering}, pages = {703--707}, doi = {10.1109/SANER56733.2023.00075}, note = {Acceptance rate: 40% (12/30)}, organization = {IEEE}, }
SANER
D-ACT: Towards Diff-Aware Code Transformation for Code Review Under a Time-Wise Evaluation

Chanathip Pornprasit, Chakkrit Tantithamthavorn, Patanamon Thongtanunam, and Chunyang Chen

In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering, 2023

Acceptance rate: 27% (56/207)

Abs Bib HTML PDF

Code review is a software quality assurance practice, yet remains time-consuming (e.g., due to slow feedback from reviewers). Recent Neural Machine Translation (NMT)-based code transformation approaches were proposed to automatically generate an approved version of changed methods for a given submitted patch. The existing approaches could change code tokens in any area in a changed method. However, not all code tokens need to be changed. Intuitively, the changed code tokens in the method should be paid more attention to than the others as they are more prone to be defective. In this paper, we present an NMT-based Diff-Aware Code Transformation approach (D-ACT) by leveraging token-level change information to enable the NMT models to better focus on the changed tokens in a changed method. We evaluate our D-ACT and the baseline approaches based on a time-wise evaluation (that is ignored by the existing work) with 5,758 changed methods. Under the time-wise evaluation scenario, our results show that (1) D-ACT can correctly transform 107 - 245 changed methods, which is at least 62% higher than the existing approaches; (2) the performance of the existing approaches drops by 57% to 94% when the time-wise evaluation is ignored; and (3) D-ACT is improved by 17%- 82% with an average of 29% when considering the token-level change information. Our results suggest that (1) NMT-based code transformation approaches for code review should be evaluated under the time-wise evaluation; and (2) the token-level change information can substantially improve the performance of NMT-based code transformation approaches for code review.
@inproceedings{pornprasit2023d, title = {D-ACT: Towards Diff-Aware Code Transformation for Code Review Under a Time-Wise Evaluation}, author = {Pornprasit, Chanathip and Tantithamthavorn, Chakkrit and Thongtanunam, Patanamon and Chen, Chunyang}, year = {2023}, booktitle = {Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering}, pages = {296--307}, doi = {10.1109/SANER56733.2023.00036}, note = {Acceptance rate: 27% (56/207)}, organization = {IEEE}, }
Trans. Info.
An Exploration of Cross-Patch Collaborations via Patch Linkage in OpenStack

Dong Wang, Patanamon Thongtanunam, Raula GAIKOVINA Kula, and Kenichi Matsumoto

IEICE Transaction on Information and Systems, 2023

Abs Bib HTML PDF

Contemporary development projects benefit from code review as it improves the quality of a project. Large ecosystems of inter-dependent projects like OpenStack generate a large number of reviews, which poses new challenges for collaboration (improving patches, fixing defects). Review tools allow developers to link between patches, to indicate patch dependency, competing solutions, or provide broader context. We hypothesize that such patch linkage may also simulate cross-collaboration. With a case study of OpenStack, we take a first step to explore collaborations that occur after a patch linkage was posted between two patches (i.e., cross-patch collaboration). Our empirical results show that although patch linkage that requests collaboration is relatively less prevalent, the probability of collaboration is relatively higher. Interestingly, the results also show that collaborative contributions via patch linkage are non-trivial, i.e, contributions can affect the review outcome (such as voting) or even improve the patch (i.e., revising). This work opens up future directions to understand barriers and opportunities related to this new kind of collaboration, that assists with code review and development tasks in large ecosystems.
@article{wang2023exploration, title = {An Exploration of Cross-Patch Collaborations via Patch Linkage in OpenStack}, author = {Wang, Dong and Thongtanunam, Patanamon and Kula, Raula GAIKOVINA and Matsumoto, Kenichi}, year = {2023}, journal = {IEICE Transaction on Information and Systems}, publisher = {The Institute of Electronics, Information and Communication Engineers}, volume = {106}, number = {2}, pages = {148--156}, doi = {10.1587/transinf.2022MPP0002}, }

2022

TSE
Giving back: Contributions congruent to library dependency changes in a software ecosystem

Supatsara Wattanakriengkrai, Dong Wang, Raula Gaikovina Kula, Christoph Treude, Patanamon Thongtanunam, and 2 more authors

IEEE Transactions on Software Engineering, 2022

Abs Bib HTML PDF

The widespread adoption of third-party libraries for contemporary software development has led to the creation of large inter-dependency networks, where sustainability issues of a single library can have widespread network effects. Maintainers of these libraries are often overworked, relying on the contributions of volunteers to sustain these libraries. To understand these contributions, in this work, we leverage socio-technical techniques to introduce and formalise dependency-contribution congruence (DC congruence) at both ecosystem and library level, i.e., to understand the degree and origins of contributions congruent to dependency changes, analyze whether they contribute to library dormancy (i.e., a lack of activity), and investigate similarities between these congruent contributions compared to typical contributions. We conduct a large-scale empirical study to measure the DC congruence for the npm ecosystem using 1.7 million issues, 970 thousand pull requests (PRs), and over 5.3 million commits belonging to 107,242 npm libraries. We find that the most congruent contributions originate from contributors who can only submit (not commit) to both a client and a library. At the project level, we find that DC congruence shares an inverse relationship with the likelihood that a library becomes dormant. Specifically, a library is less likely to become dormant if the contributions are congruent with upgrading dependencies. Finally, by comparing the source code of contributions, we find statistical differences in the file path and added lines in the source code of congruent contributions when compared to typical contributions. Our work has implications to encourage dependency contributions, especially to support library maintainers in sustaining their projects.
@article{wattanakriengkrai2022giving, title = {Giving back: Contributions congruent to library dependency changes in a software ecosystem}, author = {Wattanakriengkrai, Supatsara and Wang, Dong and Kula, Raula Gaikovina and Treude, Christoph and Thongtanunam, Patanamon and Ishio, Takashi and Matsumoto, Kenichi}, year = {2022}, journal = {IEEE Transactions on Software Engineering}, publisher = {IEEE}, volume = {49}, number = {4}, pages = {2566--2579}, doi = {10.1109/TSE.2022.3225197}, }
ESEC/FSE
Commentfinder: a simpler, faster, more accurate code review comments recommendation

Yang Hong, Chakkrit Tantithamthavorn, Patanamon Thongtanunam, and Aldeida Aleti

In Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022

Acceptance rate: 22% (99/449)

Abs Bib HTML PDF

Code review is an effective quality assurance practice, but can be labor-intensive since developers have to manually review the code and provide written feedback. Recently, a Deep Learning (DL)-based approach was introduced to automatically recommend code review comments based on changed methods. While the approach showed promising results, it requires expensive computational resource and time which limits its use in practice. To address this limitation , we propose CommentFinder ś a retrieval-based approach to recommend code review comments. Through an empirical evaluation of 151,019 changed methods, we evaluate the effectiveness and efficiency of CommentFinder against the state-of-the-art approach. We find that when recommending the best-1 review comment candidate, our CommentFinder is 32% better than prior work in recommending the correct code review comment. In addition, CommentFinder is 49 times faster than the prior work. These findings highlight that our CommentFinder could help reviewers to reduce the manual efforts by recommending code review comments, while requiring less computational time.
@inproceedings{hong2022commentfinder, author = {Hong, Yang and Tantithamthavorn, Chakkrit and Thongtanunam, Patanamon and Aleti, Aldeida}, year = {2022}, booktitle = {Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering}, pages = {507--519}, doi = {10.1145/3540250.3549119}, note = {Acceptance rate: 22% (99/449)}, }
EMSE
Story points changes in agile iterative development: An empirical study and a prediction approach

Jirat Pasuksmit, Patanamon Thongtanunam, and Shanika Karunasekera

Empirical Software Engineering, 2022

Abs Bib HTML PDF

Story Points (SP) are an effort unit that is used to represent the relative effort of a work item. In Agile software development, SP allows a development team to estimate their delivery capacity and facilitate the sprint planning activities. Although Agile embraces changes, SP changes after the sprint planning may negatively impact the sprint plan. To minimize the impact, there is a need to better understand the SP changes and an automated approach to predict the SP changes. Hence, to better understand the SP changes, we examine the prevalence, accuracy, and impact of information changes on SP changes. Through the analyses based on 19,349 work items spread across seven open-source projects, we find that on average, 10% of the work items have SP changes. These work items typically have SP value increased by 58%-100% relative to the initial SP value when they were assigned to a sprint. We also find that the unchanged SP reflect the development time better than the changed SP. Our qualitative analysis shows that the work items with changed SP often have the information changes relating to updating the scope of work. Our empirical results suggest that SP and the scope of work should be reviewed prior or during sprint planning to achieve a reliable sprint plan. Yet, it could be a tedious task to review all work items in the product (or sprint) backlog. Therefore, we develop a classifier to predict whether a work item will haveSP changes after being assigned to a sprint. Our classifier achieves an AUC of 0.69-0.8, which is significantly better than the baselines. Our results suggest hat to better manage and prepare for the unreliability in SP estimation, the team can leverage our insights and the classifier during the sprint planning. To facilitate future studies, we provide the replication package and the datasets,which are available online.
@article{pasuksmit2022story, title = {Story points changes in agile iterative development: An empirical study and a prediction approach}, author = {Pasuksmit, Jirat and Thongtanunam, Patanamon and Karunasekera, Shanika}, year = {2022}, journal = {Empirical Software Engineering}, publisher = {Springer}, volume = {27}, number = {6}, pages = {156}, doi = {10.1007/s10664-022-10192-9}, }
MSR
Towards reliable agile iterative planning via predicting documentation changes of work items

Jirat Pasuksmit, Patanamon Thongtanunam, and Shanika Karunasekera

In Proceedings of the 19th International Conference on Mining Software Repositories, 2022

Acceptance rate: 34% (45/137)

Abs Bib HTML PDF

In agile iterative development, an agile team needs to analyze documented information for effort estimation and sprint planning. While documentation can be changed, the documentation changes after sprint planning may invalidate the estimated effort and sprint plan. Hence, to help the team be aware of the potential documentation changes, we developed DocWarn to estimate the probability that a work item will have documentation changes. We developed three variations of DocWarn, which are based on the characteristics extracted from the work items (DocWarn-C), the natural language text (DocWarn-T), and both inputs (DocWarn-H). Based on nine open-source projects that work in sprints and actively maintain documentation, DocWarn can predict the documentation changes with an average AUC of 0.75 and an average F1-Score of 0.36, which are significantly higher than the baseline. We also found that the most influential characteristics of a work item for determining the future documentation changes are the past tendency of developers and the length of description text. Based on the qualitative assessment, we found that 40%-68% of the correctly predicted documentation changes were related to scope modification, and such changes could impact the accuracy of estimated effort and the sprint plan. With the prediction of DocWarn, the team will be better aware of the potential documentation changes during sprint planning, allowing the team to manage the uncertainty and reduce the risk of unreliable effort estimation and sprint planning.
@inproceedings{pasuksmit2022towards, title = {Towards reliable agile iterative planning via predicting documentation changes of work items}, author = {Pasuksmit, Jirat and Thongtanunam, Patanamon and Karunasekera, Shanika}, year = {2022}, booktitle = {Proceedings of the 19th International Conference on Mining Software Repositories}, pages = {35--47}, doi = {10.1145/3524842.3528445}, note = {Acceptance rate: 34% (45/137)}, }
ICSE
Autotransform: Automated code transformation to support modern code review process

Patanamon Thongtanunam, Chanathip Pornprasit, and Chakkrit Tantithamthavorn

In Proceedings of the IEEE/ACM International Conference on Software Engineering, 2022

Acceptance rate: 26% (200/751)

Abs Bib HTML PDF Slides

Code review is effective, but human-intensive (e.g., developers need to manually modify source code until it is approved). Recently, prior work proposed a Neural Machine Translation (NMT) approach to automatically transform source code to the version that has been reviewed and approved (i.e., the after version). Yet, its performance is still sub-optimal when the after version has new identifiers or literals (e.g., renamed variables) or has many code tokens. To address these limitations, we proposed AutoTransform which leverages a Byte-Pair Encoding (BPE) approach to handle new tokens and a Transformer-based NMT architecture to handle long sequences. We evaluated our approach based on 147,553 changed methods with and without new tokens for both small and medium sizes. The results showed that our AutoTransform can correctly transform 34-526 changed methods, which is at least 262% higher than the prior work, highlighting the substantial improvement of our approach for code transformation in the context of code review. This work contributes towards automated code transform for code reviews, which could help developers reduce their effort in modifying source code during the code review process.
@inproceedings{thongtanunam2022autotransform, author = {Thongtanunam, Patanamon and Pornprasit, Chanathip and Tantithamthavorn, Chakkrit}, year = {2022}, booktitle = {Proceedings of the IEEE/ACM International Conference on Software Engineering}, pages = {237--248}, doi = {10.1145/3510003.3510067}, note = {Acceptance rate: 26% (200/751)}, }
SANER
Where should I look at? Recommending lines that reviewers should pay attention to

Yang Hong, Chakkrit Tantithamthavorn, and Patanamon Thongtanunam

In Proceedings of theIEEE International Conference on Software Analysis, Evolution and Reengineering, 2022

Acceptance rate: 36% (72/199)

Abs Bib HTML PDF

Code review is an effective quality assurance practice, yet can be time-consuming since reviewers have to carefully review all new added lines in a patch. Our analysis shows that at the median, patch authors often waited 15-64 hours to receive initial feedback from reviewers, which accounts for 16%-26% of the whole review time of a patch. Importantly, we also found that large patches tend to receive initial feedback from reviewers slower than smaller patches. Hence, it would be beneficial to reviewers to reduce their effort with an approach to pinpoint the lines that they should pay attention to. In this paper, we proposed REVSPOT - a machine learning-based approach to predict problematic lines (i.e., lines that will receive a comment and lines that will be revised). Through a case study of three open-source projects (i.e., Openstack Nova, Openstack Ironic, and Qt Base), REVSPOT can accurately predict lines that will receive comments and will be revised (with a Top-10 Accuracy of 81% and 93%, which is 56% and 15% better than the baseline approach), and these correctly predicted problematic lines are related to logic defects, which could impact the functionality of the system. Based on these findings, our REVSPOT could help reviewers to reduce their reviewing effort by reviewing a smaller set of lines and increasing code review speed and reviewers’ productivity.
@inproceedings{hong2022should, title = {Where should I look at? Recommending lines that reviewers should pay attention to}, author = {Hong, Yang and Tantithamthavorn, Chakkrit and Thongtanunam, Patanamon}, year = {2022}, booktitle = {Proceedings of theIEEE International Conference on Software Analysis, Evolution and Reengineering}, pages = {1034--1045}, doi = {10.1109/SANER53432.2022.00121}, note = {Acceptance rate: 36% (72/199)}, organization = {IEEE}, }

2021

ASE
Pyexplainer: Explaining the predictions of just-in-time defect models

Chanathip Pornprasit, Chakkrit Tantithamthavorn, Jirayus Jiarpakdee, Michael Fu, and Patanamon Thongtanunam

In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, 2021

Acceptance rate: 19% (82/427)

Abs Bib HTML PDF

Just-In-Time (JIT) defect prediction (i.e., an AI/ML model to predict defect-introducing commits) is proposed to help developers prioritize their limited Software Quality Assurance (SQA) resources on the most risky commits. However, the explainability of JIT defect models remains largely unexplored (i.e., practitioners still do not know why a commit is predicted as defect-introducing). Recently, LIME has been used to generate explanations for any AI/ML models. However, the random perturbation approach used by LIME to generate synthetic neighbors is still suboptimal, i.e., generating synthetic neighbors that may not be similar to an instance to be explained, producing low accuracy of the local models, leading to inaccurate explanations for just-in-time defect models. In this paper, we propose PyExplainer—i.e., a local rule-based model-agnostic technique for generating explanations (i.e., why a commit is predicted as defective) of JIT defect models. Through a case study of two open-source software projects, we find that our PyExplainer produces (1) synthetic neighbors that are 41%-45% more similar to an instance to be explained; (2) 18%-38% more accurate local models; and (3) explanations that are 69%-98% more unique and 17%-54% more consistent with the actual characteristics of defect-introducing commits in the future than LIME (a state-of-the-art model-agnostic technique). This could help practitioners focus on the most important aspects of the commits to mitigate the risk of being defect-introducing. Thus, the contributions of this paper build an important step towards Explainable AI for Software Engineering, making software analytics more explainable and actionable. Finally, we publish our PyExplainer as a Python package to support practitioners and researchers.
@inproceedings{pornprasit2021pyexplainer, title = {Pyexplainer: Explaining the predictions of just-in-time defect models}, author = {Pornprasit, Chanathip and Tantithamthavorn, Chakkrit and Jiarpakdee, Jirayus and Fu, Michael and Thongtanunam, Patanamon}, year = {2021}, booktitle = {Proceedings of the IEEE/ACM International Conference on Automated Software Engineering}, pages = {407--418}, doi = {10.1109/ASE51524.2021.9678763}, note = {Acceptance rate: 19% (82/427)}, organization = {IEEE}, }
SIGSOFT. SEN.
Shadow Program Committee Initiative: Process and Reflection

Patanamon Thongtanunam, Ayushi Rastogi, Foutse Khomh, Serge Demeyer, Meiyappan Nagappan, and 2 more authors

ACM SIGSOFT Software Engineering Notes, 2021

Abs Bib HTML PDF

The Shadow Program Committee (PC) is an initiative/program that provides an opportunity to Early-Career Researchers (ECRs), i.e., PhD students, postdocs, new faculty members, and industry practitioners, who have not been in a PC, to learn first-hand about the peer-review process of the technical track at Software Engineering (SE) conferences. This program aims to train the next generation of PC members as well as to allow ECRs to be recognized and embedded in the research community. By participating in this program, ECRs will have a great chance i) to gain experience about the reviewing process including the restrictions and ethical standards of the academic peer-review process; ii) to be mentored by senior researchers on how to write a good review; and iii) to create a network with other ECRs and senior researchers (i.e., Shadow PC advisors).
@article{thongtanunam2021shadow, title = {Shadow Program Committee Initiative: Process and Reflection}, author = {Thongtanunam, Patanamon and Rastogi, Ayushi and Khomh, Foutse and Demeyer, Serge and Nagappan, Meiyappan and Blincoe, Kelly and Robles, Gregorio}, year = {2021}, journal = {ACM SIGSOFT Software Engineering Notes}, publisher = {ACM New York, NY, USA}, volume = {46}, number = {4}, pages = {16--18}, doi = {10.1145/3485952.3485956}, }
ICSME
Towards Just-Enough Documentation for Agile Effort Estimation: What Information Should Be Documented?

Jirat Pasuksmit, Patanamon Thongtanunam, and Shanika Karunasekera

In Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2021

Acceptance rate: 24% (43/179)

Abs Bib HTML PDF Slides

Effort estimation is an integral part of activities planning in Agile iterative development. An Agile team estimates the effort of a task based on the available information which is usually conveyed through documentation. However, as documentation has a lower priority in Agile, little is known about how documentation effort can be optimized while achieving accurate estimation. Hence, to help practitioners achieve just-enough documentation for effort estimation, we investigated the different types of documented information that practitioners considered useful for effort estimation. We conducted a survey study with 121 Agile practitioners across 25 countries. Our survey results showed that (1) despite the lower priority of documentation in Agile practices, 98% of the respondents considered documented information moderately to extremely important when estimating effort, (2) 73% of them reported that they would re-estimate a task when the documented information was changed, and (3) functional requirements, user stories, definition of done, UI wireframes, and acceptance criteria were ranked as the most useful types of documented information for effort estimation. Nevertheless, many respondents reported that these useful types of documented information were occasionally changing or missing. Based on our study results, we provide recommendations for agile practitioners on how effort estimation can be improved by focusing on just-enough and just-in-time documentation.
@inproceedings{pasuksmit2021towards, title = {Towards Just-Enough Documentation for Agile Effort Estimation: What Information Should Be Documented?}, author = {Pasuksmit, Jirat and Thongtanunam, Patanamon and Karunasekera, Shanika}, year = {2021}, booktitle = {Proceedings of the IEEE International Conference on Software Maintenance and Evolution}, pages = {114--125}, doi = {10.1109/ICSME52107.2021.00017}, note = {Acceptance rate: 24% (43/179)}, organization = {IEEE}, }
EMSE
Understanding shared links and their intentions to meet information needs in modern code review: A case study of the OpenStack and Qt projects

Dong Wang, Tao Xiao, Patanamon Thongtanunam, Raula Gaikovina Kula, and Kenichi Matsumoto

Empirical Software Engineering, 2021

Abs Bib HTML PDF

Code reviews serve as a quality assurance activity for software teams. Especially for Modern Code Review, sharing a link during a review discussion serves as an effective awareness mechanism where “Code reviews are good FYIs [for your information].” Although prior work has explored link sharing and the information needs of a code review, the extent to which links are used to properly conduct a review is unknown. In this study, we performed a mixed-method approach to investigate the practice of link sharing and their intentions. First, through a quantitative study of the OpenStack and Qt projects, we identify 19,268 reviews that have 39,686 links to explore the extent to which the links are shared, and analyze a correlation between link sharing and review time. Then in a qualitative study, we manually analyze 1,378 links to understand the role and usefulness of link sharing. Results indicate that internal links are more widely referred to (93% and 80% for the two projects). Importantly, although the majority of the internal links are referencing to reviews, bug reports and source code are also shared in review discussions. The statistical models show that the number of internal links as an explanatory factor does have an increasing relationship with the review time. Finally, we present seven intentions of link sharing, with providing context being the most common intention for sharing links. Based on the findings and a developer survey, we encourage the patch author to provide clear context and explore both internal and external resources, while the review team should continue link sharing activities. Future research directions include the investigation of causality between sharing links and the review process, as well as the potential for tool support.
@article{wang2021understanding, author = {Wang, Dong and Xiao, Tao and Thongtanunam, Patanamon and Kula, Raula Gaikovina and Matsumoto, Kenichi}, year = {2021}, journal = {Empirical Software Engineering}, publisher = {Springer}, volume = {26}, pages = {1--32}, doi = {10.1007/s10664-021-09997-x}, }
ICSE-SEET
Assessing the students’ understanding and their mistakes in code review checklists: an experience report of 1,791 code review checklist questions from 394 students

Chun Yong Chong, Patanamon Thongtanunam, and Chakkrit Tantithamthavorn

In Proceedings of the IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training, 2021

Acceptance rate: 33% (31/93)

Abs Bib HTML PDF

Code review is a widely-used practice in software development companies to identify defects. Hence, code review has been included in many software engineering curricula at universities worldwide. However, teaching code review is still a challenging task because the code review effectiveness depends on the code reading and analytical skills of a reviewer. While several studies have investigated the code reading techniques that students should use to find defects during code review, little has focused on a learning activity that involves analytical skills. Indeed, developing a code review checklist should stimulate students to develop their analytical skills to anticipate potential issues (i.e., software defects). Yet, it is unclear whether students can anticipate potential issues given their limited experience in software development (programming, testing, etc.). We perform a qualitative analysis to investigate whether students are capable of creating code review checklists, and if the checklists can be used to guide reviewers to find defects. In addition, we identify common mistakes that students make when developing a code review checklist. Our results show that while there are some misconceptions among students about the purpose of code review, students are able to anticipate potential defects and create a relatively good code review checklist. Hence, our results lead us to conclude that developing a code review checklist can be a part of the learning activities for code review in order to scaffold students’ skills.
@inproceedings{chong2021assessing, author = {Chong, Chun Yong and Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit}, year = {2021}, booktitle = {Proceedings of the IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training}, pages = {20--29}, doi = {10.1109/ICSE-SEET52601.2021.00011}, note = {Acceptance rate: 33% (31/93)}, organization = {IEEE}, }
SANER
Anti-patterns in modern code review: Symptoms and prevalence

Moataz Chouchen, Ali Ouni, Raula Gaikovina Kula, Dong Wang, Patanamon Thongtanunam, and 2 more authors

In Proceedings of the IEEE international conference on software analysis, evolution and reengineering, 2021

Acceptance rate: 46% (12/26)

Abs Bib HTML PDF

Modern code review (MCR) is now broadly adopted as an established and effective software quality assurance practice , with an increasing number of open-source as well as commercial software projects identifying code review as a crucial practice. During the MCR process, developers review, provide constructive feedback, and/or critique each others’ patches before a code change is merged into the codebase. Nevertheless, code review is basically a human task that involves technical, personal and social aspects. Existing literature hint the existence of poor reviewing practices i.e., anti-patterns, that may contribute to a tense reviewing culture, degradation of software quality, slow down integration, and may affect the overall sustainability of the project. To better understand these practices, we present in this paper the concept of Modern Code Review Anti-patterns (MCRA) and take a first step to define a catalog that enumerates common poor code review practices. In detail we explore and characterize MCRA symptoms, causes, and impacts. We also conduct a series of preliminary experiments to investigate the prevalence and co-occurrences of such anti-patterns on a random sample of 100 code reviews from various OpenStack projects.
@inproceedings{chouchen2021anti, author = {Chouchen, Moataz and Ouni, Ali and Kula, Raula Gaikovina and Wang, Dong and Thongtanunam, Patanamon and Mkaouer, Mohamed Wiem and Matsumoto, Kenichi}, year = {2021}, booktitle = {Proceedings of the IEEE international conference on software analysis, evolution and reengineering}, pages = {531--535}, doi = {10.1109/SANER50967.2021.00060}, note = {Acceptance rate: 46% (12/26)}, organization = {IEEE}, }

2020

PROMISE
Workload-aware reviewer recommendation using a multi-objective search-based approach

Wisam Haitham Abbood Al-Zubaidi, Patanamon Thongtanunam, Hoa Khanh Dam, Chakkrit Tantithamthavorn, and Aditya Ghose

In Proceedings of the ACM International Conference on Predictive Models and Data Analytics in Software Engineering, 2020

Abs Bib HTML PDF Slides

Reviewer recommendation approaches have been proposed to provide automated support in finding suitable reviewers to review a given patch. However, they mainly focused on reviewer experience, and did not take into account the review workload, which is another important factor for a reviewer to decide if they will accept a review invitation. Aim: We set out to empirically investigate the feasibility of automatically recommending reviewers while considering the review workload amongst other factors. Method: We develop a novel approach that leverages a multi-objective meta-heuristic algorithm to search for reviewers guided by two objectives , i.e., (1) maximizing the chance of participating in a review, and (2) minimizing the skewness of the review workload distribution among reviewers. Results: Through an empirical study of 230,090 patches with 7,431 reviewers spread across four open source projects, we find that our approach can recommend reviewers who are potentially suitable for a newly-submitted patch with 19%-260% higher F-measure than the five benchmarks. Conclusion: Our empirical results demonstrate that the review workload and other important information should be taken into consideration in finding reviewers who are potentially suitable for a newly-submitted patch. In addition, the results show the effectiveness of realizing this approach using a multi-objective search-based approach.
@inproceedings{al2020workload, title = {Workload-aware reviewer recommendation using a multi-objective search-based approach}, author = {Al-Zubaidi, Wisam Haitham Abbood and Thongtanunam, Patanamon and Dam, Hoa Khanh and Tantithamthavorn, Chakkrit and Ghose, Aditya}, year = {2020}, booktitle = {Proceedings of the ACM International Conference on Predictive Models and Data Analytics in Software Engineering}, pages = {21--30}, doi = {10.1145/3416508.3417115}, }
TSE
Predicting defective lines using a model-agnostic technique

Supatsara Wattanakriengkrai, Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Hideaki Hata, and Kenichi Matsumoto

IEEE Transactions on Software Engineering, 2020

Abs Bib HTML PDF

Defect prediction models are proposed to help a team prioritize source code areas files that need Software Quality Assurance (SQA) based on the likelihood of having defects. However, developers may waste their unnecessary effort on the whole file while only a small fraction of its source code lines are defective. Indeed, we find that as little as 1%-3% of lines of a file are defective. Hence, in this work, we propose a novel framework (called LINE-DP) to identify defective lines using a model-agnostic technique, i.e., an Explainable AI technique that provides information why the model makes such a prediction. Broadly speaking, our LINE-DP first builds a file-level defect model using code token features. Then, our LINE-DP uses a state-of-the-art model-agnostic technique (i.e., LIME) to identify risky tokens, i.e., code tokens that lead the file-level defect model to predict that the file will be defective. Then, the lines that contain risky tokens are predicted as defective lines. Through a case study of 32 releases of nine Java open source systems, our evaluation results show that our LINE-DP achieves an average recall of 0.61, a false alarm rate of 0.47, a top 20%LOC recall of 0.27, and an initial false alarm of 16, which are statistically better than six baseline approaches. Our evaluation shows that our LINE-DP requires an average computation time of 10 seconds including model construction and defective identification time. In addition, we find that 63% of defective lines that can be identified by our LINE-DP are related to common defects (e.g., argument change, condition change). These results suggest that our LINE-DP can effectively identify defective lines that contain common defects while requiring a smaller amount of inspection effort and a manageable computation cost. The contribution of this paper builds an important step towards line-level defect prediction by leveraging a model-agnostic technique.
@article{wattanakriengkrai2020predicting, title = {Predicting defective lines using a model-agnostic technique}, author = {Wattanakriengkrai, Supatsara and Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit and Hata, Hideaki and Matsumoto, Kenichi}, year = {2020}, journal = {IEEE Transactions on Software Engineering}, publisher = {IEEE}, volume = {48}, number = {5}, pages = {1480--1496}, doi = {10.1109/TSE.2020.3023177}, }
TSE
Review dynamics and their impact on software quality

Patanamon Thongtanunam, and Ahmed E Hassan

IEEE Transactions on Software Engineering, 2020

Abs Bib HTML PDF Slides

Code review is a crucial activity for ensuring the quality of software products. Unlike the traditional code review process of the past where reviewers independently examine software artifacts, contemporary code review processes allow teams to collaboratively examine and discuss proposed patches. While the visibility of reviewing activities including review discussions in a contemporary code review tends to increase developer collaboration and openness, little is known whether such visible information influences the evaluation decision of a reviewer or not (i.e., knowing others’ feedback about the patch before providing ones own feedback). Therefore, in this work, we set out to investigate the review dynamics, i.e., a practice of providing a vote to accept a proposed patch, in a code review process. To do so, we first characterize the review dynamics by examining the relationship between the evaluation decision of a reviewer and the visible information about a patch under review (e.g., comments and votes that are provided by prior co-reviewers). We then investigate the association between the characterized review dynamics and the defect-proneness of a patch. Through a case study of 83,750 patches of the OpenStack and Qt projects, we observe that the amount of feedback (either votes and comments of prior reviewers) and the co-working frequency of a reviewer with the patch author are highly associated with the likelihood that the reviewer will provide a positive vote to accept a proposed patch. Furthermore, we find that the proportion of reviewers who provided a vote consistent with prior reviewers is significantly associated with the defect-proneness of a patch. However, the associations of these review dynamics are not as strong as the confounding factors (i.e., patch characteristics and overall reviewing activities). Our observations shed light on the implicit influence of the visible information about a patch under review on the evaluation decision of a reviewer. Our findings suggest that the code reviewing policies that are mindful of these practices may help teams improve code review effectiveness. Nonetheless, such review dynamics should not be too concerning in terms of software quality.
@article{thongtanunam2020review, title = {Review dynamics and their impact on software quality}, author = {Thongtanunam, Patanamon and Hassan, Ahmed E}, year = {2020}, journal = {IEEE Transactions on Software Engineering}, publisher = {IEEE}, volume = {47}, number = {12}, pages = {2698--2712}, doi = {10.1109/TSE.2020.2964660}, }

2019

MSR
Automatically generating documentation for lambda expressions in java

Anwar Alqaimi, Patanamon Thongtanunam, and Christoph Treude

In Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories, 2019

Acceptance rate: 25% (32/126)

Abs Bib HTML PDF

Abstract—When lambda expressions were introduced to the Java programming language as part of the release of Java 8 in 2014, they were the language’s first step into functional programming. Since lambda expressions are still relatively new, not all developers use or understand them. In this paper, we first present the results of an empirical study to determine how frequently developers of GitHub repositories make use of lambda expressions and how they are documented. We find that 11% of Java GitHub repositories use lambda expressions, and that only 6% of the lambda expressions are accompanied by source code comments. We then present a tool called LambdaDoc which can automatically detect lambda expressions in a Java repository and generate natural language documentation for them. Our evaluation of LambdaDoc with 23 professional developers shows that they perceive the generated documentation to be complete, concise, and expressive, while the majority of the documentation produced by our participants without tool support was inadequate. Our contribution builds an important step towards automatically generating documentation for functional programming constructs in an object-oriented language.
@inproceedings{alqaimi2019automatically, title = {Automatically generating documentation for lambda expressions in java}, author = {Alqaimi, Anwar and Thongtanunam, Patanamon and Treude, Christoph}, year = {2019}, booktitle = {Proceedings of the IEEE/ACM 16th International Conference on Mining Software Repositories}, pages = {310--320}, doi = {10.1109/MSR.2019.00057}, note = {Acceptance rate: 25% (32/126)}, organization = {IEEE}, }
ICSE
Mining software defects: Should we consider affected releases?

Suraj Yatish, Jirayus Jiarpakdee, Patanamon Thongtanunam, and Chakkrit Tantithamthavorn

In Proceedings of the IEEE/ACM 41st International Conference on Software Engineering, 2019

Acceptance rate: 21% (109/529)

Abs Bib HTML PDF

With the rise of the Mining Software Repositories (MSR) field, defect datasets extracted from software repositories play a foundational role in many empirical studies related to software quality. At the core of defect data preparation is the identification of post-release defects. Prior studies leverage many heuristics (e.g., keywords and issue IDs) to identify post-release defects. However, such heuristic approach is based on several assumptions, which pose common threats to the validity of many studies. In this paper, we set out to investigate the nature of the difference of defect datasets generated by the heuristic approach and the realistic approach that leverages the earliest affected release that is realistically estimated by a software development team for a given defect. In addition, we investigate the impact of defect identification approaches on the predictive accuracy and the ranking of defective modules that are produced by defect models. Through a case study of defect datasets of 32 releases, we conclude that the heuristic approach has a large impact on both defect count datasets and binary defect datasets. On the other hand, the heuristic approach has a minimal impact on the predictive accuracy and the ranking of defective modules that are produced by defect count models and defect classification models. Our findings suggest that practitioners and researchers should not be too concerned about the predictive accuracy and the ranking of defective modules produced by defect models that are constructed using heuristic defect datasets.
@inproceedings{yatish2019mining, title = {Mining software defects: Should we consider affected releases?}, author = {Yatish, Suraj and Jiarpakdee, Jirayus and Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit}, year = {2019}, booktitle = {Proceedings of the IEEE/ACM 41st International Conference on Software Engineering}, pages = {654--665}, doi = {10.1109/ICSE.2019.00075}, note = {Acceptance rate: 21% (109/529)}, organization = {IEEE}, }
EMSE
Will this clone be short-lived? Towards a better understanding of the characteristics of short-lived clones

Patanamon Thongtanunam, Weiyi Shang, and Ahmed E Hassan

Empirical Software Engineering, 2019

Abs Bib HTML PDF Slides

Code clones are created when a developer duplicates a code fragment to reuse existing functionalities. Mitigating clones by refactoring them helps ease the long-term maintenance of large software systems. However, refactoring can introduce an additional cost. Prior work also suggest that refactoring all clones can be counterproductive since clones may live in a system for a short duration. Hence, it is beneficial to determine in advance whether a newly-introduced clone will be short-lived or long-lived to plan the most effective use of resources. In this work, we perform an empirical study on six open source Java systems to better understand the life expectancy of clones. We find that a large number of clones (i.e., 30% to 87%) lived in the systems for a short duration. Moreover, we find that although short-lived clones were changed more frequently than long-lived clones throughout their lifetime, short-lived clones were consistently changed with their siblings less often than long-lived clones. Furthermore, we build random forest classifiers in order to determine the life expectancy of a newly-introduced clone (i.e., whether a clone will be short-lived or long-lived). Our empirical results show that our random forest classifiers can determine the life expectancy of a newly-introduced clone with an average AUC of 0.63 to 0.92. We also find that the churn made to the methods containing a newly-introduced clone, the complexity and size of the methods containing the newly-introduced clone are highly influential in determining whether the newly-introduced clone will be short-lived. Furthermore, the size of a newly-introduced clone shares a positive relationship with the likelihood that the newly-introduced clone will be short-lived. Our results suggest that, to improve the efficiency of clone management efforts, practitioners can leverage our classifiers and insights in order to determine whether a newly-introduced clone will be short-lived or long-lived to plan the most effective use of their clone management resources in advance.
@article{thongtanunam2019will, title = {Will this clone be short-lived? Towards a better understanding of the characteristics of short-lived clones}, author = {Thongtanunam, Patanamon and Shang, Weiyi and Hassan, Ahmed E}, year = {2019}, journal = {Empirical Software Engineering}, publisher = {Springer}, volume = {24}, pages = {937--972}, doi = {10.1007/s10664-018-9645-2}, }
EMSE
The impact of human factors on the participation decision of reviewers in modern code review

Shade Ruangwan, Patanamon Thongtanunam, Akinori Ihara, and Kenichi Matsumoto

Empirical Software Engineering, 2019

Abs Bib HTML PDF

Code clones are created when a developer duplicates a code fragment to reuse existing functionalities. Mitigating clones by refactoring them helps ease the long-term maintenance of large software systems. However, refactoring can introduce an additional cost. Prior work also suggest that refactoring all clones can be counterproductive since clones may live in a system for a short duration. Hence, it is beneficial to determine in advance whether a newly-introduced clone will be short-lived or long-lived to plan the most effective use of resources. In this work, we perform an empirical study on six open source Java systems to better understand the life expectancy of clones. We find that a large number of clones (i.e., 30% to 87%) lived in the systems for a short duration. Moreover, we find that although short-lived clones were changed more frequently than long-lived clones throughout their lifetime, short-lived clones were consistently changed with their siblings less often than long-lived clones. Furthermore, we build random forest classifiers in order to determine the life expectancy of a newly-introduced clone (i.e., whether a clone will be short-lived or long-lived). Our empirical results show that our random forest classifiers can determine the life expectancy of a newly-introduced clone with an average AUC of 0.63 to 0.92. We also find that the churn made to the methods containing a newly-introduced clone, the complexity and size of the methods containing the newly-introduced clone are highly influential in determining whether the newly-introduced clone will be short-lived. Furthermore, the size of a newly-introduced clone shares a positive relationship with the likelihood that the newly-introduced clone will be short-lived. Our results suggest that, to improve the efficiency of clone management efforts, practitioners can leverage our classifiers and insights in order to determine whether a newly-introduced clone will be short-lived or long-lived to plan the most effective use of their clone management resources in advance.
@article{ruangwan2019impact, author = {Ruangwan, Shade and Thongtanunam, Patanamon and Ihara, Akinori and Matsumoto, Kenichi}, year = {2019}, journal = {Empirical Software Engineering}, publisher = {Springer}, volume = {24}, pages = {973--1016}, doi = {10.1007/s10664-018-9646-1}, }

2017

EMSE
Review participation in modern code review: An empirical study of the android, Qt, and OpenStack projects

Patanamon Thongtanunam, Shane McIntosh, Ahmed E Hassan, and Hajimu Iida

Empirical Software Engineering, 2017

Abs Bib HTML PDF Slides

Software code review is a well-established software quality practice. Recently, Modern Code Review (MCR) has been widely adopted in both open source and proprietary projects. Our prior work shows that review participation plays an important role in MCR practices, since the amount of review participation shares a relationship with software quality. However, little is known about which factors influence review participation in the MCR process. Hence, in this study, we set out to investigate the characteristics of patches that: (1) do not attract reviewers, (2) are not discussed, and (3) receive slow initial feedback. Through a case study of 196,712 reviews spread across the Android, Qt, and OpenStack open source projects, we find that the amount of review participation in the past is a significant indicator of patches that will suffer from poor review participation. Moreover, we find that the description length of a patch shares a relationship with the likelihood of receiving poor reviewer participation or discussion, while the purpose of introducing new features can increase the likelihood of receiving slow initial feedback. Our findings suggest that the patches with these characteristics should be given more attention in order to increase review participation, which will likely lead to a more responsive review process.
@article{thongtanunam2017review, author = {Thongtanunam, Patanamon and McIntosh, Shane and Hassan, Ahmed E and Iida, Hajimu}, year = {2017}, journal = {Empirical Software Engineering}, publisher = {Springer}, volume = {22}, pages = {768--817}, doi = {10.1007/s10664-016-9452-6}, }

2016

ICSE
Revisiting code ownership and its relationship with software quality in the scope of modern code review

Patanamon Thongtanunam, Shane McIntosh, Ahmed E Hassan, and Hajimu Iida

In Proceedings of the International Conference on Software Engineering, 2016

Acceptance rate: 19% (101/530)

Abs Bib HTML PDF Slides

Code ownership establishes a chain of responsibility for modules in large software systems. Although prior work uncovers a link between code ownership heuristics and software quality, these heuristics rely solely on the authorship of code changes. In addition to authoring code changes, developers also make important contributions to a module by reviewing code changes. Indeed, recent work shows that reviewers are highly active in modern code review processes, often suggesting alternative solutions or providing updates to the code changes. In this paper, we complement traditional code ownership heuristics using code review activity. Through a case study of six releases of the large Qt and OpenStack systems, we find that: (1) 67%-86% of developers did not author any code changes for a module, but still actively contributed by reviewing 21%-39% of the code changes, (2) code ownership heuristics that are aware of reviewing activity share a relationship with software quality, and (3) the proportion of reviewers without expertise shares a strong, increasing relationship with the likelihood of having post-release defects. Our results suggest that reviewing activity captures an important aspect of code ownership, and should be included in approximations of it in future studies.
@inproceedings{thongtanunam2016revisiting, author = {Thongtanunam, Patanamon and McIntosh, Shane and Hassan, Ahmed E and Iida, Hajimu}, year = {2016}, booktitle = {Proceedings of the International Conference on Software Engineering}, pages = {1039--1050}, doi = {10.1145/2884781.2884852}, note = {Acceptance rate: 19% (101/530)}, }

2015

MSR
Investigating code review practices in defective files: An empirical study of the qt system

Patanamon Thongtanunam, Shane McIntosh, Ahmed E Hassan, and Hajimu Iida

In Proceedings of the International Conference on Mining Software Repositories, 2015

Acceptance Rate: 30% (32/106)

Abs Bib HTML PDF Slides

Software code review is a well-established software quality practice. Recently, Modern Code Review (MCR) has been widely adopted in both open source and industrial projects. To evaluate the impact that characteristics of MCR practices have on software quality, this paper comparatively studies MCR practices in defective and clean source code files. We investigate defective files along two perspectives: 1) files that will eventually have defects (i.e., future-defective files) and 2) files that have historically been defective (i.e., risky files). Through an empirical study of 11,736 reviews of changes to 24,486 files from the Qt open source system, we find that both future-defective files and risky files tend to be reviewed less rigorously than their clean counterparts. We also find that the concerns addressed during the code reviews of both defective and clean files tend to enhance evolvability, i.e., ease future maintenance (like documentation), rather than focus on functional issues (like incorrect program logic). Our findings suggest that although functionality concerns are rarely addressed during code review, the rigor of the reviewing process that is applied to a source code file throughout a development cycle shares a link with its defect proneness.
@inproceedings{thongtanunam2015investigating, author = {Thongtanunam, Patanamon and McIntosh, Shane and Hassan, Ahmed E and Iida, Hajimu}, year = {2015}, booktitle = {Proceedings of the International Conference on Mining Software Repositories}, pages = {168--179}, doi = {10.1109/MSR.2015.23}, note = {Acceptance Rate: 30% (32/106)}, organization = {IEEE}, }
SANER
Who should review my code? a file location-based code-reviewer recommendation approach for modern code review

Patanamon Thongtanunam, Chakkrit Tantithamthavorn, Raula Gaikovina Kula, Norihiro Yoshida, Hajimu Iida, and 1 more author

In Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering, 2015

Acceptance rate: 32% (46/144)

Abs Bib HTML PDF Slides

Software code review is an inspection of a code change by an independent third-party developer in order to identify and fix defects before an integration. Effectively performing code review can improve the overall software quality. In recent years, Modern Code Review (MCR), a lightweight and tool-based code inspection, has been widely adopted in both proprietary and open-source software systems. Finding appropriate code-reviewers in MCR is a necessary step of reviewing a code change. However, little research is known the difficulty of finding code-reviewers in a distributed software development and its impact on reviewing time. In this paper, we investigate the impact of reviews with code-reviewer assignment problem has on reviewing time. We find that reviews with code-reviewer assignment problem take 12 days longer to approve a code change. To help developers find appropriate code-reviewers, we propose RevFinder, a file location-based code-reviewer recommendation approach. We leverage a similarity of previously reviewed file path to recommend an appropriate code-reviewer. The intuition is that files that are located in similar file paths would be managed and reviewed by similar experienced code-reviewers. Through an empirical evaluation on a case study of 42,045 reviews of Android Open Source Project (AOSP), OpenStack, Qt and LibreOffice projects, we find that RevFinder accurately recommended 79% of reviews with a top 10 recommendation. RevFinder also correctly recommended the code-reviewers with a median rank of 4. The overall ranking of RevFinder is 3 times better than that of a baseline approach. We believe that RevFinder could be applied to MCR in order to help developers find appropriate code-reviewers and speed up the overall code review process.
@inproceedings{thongtanunam2015should, author = {Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit and Kula, Raula Gaikovina and Yoshida, Norihiro and Iida, Hajimu and Matsumoto, Ken-ichi}, year = {2015}, booktitle = {Proceedings of the International Conference on Software Analysis, Evolution, and Reengineering}, pages = {141--150}, doi = {10.1109/SANER.2015.7081824}, note = {Acceptance rate: 32% (46/144)}, organization = {IEEE}, }

2014

IWESEP
Assessing MCR discussion usefulness using semantic similarity

Thai Pangsakulyanont, Patanamon Thongtanunam, Daniel Port, and Hajimu Iida

In Proceedings of the International Workshop on Empirical Software Engineering in Practice, 2014

Abs Bib HTML

Modern Code Review (MCR) is an informal practice whereby reviewers virtually discuss proposed changes by adding comments through a code review tool or mailing list. It has received much research attention due to its perceived cost- effectiveness and popularity with industrial and OSS projects. Recent studies indicate there is a positive relationship between the number of review comments and code quality. However, little research exists investigating how such discussion impacts software quality. The concern is that the informality of MCR encourages a focus on trivial, tangential, or unrelated issues. Indeed, we have observed that such comments are quite frequent and may even constitute the majority. We conjecture that an effective MCR actually depends on having a substantive quantity of comments that directly impact a proposed change (or are “useful”). To investigate this, a necessary first step requires distinguishing review comments that are useful to a proposed change from those that are not. For a large OSS projects such as our Qt case study, manual assessment of the over 72,000 comments is a daunting task. We propose to utilize semantic similarity as a practical, cost- efficient, and empirically assurable approach for assisting with the manual usefulness assessment of MCR comments. Our case- study results indicate that our approach can classify comments with an average F-measure score of 0.73 and reduce comment usefulness assessment effort by about 77%.
@inproceedings{pangsakulyanont2014assessing, title = {Assessing MCR discussion usefulness using semantic similarity}, author = {Pangsakulyanont, Thai and Thongtanunam, Patanamon and Port, Daniel and Iida, Hajimu}, year = {2014}, booktitle = {Proceedings of the International Workshop on Empirical Software Engineering in Practice}, pages = {49--54}, doi = {10.1109/IWESEP.2014.11}, organization = {IEEE}, }
CHASE
Improving code review effectiveness through reviewer recommendations

Patanamon Thongtanunam, Raula Gaikovina Kula, Ana Erika Camargo Cruz, Norihiro Yoshida, and Hajimu Iida

In Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering, 2014

Abs Bib HTML Slides

Effectively performing code review increases the quality of software and reduces occurrence of defects. However, this requires reviewers with experiences and deep understandings of system code. Manual selection of such reviewers can be a costly and time-consuming task. To reduce this cost, we propose a reviewer recommendation algorithm determining file path similarity called FPS algorithm. Using three OSS projects as case studies, FPS algorithm was accurate up to 77.97%, which significantly outperformed the previous approach.
@inproceedings{thongtanunam2014improving, author = {Thongtanunam, Patanamon and Kula, Raula Gaikovina and Cruz, Ana Erika Camargo and Yoshida, Norihiro and Iida, Hajimu}, year = {2014}, booktitle = {Proceedings of the International Workshop on Cooperative and Human Aspects of Software Engineering}, pages = {119--122}, doi = {10.1145/2593702.2593705}, }

2013

RSS
Mining history of gamification towards finding expertise in question and answering communities: experience and practice with Stack Exchange

Patanamon Thongtanunam, Raula G Kula, Ana EC Cruz, Norihiro Yoshida, Kohei Ichikawa, and 1 more author

The Review of Socionetwork Strategies, 2013

Abs Bib HTML

Recently, an online Q&A tool has become an essential part of individual communities and organizations of experts on specific topics. Using the answers to questions about specific topics will help such communities work more efficiently in their fields. Currently, Q&A online communities are adopting gamification to engage users by granting awards to successful users. In this paper, we investigate how to mine award achievement histories to find expertise. We propose the use of sequence analysis and clustering techniques. Specifically, we study the history of Stack Exchange, a large Q&A community that employs gamification. To the best of our knowledge, this is the first study of using award achievement history to find expertise in Q&A communities.
@article{thongtanunam2013mining, title = {Mining history of gamification towards finding expertise in question and answering communities: experience and practice with Stack Exchange}, author = {Thongtanunam, Patanamon and Kula, Raula G and Cruz, Ana EC and Yoshida, Norihiro and Ichikawa, Kohei and Iida, Hajimu}, year = {2013}, journal = {The Review of Socionetwork Strategies}, publisher = {Springer}, volume = {7}, pages = {115--130}, doi = {10.1007/s12626-013-0038-0}, }