Branching and merging are common practices in collaborative software development. They increase developer productivity by fostering teamwork, allowing developers to independently contribute to a software project. Despite such benefits, these practices come at a cost: the need to merge software and resolve merge conflicts, which often occur in practice. While modern merge techniques, such as 3-way and structured merge, can resolve textual conflicts automatically, they fail when the conflict arises not at the syntactic but at the semantic level. Detecting such semantic conflicts requires understanding the behavior of the software, which is beyond the capabilities of most existing merge tools. Although semantic merge tools have been proposed, they usually rely on heavyweight static analyses or require explicit specifications of program behavior. In this work, we take a different route and propose SAM (SemAntic Merge), a semantic merge tool based on the automated generation of unit tests, which serve as partial specifications of the changes to be merged and drive the detection of unwanted behavior changes (conflicts) when merging software. To evaluate SAM’s feasibility for detecting conflicts, we perform an empirical study relying on a dataset of more than 80 pairs of changes integrated into common class elements (constructors, methods, and fields) from 51 merge scenarios. We also assess how the four unit-test generation tools used by SAM individually contribute to conflict identification: EvoSuite (both the standard and the differential version), Randoop, and Randoop Clean, an extended version of Randoop proposed here. Additionally, we propose and assess the adoption of Testability Transformations, changes applied directly to the code under analysis to increase its testability during test-suite generation, and Serialization, which helps the unit-test generation tools create tests that manipulate complex objects. Our results show that SAM performs best when combining only the tests generated by Differential EvoSuite and EvoSuite and applying the proposed Testability Transformations (nine detected conflicts out of 28). These results reinforce previous findings on the potential of test-case generation for detecting test conflicts, a method that is versatile and requires only limited deployment effort in practice.
In essence, SAM generates and executes tests whenever a merge scenario is performed. These tests are executed over the different commits of the merge scenario, and after interpreting their results, the tool reports whether a semantic conflict is detected.
SAM receives a merge scenario as input and, based on the changes performed by developers, invokes unit-test generation tools to create test suites exploring those changes. Next, based on a set of heuristics, SAM checks whether test conflicts occur by comparing the test suites’ results across the different commits of the merge scenario. If a conflict is detected, the tool warns developers about its occurrence, reporting the class and the element involved in the conflict. Figure 1 below shows how SAM works. Our current version of SAM is available on GitHub.
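For illustration only, the sketch below shows one possible way of interpreting test results over the four commits of a merge scenario (base, left parent, right parent, and merge). This is an assumption for explanatory purposes, not necessarily SAM's exact criteria; each argument maps a test name to its pass/fail outcome on one commit.

```python
def is_conflict(test, base, left, right, merge):
    """Illustrative heuristic (an assumption, not necessarily SAM's exact criteria):
    a test that captures behavior introduced by one parent (it fails on the base
    commit but passes on that parent) and no longer passes on the merge commit
    suggests an unwanted behavior change. Each dict maps a test name to True
    (pass) or False (fail) for the corresponding commit."""
    introduced_by_parent = (not base.get(test, False)) and \
                           (left.get(test, False) or right.get(test, False))
    lost_in_merge = not merge.get(test, False)
    return introduced_by_parent and lost_in_merge
```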
We present here our dataset of 85 pairs of mutually integrated changes to class elements (constructors, methods, and fields) from 51 software merge scenarios mined from 31 different projects. In 28 of these pairs, we observe the occurrence of a semantic conflict (column Semantic Conflict). Using Regression Testing, we were able to automatically detect 9 of these conflicts. Figure 2 below shows the steps we adopt in this study.
Here you can find the list of changes to the same declarations that we analyzed in our study. Each change is represented as a row in the table below.
For the HikariConfig case, a semantic conflict occurs according to our ground truth. During our study, SAM reported a conflict for this case; however, the reported test was flaky. As a result, we consider this case a false positive. For further discussion about false positives, please check our paper.
For additional details regarding our dataset, you can find further information here. We provide a description for each row of the table above, such as whether the associated changes represent a conflict, a summary of the changes performed by each parent commit, and, when applicable, a test case revealing the conflict.
For additional details regarding the set of test cases that detected semantic conflicts, please check this file. It lists, for each test suite, the test cases that detected any of the semantic conflicts.
Finally, for the general behavior changes detected in our study, please check this file.
Here you can find the links to the scripts and the dataset we used to perform our study. To support replications, we provide our sample of merge scenarios as a dataset. This dataset contains the build files required by our scripts to generate test suites with the unit-test generation tools and to execute these suites against the different versions of a merge scenario.
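As a rough sketch of what the test-generation step involves (this is not our actual scripts; the paths, class name, and time budget below are placeholder assumptions), a unit-test generation tool such as EvoSuite can be invoked for a class changed in a given commit of a merge scenario:

```python
import subprocess

# Sketch only (not our actual scripts): generate tests for one changed class of a
# merge-scenario commit with EvoSuite. Paths, class name, and budget are placeholders.
subprocess.run(
    [
        "java", "-jar", "evosuite.jar",
        "-class", "com.example.TargetClass",  # class element changed by both parents
        "-projectCP", "build/classes",        # compiled classes of the commit under analysis
        "-Dsearch_budget=60",                 # generation budget in seconds
    ],
    cwd="scenario/left",  # checkout of one commit of the merge scenario
    check=True,
)
```

The generated suite is then compiled and executed against the other commits of the scenario so that the test results can be compared.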
Below, we present in detail how our scripts can be used to replicate our study with our dataset or to perform a new study with another sample.
We recommend following the steps below when replicating our study.