Merge conflicts often occur when developers concurrently change the same code artifacts. While state-of-practice unstructured merge tools (e.g., git merge) try to automatically resolve merge conflicts via textual similarity, semistructured and structured merge tools try to go further by exploiting the syntactic structure and semantics of the involved artifacts. Although there is evidence that semistructured merge has significant advantages over unstructured merge, and that structured merge reports significantly fewer conflicts than unstructured merge, it is unknown how semistructured merge compares with structured merge. In an empirical study, we compare semistructured and structured merge by reproducing more than 40,000 merge scenarios from more than 500 projects. We assess how often the tools report different results. We also identify conflicts incorrectly reported by one tool but not by the other (false positives), and conflicts correctly reported by one tool but missed by the other (false negatives). Our results show that the tools differ on 24% of the scenarios with conflicts. Semistructured merge reports more false positives, whereas structured merge has more false negatives. Finally, we observe that adapting a semistructured merge tool to resolve a particular kind of conflict makes semistructured and structured merge even closer.
There is a difference of 1.29% in the number of reported conflicts.
On average, 2.25% of the considered merge scenarios had at least one conflict with semistructured merge, with a standard deviation of 4.58%. Considering aggregated numbers of all projects of our sample, this corresponds to 2.31% of the considered merge scenarios. With structured merge, on average 1.8% of the considered merge scenarios had at least one conflict, with a standard deviation of 3.92%. Considering aggregated numbers of all projects of our sample, this corresponds to 1.87% of the considered merge scenarios.
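Each result above is given twice: as an average with a standard deviation, and as an aggregated number over all projects. We read the former as the mean of per-project rates and the latter as a pooled rate (all conflicting scenarios divided by all scenarios). The following is a minimal Python sketch of that distinction under this reading. The per-project counts and the name `conflict_rates` are hypothetical and only illustrate the arithmetic; the sample standard deviation is assumed, as R's `sd` computes.

```python
from statistics import mean, stdev

def conflict_rates(projects):
    """projects: list of (conflicting_scenarios, total_scenarios), one pair per project.

    Returns (mean of per-project rates, sample std. dev., pooled rate), all in percent.
    """
    per_project = [100.0 * conflicts / total for conflicts, total in projects]
    pooled = (100.0 * sum(conflicts for conflicts, _ in projects)
              / sum(total for _, total in projects))
    return mean(per_project), stdev(per_project), pooled

# Hypothetical counts for three projects (not data from the study).
sample = [(2, 100), (0, 50), (9, 300)]
avg, sd, pooled = conflict_rates(sample)
print(f"average {avg:.2f}%, sd {sd:.2f}%, aggregated {pooled:.2f}%")
```

The two numbers differ whenever projects contribute different numbers of merge scenarios, because the pooled rate weights large projects more heavily.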
Statistical significance, considering a confidence level of 0.95 (significance level of 0.05):
##
## Wilcoxon signed rank test with continuity correction
##
## data: cf_SS$Semistructured and cf_ST$Structured
## V = 4013.5, p-value = 2.938e-13
## alternative hypothesis: true location shift is not equal to 0
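To make the reported `V` and p-value easier to interpret, here is a minimal Python sketch of a two-sided paired Wilcoxon signed-rank test using the normal approximation with tie and continuity corrections. It is an illustration under stated assumptions, not the script that produced the output above: the function name `wilcoxon_signed_rank` and the example data are hypothetical, and zero differences are simply dropped.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided paired test via normal approximation. Returns (V, p_value)."""
    # Paired differences; pairs with no difference are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # V is the sum of the ranks of the positive differences.
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)

    centered = v - n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48)
    correction = math.copysign(0.5, centered) if centered != 0 else 0.0
    z = (centered - correction) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return v, p_value

# Hypothetical paired per-project rates (not data from the study).
semi = [1.0, 2.0, 3.0, 4.0, 5.0]
struct = [0.5, 2.5, 1.0, 4.0, 2.0]
v_stat, p = wilcoxon_signed_rank(semi, struct)
```

A p-value below the chosen significance level indicates that the paired differences are unlikely to be centered at zero; it says nothing about how large the difference is, which is why an effect size is reported next.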
Strength/magnitude of the statistical claim (effect size):
##
## Cliff's Delta
##
## delta estimate: 0.03353025 (negligible)
## 95 percent confidence interval:
## lower upper
## -0.03204124 0.09881434
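Cliff's delta compares every value of one sample with every value of the other and reports the share of pairs where the first is larger minus the share where it is smaller. A minimal Python sketch follows. The function names and example data are hypothetical, the magnitude thresholds (0.147, 0.33, 0.474) are the commonly cited ones and are assumed here rather than taken from the report, and the confidence interval is not computed.

```python
def cliffs_delta(x, y):
    """Return Cliff's delta in [-1, 1] for two samples x and y."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))

def magnitude(delta):
    """Label the effect size using commonly cited thresholds (an assumption)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Hypothetical samples (not data from the study).
delta = cliffs_delta([1, 2, 3], [2, 3, 4])
label = magnitude(delta)
```

Under these thresholds, the estimate of 0.03353025 reported above falls in the negligible range, which matches the label printed in the output.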
On average, the tools differ on 0.52% of the considered merge scenarios, with a standard deviation of 2.06%. Considering aggregated numbers of all projects of our sample, this corresponds to 0.58% of the considered merge scenarios.
On average, the tools differ on 23.22% of the considered merge scenarios, with a standard deviation of 44.45%. Considering aggregated numbers of all projects of our sample, this corresponds to 23.67% of the considered merge scenarios.
On average, the tools differ on 4.09% of the considered merge scenarios, with a standard deviation of 13.26%. Considering aggregated numbers of all projects of our sample, this corresponds to 5.16% of the considered merge scenarios.
On average, semistructured merge had additional false positives in 0.1% of the considered merge scenarios, with a standard deviation of 0.73%. Considering aggregated numbers of all projects of our sample, this corresponds to 0.09% of the considered merge scenarios.
On average, structured merge had additional false positives in 0.01% of the considered merge scenarios, with a standard deviation of 0.17%. Considering aggregated numbers of all projects of our sample, this corresponds to 0.01% of the considered merge scenarios.
Statistical significance, considering a confidence level of 0.95 (significance level of 0.05):
## Warning in wilcox.test.default(aFP_ST$Structured, aFP_SS$Semistructured, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(aFP_ST$Structured, aFP_SS$Semistructured, :
## cannot compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: aFP_ST$Structured and aFP_SS$Semistructured
## V = 70.5, p-value = 0.0003034
## alternative hypothesis: true location shift is not equal to 0
Strength/magnitude of the statistical claim (effect size):
##
## Cliff's Delta
##
## delta estimate: -0.09988568 (negligible)
## 95 percent confidence interval:
## lower upper
## -0.12304102 -0.07662165
On average, semistructured merge had additional false negatives in 0% of the considered merge scenarios, with a standard deviation of 0.06%. Considering aggregated numbers of all projects of our sample, this corresponds to 0.01% of the considered merge scenarios.
On average, structured merge had additional false negatives in 0.05% of the considered merge scenarios, with a standard deviation of 0.53%. Considering aggregated numbers of all projects of our sample, this corresponds to 0.08% of the considered merge scenarios.
Statistical significance, considering a confidence level of 0.95 (significance level of 0.05):
## Warning in wilcox.test.default(aFN_ST$Structured, aFN_SS$Semistructured, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(aFN_ST$Structured, aFN_SS$Semistructured, :
## cannot compute exact p-value with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: aFN_ST$Structured and aFN_SS$Semistructured
## V = 214, p-value = 0.0006549
## alternative hypothesis: true location shift is not equal to 0
Strength/magnitude of the statistical claim (effect size):
##
## Cliff's Delta
##
## delta estimate: 0.0697888 (negligible)
## 95 percent confidence interval:
## lower upper
## 0.04963143 0.08988934
There is a difference of 5.95% in the number of reported conflicts.
On average, 2.19% of the considered merge scenarios had at least one conflict with semistructured merge, with a standard deviation of 4.52%. Considering aggregated numbers of all projects of our sample, this corresponds to 2.24% of the considered merge scenarios. With structured merge, on average 1.8% of the considered merge scenarios had at least one conflict, with a standard deviation of 3.92%. Considering aggregated numbers of all projects of our sample, this corresponds to 1.87% of the considered merge scenarios.
On average, the tools differ on 0.47% of the considered merge scenarios, with a standard deviation of 1.99%. Considering aggregated numbers of all projects of our sample, this corresponds to 0.52% of the considered merge scenarios.
On average, the tools differ on 22.45% of the considered merge scenarios, with a standard deviation of 44.15%. Considering aggregated numbers of all projects of our sample, this corresponds to 22.37% of the considered merge scenarios.
On average, the tools differ on 3.54% of the considered merge scenarios, with a standard deviation of 11.51%. Considering aggregated numbers of all projects of our sample, this corresponds to 4.59% of the considered merge scenarios.