Abstract

Merge conflicts often occur when developers concurrently change the same code artifacts. While state-of-practice unstructured merge tools (e.g., git merge) try to automatically resolve merge conflicts via textual similarity, semistructured and structured merge tools try to go further by exploiting the syntactic structure and semantics of the involved artifacts. Although there is evidence that semistructured merge has significant advantages over unstructured merge, and that structured merge reports significantly fewer conflicts than unstructured merge, it is unknown how semistructured merge compares with structured merge. In an empirical study, we compare semistructured and structured merge by reproducing more than 40,000 merge scenarios from more than 500 projects. We assess how often the tools report different results. We also identify conflicts incorrectly reported by one tool but not by the other (false positives), and conflicts correctly reported by one tool but missed by the other (false negatives). Our results show that the tools differ on 24% of the scenarios with conflicts. Semistructured merge reports more false positives, whereas structured merge has more false negatives. Finally, we observe that adapting a semistructured merge tool to resolve a particular kind of conflict brings semistructured and structured merge even closer.

How many conflicts arise from the use of semistructured and structured merge?

The numbers of conflicts reported by the two strategies differ by 1.29%.

On average across projects, 2.25% of the considered merge scenarios had at least one conflict with semistructured merge (standard deviation 4.58%); aggregating all projects in our sample, this corresponds to 2.31% of the considered merge scenarios. With structured merge, on average 1.8% of the considered merge scenarios had at least one conflict (standard deviation 3.92%), or 1.87% when aggregating all projects in our sample.
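The two figures per strategy answer different questions: the average weights every project equally, whereas the aggregated percentage weights every merge scenario equally, so a small project with many conflicts can pull the average above the aggregate. A minimal Python sketch of both computations follows; the input format (one `(conflicting, total)` pair per project) and the function name `conflict_rates` are hypothetical, and we assume the sample standard deviation (as computed by R's `sd`), which the report does not state explicitly.

```python
import statistics
from typing import List, Tuple


def conflict_rates(projects: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (mean, sd, aggregated) percentages of conflicting scenarios.

    `projects` holds one hypothetical (conflicting, total) pair per project.
    """
    # Per-project percentages; projects without scenarios are skipped.
    per_project = [100.0 * c / t for c, t in projects if t > 0]
    mean = statistics.mean(per_project)
    # Sample standard deviation (n - 1 denominator), like R's sd().
    sd = statistics.stdev(per_project) if len(per_project) > 1 else 0.0
    # Aggregated: pool the scenarios of all projects before dividing.
    total_conflicting = sum(c for c, _ in projects)
    total_scenarios = sum(t for _, t in projects)
    aggregated = 100.0 * total_conflicting / total_scenarios
    return mean, sd, aggregated


# A small project with 1 conflict in 10 scenarios and a large one with none.
mean, sd, aggregated = conflict_rates([(1, 10), (0, 90)])
print(f"mean={mean:.2f}% sd={sd:.2f}% aggregated={aggregated:.2f}%")
```

This prints `mean=5.00% sd=7.07% aggregated=1.00%`: the small project dominates the average but contributes only 10 of the 100 pooled scenarios.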

Statistical significance, considering a confidence level of 0.95 (significance level α = 0.05):

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  cf_SS$Semistructured and cf_ST$Structured
## V = 4013.5, p-value = 2.938e-13
## alternative hypothesis: true location shift is not equal to 0
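The output above comes from the normal-approximation branch of R's paired Wilcoxon signed-rank test. The sketch below is our own stdlib-only Python reimplementation of that computation as we understand it (zero differences dropped, averaged ranks for ties, tie-corrected variance, continuity correction of 0.5 toward zero); it is not R's code and not necessarily the script used in the study, and `wilcoxon_signed_rank` is a hypothetical name. V is the sum of the ranks of the positive differences x - y, so with the arguments in the order shown above it sums the ranks of projects where semistructured merge had the higher conflict rate; a half-integer value such as 4013.5 comes from averaged ranks of tied differences.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _normal_cdf(z: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided paired test, normal approximation with continuity correction.

    Returns (V, p_value), where V sums the ranks of positive differences x - y.
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y) if a - b != 0]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    abs_diffs = [abs(d) for d in diffs]
    ranks = _average_ranks(abs_diffs)
    v = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Variance of V under H0, reduced for each group of t tied |differences|.
    tie_sizes = Counter(abs_diffs).values()
    variance = n * (n + 1) * (2 * n + 1) / 24.0 - sum(t**3 - t for t in tie_sizes) / 48.0
    centred = v - n * (n + 1) / 4.0
    correction = math.copysign(0.5, centred) if centred != 0 else 0.0
    z = (centred - correction) / math.sqrt(variance)
    p = 2.0 * min(_normal_cdf(z), 1.0 - _normal_cdf(z))
    return v, min(p, 1.0)


v, p = wilcoxon_signed_rank([1, 2, 3, 4, 5], [0, 0, 0, 0, 0])
print(v, p)
```

Swapping the two arguments turns V into n(n + 1)/2 - V but leaves the two-sided p-value unchanged.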

Strength/magnitude of the statistical claim (effect size):

## 
## Cliff's Delta
## 
## delta estimate: 0.03353025 (negligible)
## 95 percent confidence interval:
##       lower       upper 
## -0.03204124  0.09881434
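Cliff's delta measures how often values from one sample exceed values from the other: delta = (#{x_i > y_j} - #{x_i < y_j}) / (m n), ranging from -1 to 1. The output above resembles that of the R `effsize` package's `cliff.delta`; the sketch below is our own Python reimplementation of the point estimate only (no confidence interval), treating the two samples as independent and assuming the commonly used magnitude cut-offs 0.147, 0.33 and 0.474. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def cliffs_delta(x: Sequence[float], y: Sequence[float]) -> float:
    """Point estimate of Cliff's delta: P(X > Y) - P(X < Y) over all pairs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(delta: float, thresholds: Tuple[float, float, float] = (0.147, 0.33, 0.474)) -> str:
    """Label |delta| with the assumed negligible/small/medium/large cut-offs."""
    d = abs(delta)
    if d < thresholds[0]:
        return "negligible"
    if d < thresholds[1]:
        return "small"
    if d < thresholds[2]:
        return "medium"
    return "large"


print(cliffs_delta([4, 5], [1, 2]), magnitude(0.03353025))  # prints: 1.0 negligible
```

Under these cut-offs, all three estimates in this report (0.034, -0.100 and 0.070) fall below 0.147, consistent with the "negligible" labels; the sign only reflects argument order, since swapping x and y negates delta.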

How often do semistructured and structured merge differ with respect to conflict occurrence?

By All Scenarios

On average across projects, the tools differ in 0.52% of the considered merge scenarios (standard deviation 2.06%). Aggregating all projects in our sample, they differ in 0.58% of the considered merge scenarios.

By Conflicting Scenarios

On average across projects, the tools differ in 23.22% of the merge scenarios with conflicts (standard deviation 44.45%). Aggregating all projects in our sample, they differ in 23.67% of these scenarios.

When There Is a Chance of Conflicts in Files

On average across projects, the tools differ in 4.09% of the considered merge scenarios (standard deviation 13.26%). Aggregating all projects in our sample, they differ in 5.16% of the considered merge scenarios.
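The three percentages share a numerator (scenarios in which one tool reports a conflict and the other does not) but use different denominators. The Python sketch below makes that explicit under stated assumptions: a "conflicting" scenario is one in which at least one tool reports a conflict, and `files_may_conflict` is a hypothetical per-scenario flag standing in for the report's "chance of conflicts in files" criterion, whose exact definition is not given here. `Scenario` and `differ_rate` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scenario:
    ss_conflict: bool  # semistructured merge reported at least one conflict
    st_conflict: bool  # structured merge reported at least one conflict
    files_may_conflict: bool  # hypothetical flag: a conflict in files was possible


def differ_rate(scenarios: List[Scenario], denominator: str = "all") -> Optional[float]:
    """Percentage of scenarios in which exactly one tool reports a conflict."""
    if denominator == "all":
        base = scenarios
    elif denominator == "conflicting":
        # Assumption: a scenario is "conflicting" if either tool reports a conflict.
        base = [s for s in scenarios if s.ss_conflict or s.st_conflict]
    elif denominator == "files":
        base = [s for s in scenarios if s.files_may_conflict]
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    if not base:
        return None  # undefined for a project with no scenarios of this kind
    differing = sum(1 for s in base if s.ss_conflict != s.st_conflict)
    return 100.0 * differing / len(base)


sample = [
    Scenario(True, False, True),
    Scenario(True, True, True),
    Scenario(False, False, True),
    Scenario(False, False, False),
]
print(differ_rate(sample, "all"), differ_rate(sample, "conflicting"))  # prints: 25.0 50.0
```

Under this assumption, every differing scenario is also a conflicting one, so the gap between 0.58% and 23.67% reflects only the much smaller pool of conflicting scenarios.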

Boxplot

Which strategy reports fewer false positives?

Semistructured Merge Additional False Positives

By All Scenarios

On average across projects, 0.1% of the considered merge scenarios had additional false positives with semistructured merge (standard deviation 0.73%). Aggregating all projects in our sample, this corresponds to 0.09% of the considered merge scenarios.

Structured Merge Additional False Positives

By All Scenarios

On average across projects, 0.01% of the considered merge scenarios had additional false positives with structured merge (standard deviation 0.17%). Aggregating all projects in our sample, this corresponds to 0.01% of the considered merge scenarios.

Statistical significance, considering a confidence level of 0.95 (significance level α = 0.05):

## Warning in wilcox.test.default(aFP_ST$Structured, aFP_SS$Semistructured, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(aFP_ST$Structured, aFP_SS$Semistructured, :
## cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  aFP_ST$Structured and aFP_SS$Semistructured
## V = 70.5, p-value = 0.0003034
## alternative hypothesis: true location shift is not equal to 0
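The two warnings come from R's decision about whether to compute an exact p-value. As we understand the logic of `wilcox.test` in the R versions that emit these messages, zero differences are dropped first, an exact p-value is attempted by default only when fewer than 50 non-zero differences remain, and the test falls back to the normal approximation (with these warnings) if ties or zeros are present. The Python sketch below mirrors that rule; `exact_wilcoxon_obstacles` is a hypothetical name. If the default was used here, the warnings suggest that fewer than 50 projects had a non-zero difference in additional false positives.

```python
from typing import List, Optional, Sequence


def exact_wilcoxon_obstacles(x: Sequence[float], y: Sequence[float],
                             exact: Optional[bool] = None) -> List[str]:
    """Reasons an exact paired Wilcoxon p-value would be refused (sketch)."""
    diffs = [a - b for a, b in zip(x, y)]
    zeroes = any(d == 0 for d in diffs)
    nonzero = [abs(d) for d in diffs if d != 0]
    if exact is None:
        # Default: attempt an exact p-value only for fewer than 50 non-zero differences.
        exact = len(nonzero) < 50
    # Equal absolute differences receive equal ranks, i.e. ties.
    ties = len(set(nonzero)) != len(nonzero)
    reasons = []
    if exact and ties:
        reasons.append("ties")
    if exact and zeroes:
        reasons.append("zeroes")
    return reasons


print(exact_wilcoxon_obstacles([1, 1, 0], [0, 0, 0]))  # prints: ['ties', 'zeroes']
```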

Strength/magnitude of the statistical claim (effect size):

## 
## Cliff's Delta
## 
## delta estimate: -0.09988568 (negligible)
## 95 percent confidence interval:
##       lower       upper 
## -0.12304102 -0.07662165

Which strategy has fewer false negatives?

Semistructured Merge Additional False Negatives

By All Scenarios

On average across projects, 0% (after rounding) of the considered merge scenarios had additional false negatives with semistructured merge (standard deviation 0.06%). Aggregating all projects in our sample, this corresponds to 0.01% of the considered merge scenarios.

Structured Merge Additional False Negatives

By All Scenarios

On average across projects, 0.05% of the considered merge scenarios had additional false negatives with structured merge (standard deviation 0.53%). Aggregating all projects in our sample, this corresponds to 0.08% of the considered merge scenarios.

Statistical significance, considering a confidence level of 0.95 (significance level α = 0.05):

## Warning in wilcox.test.default(aFN_ST$Structured, aFN_SS$Semistructured, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(aFN_ST$Structured, aFN_SS$Semistructured, :
## cannot compute exact p-value with zeroes
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  aFN_ST$Structured and aFN_SS$Semistructured
## V = 214, p-value = 0.0006549
## alternative hypothesis: true location shift is not equal to 0

Strength/magnitude of the statistical claim (effect size):

## 
## Cliff's Delta
## 
## delta estimate: 0.0697888 (negligible)
## 95 percent confidence interval:
##      lower      upper 
## 0.04963143 0.08988934

Summary of False Positives and False Negatives
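Following the definitions in the abstract, a disagreement between the tools yields an additional false positive for the tool that reports a conflict that does not really exist, or an additional false negative for the tool that misses one that does. The Python sketch below encodes this at the scenario level for simplicity (the study itself reasons about individual conflicts); `real_conflict` stands for a hypothetical ground-truth label, for instance from manual analysis, and all names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    AGREE = "tools agree"
    SS_ADDITIONAL_FP = "semistructured additional false positive"
    ST_ADDITIONAL_FP = "structured additional false positive"
    SS_ADDITIONAL_FN = "semistructured additional false negative"
    ST_ADDITIONAL_FN = "structured additional false negative"


def classify(ss_conflict: bool, st_conflict: bool, real_conflict: bool) -> Outcome:
    """Attribute a disagreement to one tool, given a ground-truth label."""
    if ss_conflict == st_conflict:
        return Outcome.AGREE  # no additional FP/FN for either tool
    if ss_conflict:
        # Only semistructured merge reported a conflict.
        return Outcome.ST_ADDITIONAL_FN if real_conflict else Outcome.SS_ADDITIONAL_FP
    # Only structured merge reported a conflict.
    return Outcome.SS_ADDITIONAL_FN if real_conflict else Outcome.ST_ADDITIONAL_FP


print(classify(True, False, False).value)  # prints: semistructured additional false positive
```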

Manual Analysis for Errored and Failing Builds

Does ignoring conflicts caused by changes to consecutive lines make the strategies more similar?
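Line-based merge, as in git merge, typically reports a conflict not only when both parents change the same base lines but also when their changes touch adjacent lines with no unchanged line in between. Assuming, as is common for semistructured tools, that method bodies are merged textually, semistructured merge inherits this behaviour, whereas structured merge does not. The Python sketch below distinguishes the two cases for a pair of edited line ranges; the half-open `Edit` ranges and the function `conflict_kind` are hypothetical, and pure insertions (empty ranges) are deliberately out of scope.

```python
from typing import NamedTuple, Optional


class Edit(NamedTuple):
    """Half-open range [start, end) of base lines changed by one parent."""
    start: int
    end: int


def conflict_kind(left: Edit, right: Edit) -> Optional[str]:
    """Classify a pair of textual edits as overlapping, consecutive, or neither."""
    for edit in (left, right):
        if edit.start >= edit.end:
            raise ValueError("this sketch only handles non-empty line ranges")
    if left.start < right.end and right.start < left.end:
        return "overlapping"  # both parents changed at least one common line
    if left.end == right.start or right.end == left.start:
        return "consecutive"  # neighbouring lines, no line in common
    return None  # separated by unchanged lines: no textual conflict


print(conflict_kind(Edit(1, 3), Edit(3, 5)))  # prints: consecutive
```

Discarding conflicts classified as consecutive is one way to approximate the filtering examined in this section.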

How many conflicts arise from the use of semistructured and structured merge? (excl. consecutive lines)

The numbers of conflicts reported by the two strategies differ by 5.95%.

On average across projects, 2.19% of the considered merge scenarios had at least one conflict with semistructured merge (standard deviation 4.52%); aggregating all projects in our sample, this corresponds to 2.24% of the considered merge scenarios. With structured merge, on average 1.8% of the considered merge scenarios had at least one conflict (standard deviation 3.92%), or 1.87% when aggregating all projects in our sample.

How often do semistructured and structured merge differ with respect to conflict occurrence? (excl. consecutive lines)

By All Scenarios

On average across projects, the tools differ in 0.47% of the considered merge scenarios (standard deviation 1.99%). Aggregating all projects in our sample, they differ in 0.52% of the considered merge scenarios.

By Conflicting Scenarios

On average across projects, the tools differ in 22.45% of the merge scenarios with conflicts (standard deviation 44.15%). Aggregating all projects in our sample, they differ in 22.37% of these scenarios.

When There Is a Chance of Conflicts in Files

On average across projects, the tools differ in 3.54% of the considered merge scenarios (standard deviation 11.51%). Aggregating all projects in our sample, they differ in 4.59% of the considered merge scenarios.

Boxplot