Abstract

When collaborating, developers often create and change software artifacts without being fully aware of other team members' work. While such independence is essential for increasing development productivity, it might also result in conflicts when integrating developers' code contributions. To better understand some of these conflicts, namely the ones revealed by failures when building integrated code, we investigate their frequency, structure, and adopted resolution patterns in 451 open-source Java projects. To detect such build conflicts, we select merge scenarios from git repositories, parse the Travis logs generated when building the commits, and check whether the logged build error messages are related to the merged changes. We find and classify 239 build conflicts and their resolution patterns. Most build conflicts are caused by missing declarations removed or renamed by one developer but referenced by another developer. Conflicts caused by renaming are often resolved by updating the missing reference, whereas removed declarations are often reintroduced. Most fix commits are authored by one of the contributors involved in the merge scenario. We also detect and analyze build failures caused by immediate post-integration changes, which are often performed with the aim of fixing merge conflicts but end up leading to build issues. Based on our catalogue of build conflict causes, awareness tools could alert developers about the risk of conflict situations. Program repair tools could benefit from our catalogue of build conflict resolution patterns to automatically fix conflicts; we illustrate that with a proof-of-concept implementation of a tool that recommends fixes for conflicts.

Authors

How frequently do build conflicts occur?

We present here the 239 build conflicts identified in our study. These build conflicts come from 65 merge scenarios mined from 37 different projects. Here you can find the list of build conflicts presented in the table below.

If you want to access the .travis.yml configuration file associated with a build, you can reach it through the merge commit link associated with each conflict in our sample.

What are the structures of the changes that cause build conflicts?

We present here the 6 build conflict causes we identified in this study. Some causes can be split into sub-categories, such as Unavailable Symbol Class, Method, and Variable. The causes cover not only static semantic problems, but also failures of static analysis performed after the compilation phase of the build process.
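To make the cause names more concrete, the sketch below maps a few well-known javac diagnostics to three of the causes named on this page. It is only an illustration: the function `classify_build_error`, the regular expressions, and the hand-written log excerpt are assumptions, not the parser used in the study.

```ruby
# Hypothetical mapping from javac diagnostics to cause names used on this page.
# The patterns are assumptions for illustration, not the study's actual parser.
SYMBOL_KINDS = {
  "class"    => "Unavailable Symbol Class",
  "method"   => "Unavailable Symbol Method",
  "variable" => "Unavailable Symbol Variable"
}.freeze

def classify_build_error(log_excerpt)
  if log_excerpt =~ /cannot find symbol/
    # javac reports the kind of missing symbol on a following "symbol:" line.
    kind = log_excerpt[/symbol:\s+(class|method|variable)\b/, 1]
    return SYMBOL_KINDS.fetch(kind, "Unavailable Symbol")
  end
  return "Duplicated Declaration" if log_excerpt =~ /is already defined in/
  return "Unimplemented Method" if log_excerpt =~ /does not override abstract method/
  "Unclassified"
end

# Hand-written excerpt in javac's output format (not taken from the study's logs).
excerpt = <<~LOG
  Service.java:42: error: cannot find symbol
      repo.loadUser(id);
          ^
    symbol:   method loadUser(String)
    location: variable repo of type UserRepository
LOG

puts classify_build_error(excerpt) # prints "Unavailable Symbol Method"
```

A real classifier would additionally have to check that the reported symbol is related to the changes merged from each contributor, which this sketch does not attempt.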

Which resolution patterns are adopted to fix build conflicts?

Here we present the 17 resolution patterns adopted to fix the build conflicts in our study. These patterns show that fixes do not always preserve all contributions from the merge scenario; some fixes discard part of them.

How frequently do post-integration changes cause broken builds?

We present here the 485 broken builds identified in our study that were caused by post-integration changes, that is, changes performed by the integrator after the merge scenario. These broken builds come from 51 merge scenarios mined from 17 different projects. The causes reported here are the same as those reported for build conflicts, but the motivation behind them is different. Here you can find the list of broken builds caused by post-integration changes presented in the table below.

Sample

Here you can find the list of projects in our sample. For each project, we present some metrics collected from GitHub or computed locally with git.

Violin Plots

For each column of the previous table, we present below the distribution of the values through violin plots.

Travis Adoption

Number of Commits

Commit Authors

Size (LOC)

Number of Forks

Number of Stars

Commit Frequency

Regarding commit frequency, the figure below presents the distribution of the commit histories of the projects in our sample. On average, projects have around 2,000 commits (sd = 4159), while 42 projects have fewer than 100 commits. Regarding activity in the last six and twelve months (second and third boxplots, respectively), some projects received consistent and continuous contributions during these intervals, although most projects received only a few or sporadic commits.
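For readers who want to recompute such summary figures from the sample table, a minimal sketch follows. The function name `commit_summary`, the example numbers, and the use of the sample standard deviation (dividing by n - 1) are assumptions; the page does not state which estimator was used.

```ruby
# Summarizes per-project commit counts: mean, standard deviation, and the
# number of projects below a threshold. Requires at least two values.
# Assumption: sample standard deviation (n - 1 in the denominator).
def commit_summary(commit_counts, threshold: 100)
  n = commit_counts.size.to_f
  mean = commit_counts.sum / n
  variance = commit_counts.sum { |c| (c - mean)**2 } / (n - 1)
  {
    mean: mean,
    sd: Math.sqrt(variance),
    below_threshold: commit_counts.count { |c| c < threshold }
  }
end

# Made-up commit counts for four hypothetical projects.
summary = commit_summary([50, 80, 1200, 3000])
puts summary[:mean]            # prints 1082.5
puts summary[:below_threshold] # prints 2
```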

Study Replication

Here you can find the links to the scripts we used to perform our study. Below, we present in detail how our scripts can be used to replicate our study or to perform a new one using another sample. Additional information is available on the project page on GitHub.

We recommend following the steps below when replicating our study.

  1. Setting up the project - Once the project is cloned, you must fill in the properties file with your information.
    • First, you must provide the login and password of your GitHub account; this information is needed because the scripts create forks of the analyzed projects.
    • During the analysis, some build processes on Travis need to deploy data to the associated GitHub tag. For that, you must create and provide a GitHub OAuth token. Here you can find how to create a new GitHub OAuth token.
    • The next property is PathGumTree, which is the location where you saved GumTree. We use an improved version of this tool, which can be found here. Download and unzip the file, go to the bin directory, and provide its local path.
  2. Getting a sample - Next, you must provide the list of projects to be analyzed using the file projectsList. These projects will be downloaded and temporarily saved. If you want to analyze the projects we used in our study, the list can be found here. Otherwise, you can use a new list of projects.
  3. Running the study - Finally, locate the file MainAnalysisProjects.rb and run the command: "ruby MainAnalysisProjects.rb". After the execution, the folder FinalResults will be created, grouping the generated CSV files with the results.
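After step 3, a quick way to check that the run produced data is to count the rows of each generated CSV file. The sketch below is not part of the study's scripts; the helper name `summarize_results` is hypothetical, and it assumes that the CSV files sit directly inside FinalResults and that each file starts with a header row.

```ruby
require "csv"

# Returns { "file.csv" => number_of_data_rows } for every CSV in results_dir.
# Assumptions: files are directly inside results_dir and have a header row.
def summarize_results(results_dir)
  Dir.glob(File.join(results_dir, "*.csv")).sort.to_h do |path|
    rows = CSV.read(path, headers: true)
    [File.basename(path), rows.size]
  end
end

# Prints one line per result file; prints nothing if the folder is missing.
summarize_results("FinalResults").each do |file, row_count|
  puts "#{file}: #{row_count} rows"
end
```

An empty hash or files with zero rows would suggest that the analysis stopped early, for example because of missing credentials in the properties file.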

Automatic Repair Tool Prototype

Here you can find the link to our prototype of an automatic repair tool for build conflicts. Additional information is available on the project page on GitHub.

Our current implementation can also fix build conflicts caused by Unimplemented Method, Duplicated Declaration, and Unavailable Symbol Method.

We recommend following the steps below when running the prototype.

  1. Once the project is cloned, you must fill in the properties file with your information.
    • First, you must provide the login and password of your GitHub account.
    • The next property is PathGumTree, which is the location where you saved GumTree. We use an improved version of this tool, which can be found here. Download and unzip the file, go to the bin directory, and provide its local path.
    • Next, you must provide the local path of the project. The HEAD of the project must point to the commit with the broken build caused by a build conflict.
  2. Finally, locate the file main.rb and run it by executing the command: "ruby main.rb". If the tool detects a conflict that can be fixed, it reports the conflict and asks whether you want to fix it. If you accept, the tool applies the required changes and tries to compile the code. It then asks whether you want to save the changes by creating a new commit.