
One might assume that contributions to open source projects are judged on code quality above everything else. However, researchers have found that this is not the case.

In a paper titled “Does Code Quality Affect Pull Request Acceptance?”, submitted to the journal Information and Software Technology, researchers tried to determine whether code quality issues affect the chances of a maintainer accepting a pull request. Examples of such issues include duplicated code, long methods, large classes and code style violations.

The researchers, Vili Nikkola, Nyyti Saarimäki, Valentina Lenarduzzi and Davide Taibi, analyzed 28 open source Java projects. Together, these projects contained 4.7 million code quality issues across more than 36,000 pull requests.

The majority were accepted

Of these 28 projects, 22 were managed by the Apache Software Foundation. The other six were selected for the study from GitHub’s list of trending Java repositories.

In total, 19,293 pull requests (53.08%) were accepted and 17,051 (46.92%) were rejected.
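The percentages above can be recomputed directly from the two counts. The following is a minimal Python sketch for that check. The variable names and the `share` helper are illustrative choices, not something taken from the paper.

```python
# Pull request counts reported for all 28 projects combined
accepted = 19_293
rejected = 17_051
total = accepted + rejected  # every pull request was either accepted or rejected


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to two decimal places."""
    return round(100 * part / whole, 2)


print(total)                   # 36344
print(share(accepted, total))  # 53.08
print(share(rejected, total))  # 46.92
```

The two counts add up to 36,344 pull requests, which is consistent with the "more than 36,000" figure for the full dataset.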

Acceptance rates varied widely between projects. Apache Phoenix accepted only 9.85% of its pull requests. Apache Helix was far less selective, accepting a whopping 90.85%.

Reputation matters more than code correctness

The researchers used the PMD static analysis tool to detect code quality issues, then applied various machine learning techniques to the results. Their analysis suggested that code quality had essentially no effect on whether a pull request was accepted.

Instead, the researchers suggest that reputation may matter more. Being a respected figure in the community could carry more weight than the quality and correctness of the code.

Other factors may also play a part. According to the paper, the “importance of the feature delivered might be more important than code quality in terms of pull request acceptance.”

This may explain why we have more bugs

The study points to a harsh truth among developers that is rarely acknowledged: as long as the job gets done, code quality often takes a back seat.

Still, it is surprising that developers are comfortable showing poor-quality code to the community.

However, much of the blame falls on maintainers, because they usually stay with a project longer than individual contributors. Maintainers should therefore pay closer attention to the quality of any code they accept.
