You might have heard of GitHub Copilot, the AI tool that Microsoft recently added to GitHub in collaboration with OpenAI. For starters, it helps you write code by suggesting a few lines or even entire functions at once.
Now, you might be thinking, “Woah! That’s great. This will help me become a better programmer,” but did you know that many developers already dislike it? The reason is that it was trained on code from other repositories and sometimes suggests that code back to you nearly verbatim.
GitHub Copilot: Copyright Infringement?
The short answer is yes, it looks that way. Developers who reached out to GitHub support over email were told that all public GitHub code was used to train the model. Copilot was also seen reproducing the famous fast inverse square root function, popularly attributed to John Carmack.
The bigger the dataset, the more the AI learns. The famous inverse square root function was probably the closest match Copilot had for that prompt, so it recommended it as-is. The results would’ve been very different if there were thousands, if not millions, of well-known inverse square root functions to learn from.
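For readers who haven’t seen it, the function approximates 1/sqrt(x) with a bit-level trick plus one refinement step, which is why a verbatim copy is so easy to recognize. The snippet below is only a rough Python sketch of the idea for illustration. It is not the original C source and not Copilot’s output, and the name fast_inverse_sqrt is our own.

```python
import struct


def fast_inverse_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for a positive x (illustrative sketch only)."""
    # Reinterpret the bits of a 32-bit float as an unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # The well-known "magic constant" step produces a rough first guess.
    bits = 0x5F3759DF - (bits >> 1)
    guess = struct.unpack("<f", struct.pack("<I", bits))[0]
    # One Newton-Raphson iteration tightens the approximation.
    return guess * (1.5 - 0.5 * x * guess * guess)


if __name__ == "__main__":
    # Compare against the exact value for a quick sanity check.
    print(fast_inverse_sqrt(4.0), 4.0 ** -0.5)
```

The constant 0x5F3759DF and the single refinement step are what make the routine so distinctive, so a suggestion that reproduces them is hard to pass off as freshly generated.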
But is it really a big deal? Let’s cut to the chase here. The major reason developers are criticizing it is this: Microsoft, as the company that owns GitHub, has access to all the repositories. Training an AI model on all the public repositories and then charging a subscription fee for others to use it will benefit Microsoft, but what do those who contributed the code get in return? Nothing.
So, Microsoft took advantage of code released under open-source licenses such as the GPL. For starters, the GPL is a copyleft license: it lets anyone use, modify, and redistribute the software, but it does not waive copyright, and it requires derivative works to be distributed under the same terms. Now that this is happening, it almost feels like Microsoft acquired GitHub solely for this purpose, but that’s just my view.
Laws, Laws, and Laws
And the open-source community has little recourse, because suing Microsoft is far from straightforward. The reason? There are currently no clear rules or regulations on how tech giants may use open-source repositories. Even if people did decide to sue Microsoft, a ruling could impose a new set of rules on how open-source software is used. This would, in turn, call into question the whole point of open source being open.
Many people also think that GitHub Copilot doesn’t work the way it’s advertised. They argue that a large portion of what Copilot outputs already contains copyright or license violations, even without extensions.
GitHub Copilot: What Do People Think?
Another question that arises is, “If it’s suggesting huge chunks of code that are functions that the training model collected, then what is computer-generated?” One Twitter user asked this and then answered it in his own thread: “If I create some database that collects loose functions from publicly available repositories (even with permissive licenses) all around the world, and then create some software that would paste this function if referenced or matched – would it be computer-generated, or not? I would want to hear what some judge will say to my testimony that ‘It was computer generated’ if someone sued me for copyright infringement (rightfully so, IMO). I don’t believe that the judge will say, ‘Ah, if it was computer-generated, then it’s fine.’”
He added, “Now how is copilot different in essence from such a database? We can argue that code generated by copilot is highly modified and easily differentiable from the original piece – that you hopefully cannot tell on which fragment it was based at all. But the problem is that – as mentioned earlier – sometimes it’s not. Sometimes it is literally the same, even down to the famous comments. And the worst thing is we even don’t know when using how much verbatim it is. I know one thing for sure – I would not want to be sued by Oracle because they found some of their (open source but licensed) code in my repository that was ‘generated’ by GitHub Copilot.”
Who’s To Be Blamed Here?
Based on the situation, it’s really tough to answer this question. Yes, Microsoft is using the repos for its own good, but there are no clear laws under which contributors can sue. This is why some developers are moving their code off GitHub.
In my view, it’s copyright infringement for sure, but at the end of the day, no one can do much about it. What do you think? Let us know your thoughts and opinions in the comments section below.