The multiplier problem of Open Source
If the algorithm in a piece of open-source software turns out to be biased, for whatever reason, the multiplier effect it can deliver is unfathomable
What do the programmers who earn six-figure salaries at the top tech companies do?
You would be forgiven for assuming that they are inventing and creating new code that fundamentally alters the fabric of the web.
Most of them spend their time copy-pasting and then customising open-source code or code found on Stack Overflow.
When you point this out, many coders say: “anybody can copy and paste, but not everyone knows which code to copy and paste.” It is hard to make code written by different people work together. Those who have this gift are geniuses.
Creating something from scratch is a time-consuming affair. Tech companies have to deliver quarterly growth and the only way to do that is through shortcuts. Open source code offers the best shortcuts. This is even more true for startups.
Those same programmers, tired of this uninspiring work, spend late nights creating open-source projects: something fresh and new that addresses a very specific functional issue and, more importantly, makes them feel alive.
In December last year, the extent to which companies depend on open-source code, and the shamelessness with which these “innovators” use it, was laid bare.
In the spirit of Open Source Software, a developer by the name of Ceki Gülcü created a Java-based logging utility called Apache Log4j. Log files store what is happening within certain systems so that if an error occurs it is possible to figure out what went wrong.
When a utility is useful, many people adopt it. It then falls to the person who wrote it to maintain it. Bugs are discovered over time, and they need to be fixed as they surface.
Source: XKCD
Log4J was written in 2001 and that developer has been maintaining it since - free of cost. But, last week a zero-day vulnerability was discovered which allows arbitrary code execution.
Source: LBP
The list of services with Internet-facing infrastructure that is vulnerable to a critical zero-day vulnerability in the open source Log4j logging utility is immense and reads like a who’s who of the biggest names on the Internet, including Apple, Amazon, Cloudflare, Steam, Tesla, Twitter, and Baidu.
Source: ars Technica
Today, the same copy-paste coding is being put to use to create AI products.
Open source is great for the coding community, and it has undoubtedly allowed the “innovators” to “innovate”. The problem is that most of the code is reused without much thought. The coder only goes so far as to make sure that it works. Developers are often under tremendous time pressure. They simply do not have the time to investigate the thinking behind the algorithm and the problems that can arise as a result of it.
Now, let us suppose that I used an open-source AI algorithm that has a fundamental bias against a particular race of people built into it.
This is not because of malice; it is just how a developer thought about solving a problem.
This bias gets amplified because many developers are going to use the same piece of code. Say this code were used in the design of products meant for use by the government. What are the implications?
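The amplification described above can be sketched with a toy dependency graph. This is only an illustration under stated assumptions: every project name below is made up, and the graph is hypothetical rather than drawn from any real ecosystem. It shows how one flawed library reaches every project that reuses it, directly or through another project.

```python
from collections import deque

# Hypothetical dependency graph (all names are invented for illustration).
# Each key is a project; each value lists the projects that directly reuse it.
DEPENDENTS = {
    "biased-scoring-lib": ["hr-screening-kit", "loan-risk-model"],
    "hr-screening-kit": ["gov-hiring-portal", "startup-ats"],
    "loan-risk-model": ["gov-benefits-checker"],
    "gov-hiring-portal": [],
    "startup-ats": [],
    "gov-benefits-checker": [],
    "unrelated-logger": ["startup-ats"],
}


def affected_projects(source, dependents):
    """Return every project that inherits code from `source`,
    directly or through another project (breadth-first search)."""
    seen = set()
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    return seen


affected = affected_projects("biased-scoring-lib", DEPENDENTS)
print(f"{len(affected)} projects inherit the flaw: {sorted(affected)}")
```

In this toy graph only two projects copy the flawed library directly, yet five end up carrying it, including both of the made-up government products.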
On May 18, thousands of software developers in China woke up to find that their open-source code hosted on Gitee, a state-backed Chinese competitor to the international code repository platform GitHub, had been locked and hidden from public view.
Later that day, Gitee released a statement explaining that the locked code was being manually reviewed, as all open-source code would need to be before being published from then on. The company “didn’t have a choice,” it wrote. Gitee didn’t respond when MIT Technology Review asked why it had made the change, but it is widely assumed that the Chinese government had imposed yet another bit of heavy-handed censorship.
Source: MIT Technology Review
Any Western news outlet would see this as censorship. But while withdrawing access to various open-source projects may seem heavy-handed, there is another side.
I think the move might just be to ensure that the open-source problem does not get out of hand. There are few details on the nature of these projects or the kinds of algorithms they were developing, but in all likelihood the Chinese authorities are acting true to form. They know that open source carries a potential problem of runaway amplification, and the government wants to cut it off at the source.
If an open-source algorithm is developed to harbour a bias against a particular country, what implications can this have? We have all seen how platforms like Facebook have been used by state actors to manipulate people. Algorithms can be used to manipulate people on a national scale.
We need to give more thought to this, and to understand the implications of using the code that makes its way into open-source projects.
A state actor, aware of how heavily a particular open-source project is being used, can anonymously add code. That code can create systematic bias. It can also make software behave in nefarious ways for users in a particular nation.
Contributors are often numerous and anonymous. This has benefitted the open-source community a great deal and helped many of these projects grow immensely, but these same projects can just as easily be weaponised.