Biases in AI: Algorithms vs. Data

Posted by Divya Susarla on 9/11/17

With A.I. being such a buzzword recently, many misconceptions are forming about the technology. From articles about how Facebook had to shut down an AI after the system created its own language, to Microsoft’s widely known bot that became racist, A.I. is developing an illusion of being uncontrollable.

I think it is important for us to take a step back and truly acknowledge why these situations occur. I recently had a great chat with my team after an article circulated about how Google is explaining that A.I. is biased against women and minorities. While reading the article, I couldn’t help but hear the echo of my General Assembly instructors drilling into our minds: “Your models are only as good as your data.” This key machine learning truth (or philosophy) stuck with me as I read the article, which pointed out three types of bias the algorithms have. I kept asking myself: are they saying that the machine learning algorithms are biased, or is it the data that already carries inherent biases?

The algorithms learn from the data they are fed, whether at training time or when interacting with users. Algorithms are not sentient, an important distinction to make given the illusions surrounding A.I. The bias lies in the data. If the algorithm is incorrectly correlating doctors with men because of the stock images it has been fed, isn’t the problem that there aren’t enough correctly labeled stock images of female doctors for the algorithm to learn from?
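To make that point concrete, here is a toy sketch of my own (it is not from the Google article). The "model" is just a frequency count, and both data sets are made up for illustration. The same learning code produces a skewed or a balanced association depending only on what it is fed.

```python
from collections import Counter

def learn_association(examples):
    """Estimate P(gender tag | label) by counting co-occurrences in the data."""
    pair_counts = Counter(examples)
    label_totals = Counter(label for label, _ in examples)
    return {
        (label, gender): n / label_totals[label]
        for (label, gender), n in pair_counts.items()
    }

# Hypothetical training sets: the numbers are invented for illustration.
skewed = [("doctor", "man")] * 9 + [("doctor", "woman")] * 1
balanced = [("doctor", "man")] * 5 + [("doctor", "woman")] * 5

skewed_model = learn_association(skewed)
balanced_model = learn_association(balanced)

print(skewed_model[("doctor", "man")])    # 0.9
print(balanced_model[("doctor", "man")])  # 0.5
```

Nothing in `learn_association` changes between the two runs. Only the data does.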

If the bias lies within the data, then fix the data, right? But is it that simple? As the article mentions, in the case of the shoe, Google asked users to draw a shoe and users drew men’s shoes. This bias came straight from the users and skewed the data. My team and I spent some time discussing approaches to fix this, with the obvious one being to get more representative data. My colleague also mentioned that having the algorithms interact with a wider range of users, so that information gets appropriately updated, could be a key component. But then look at what happened to the Microsoft bot when it interacted with the Twitter world. Is the sad reality that our biased data is representative?
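One hedged way to picture "more representative data" is to compare what was collected against a target mix. The sketch below is only an illustration: the drawing counts and the 50/50 reference are assumptions I made up, not figures from the article, and the function name `representation_gap` is hypothetical.

```python
from collections import Counter

def representation_gap(samples, reference):
    """Return observed share minus reference share for each category."""
    counts = Counter(samples)
    total = len(samples)
    return {
        category: counts.get(category, 0) / total - share
        for category, share in reference.items()
    }

# Hypothetical collected drawings and a hypothetical target distribution.
drawings = ["mens_shoe"] * 80 + ["womens_shoe"] * 20
reference = {"mens_shoe": 0.5, "womens_shoe": 0.5}

gaps = representation_gap(drawings, reference)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.2f}")
```

The code is the easy part. Deciding what the reference distribution should be is exactly the question this post keeps circling back to.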

Given that we acknowledge our data is biased, I’d like to be optimistic and think that biased data does not accurately represent our world. As we go about developing A.I., we should be holding ourselves to a higher standard than simply saying “A.I. can accidentally perpetuate the biases held by its makers.” It’s up to the makers to ensure statistically accurate data is being gathered, and it is up to all of us to make sure that biased data is not the reality of our world.

Topics: Artificial Intelligence