Facebook’s explanations for how its heavily promoted AI algorithms failed to flag the New Zealand video took an intriguing turn this past Wednesday, when the company acknowledged that its video moderation AI relied on a naïve deep learning binary classifier that lacked the training data to stand any chance of flagging the video. In its comments, Facebook reminds us just how poor a fit binary classifiers are for content moderation.
Facebook’s concession this past Wednesday that it uses a binary classifier approach to flag violent videos was confirmation of what critics of the company’s AI content moderation practices had long suspected: it has absolutely no understanding of how the real world functions.
For all its talented technologists, who recognize the near-absolute influence of training data on the results of deep learning models, Facebook somehow never thought twice about the methodological consequences of training a binary classifier on a topic as broad and vague as terroristic violence, where the boundaries are particularly ill-defined and training data is scarce.
Binary classifiers, which accept an image, video, audio clip, or passage of text as input and return a simple true/false score as to whether the content is allowable or should be removed as a violation of a platform’s terms of service, have become the go-to tool of the growing AI content moderation movement at companies like Facebook.
Such classifiers are particularly well-suited to the needs of social media companies, for which speed and computational cost are paramount and accuracy is of little concern. Platforms need only assure governments that they are using AI to moderate posts. They don’t actually need those AI solutions to work in any measurable way: unlike with copyright violations, they face no liability or consequences for publishing terroristic material and in fact profit monetarily from the sharing of such material.
Binary classifiers are also extremely easy to construct, requiring only a collection of positive and negative examples and a bit of compute time to generate.
Classifiers quickly break down, however, when there is insufficient training data to fully express the range of inputs that should generate a positive response. A classifier designed to recognize all images of dogs will struggle if built from scratch using 10 images of adult golden retrievers standing at attention in the grass of a single dog park at midday.
Machine learning’s popularity stems from its ability to learn the patterns of large amounts of training data without the need for humans to point out what to look for. Instead of a human expert attempting to describe what makes a given email spam, a machine learning algorithm can simply take a large pile of spam messages and non-spam messages and come up with its own ruleset for determining spam likelihood.
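As a toy sketch of that idea, the snippet below derives its own word-level “ruleset” from a handful of labeled messages, with no human telling it which words signal spam. The corpus and the simple log-odds scoring are purely illustrative assumptions, not any production spam filter:

```python
import math
from collections import Counter

# Hypothetical toy corpus: a few labeled spam and non-spam messages.
spam = ["win cash now", "free prize claim now", "cash prize win"]
ham = ["meeting at noon", "lunch at noon tomorrow", "project meeting notes"]

def word_counts(messages):
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
total_spam = sum(spam_counts.values())
total_ham = sum(ham_counts.values())

def spam_score(message):
    """Naive Bayes-style score: sum of per-word log-odds, with add-one
    smoothing so unseen words contribute nothing extreme. Positive means
    the message looks more like the spam examples than the ham examples."""
    score = 0.0
    for word in message.split():
        p_spam = (spam_counts[word] + 1) / (total_spam + 1)
        p_ham = (ham_counts[word] + 1) / (total_ham + 1)
        score += math.log(p_spam / p_ham)
    return score

print(spam_score("claim your free cash"))    # positive: spam-like
print(spam_score("meeting notes for lunch")) # negative: ham-like
```

The point is that the word statistics, not a human-authored rule list, determine the score; a real spam filter would use far more data and a proper probabilistic model, but the learning principle is the same.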
Humans are uniquely able to recognize patterns without requiring large example datasets. Unlike machines, they can draw upon their vast world knowledge and powerful pattern extraction abilities to identify patterns even from very small datasets. Current generation deep learning algorithms require very large training datasets to identify even rudimentary patterns. Transfer learning can reduce the required amount of training data, but still requires sufficient examples to fully map out the boundaries of the classification space.
This creates a problem when it comes to employing deep learning models as classifiers for content moderation. When human moderators struggle to agree on whether a given piece of content is a violation or not and when the boundaries of acceptable and unacceptable content are fluid and constantly changing, it is nearly impossible to generate a sufficiently diverse and broad enough training dataset to fully encompass everything the AI model would be expected to identify.
In such cases, where there is insufficient training data to build a model of the phenomenon of interest, but large amounts of highly diverse training data for its building blocks, a far more robust approach is to employ a collection of models that assess those underlying components and combine them into a composite risk score.
For example, while Facebook correctly notes that there are relatively few first-person videos of firearm-based attacks using military-style equipment on a place of worship, there are vast volumes of examples for each of those underlying pieces.
Rather than trying to build a single model that can take any video on earth and flag it as being a terror video or not, the far more robust approach is to run each video through a set of classifiers that assign discrete labels: the presence of a firearm, the presence of a military-style assault weapon, whether that weapon is held in a firing position, whether the weapon is being carried toward a sensitive location like a school or place of worship, and so on.
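A minimal sketch of combining such per-concept classifier outputs into a composite score might look like the following. All label names, confidences, and weights are hypothetical illustrations; a real system would learn or tune them:

```python
# Hypothetical sketch: each label is assumed to come from a separate,
# narrowly trained classifier that emits a confidence in [0, 1].
def composite_risk_score(labels, weights):
    """Combine per-concept confidences into one weighted risk score."""
    return sum(weights[name] * confidence for name, confidence in labels.items())

# Illustrative confidences such a pipeline might emit for one video.
labels = {
    "firearm_present": 0.97,
    "military_style_weapon": 0.91,
    "weapon_in_firing_position": 0.88,
    "near_place_of_worship": 0.95,
}

# Weights are assumptions for illustration only.
weights = {
    "firearm_present": 0.2,
    "military_style_weapon": 0.25,
    "weapon_in_firing_position": 0.25,
    "near_place_of_worship": 0.3,
}

score = composite_risk_score(labels, weights)
flagged = score > 0.75  # illustrative review threshold
print(score, flagged)
```

Because each component classifier is trained on an abundant, well-bounded concept (firearms, buildings, body pose), none of them faces the data-scarcity problem that dooms a single end-to-end “terror video” classifier.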
This even allows the model to adapt to characteristics of each user and respond to realtime changes in the informational ecosystem.
For example, in the aftermath of an attack on a place of worship, such a content filtering algorithm could instantly adjust the impact of the video location on its risk score. While the depiction of a weapon in the immediate vicinity of a place of worship might always generate an elevated risk score, in the aftermath of a shooting at a religious facility the algorithm might immediately flag all uploaded videos featuring a weapon near a religious building as requiring review. Rebuilding a large deep learning model of such magnitude to incorporate such changes is not something that can be done in realtime, whereas a label-based model can easily adjust the influence of each characteristic second by second.
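To sketch that adaptability under the same weighted-label scheme (names, weights, and threshold again hypothetical): raising a single weight instantly changes which videos cross the review threshold, with no retraining at all:

```python
# Hypothetical sketch: realtime adjustment of one label's influence.
def risk_score(labels, weights):
    return sum(weights[name] * confidence for name, confidence in labels.items())

REVIEW_THRESHOLD = 1.0  # illustrative cutoff for human review

baseline_weights = {"weapon_visible": 0.5, "near_place_of_worship": 0.5}
video = {"weapon_visible": 0.9, "near_place_of_worship": 0.8}

# Normal conditions: elevated risk, but below the review threshold.
print(risk_score(video, baseline_weights))

# In the aftermath of a reported attack, one weight is raised on the fly;
# every weapon-near-worship video now crosses the bar, second by second.
alert_weights = dict(baseline_weights, near_place_of_worship=1.5)
print(risk_score(video, alert_weights))
```

The model itself never changes; only the scalar weight on one label does, which is why this kind of adjustment can happen in realtime while retraining a monolithic deep learning model cannot.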
Facebook did not respond to a request for comment.
Putting this all together, Facebook’s failure to identify the New Zealand video reminds us that the point-and-click binary classifier model that has become the go-to approach of computer vision in the modern deep learning era is not necessarily appropriate for the broad and vague topical spaces of content moderation.
In the end, rather than treating everything as a binary classification problem, we need to recognize that some problems require more complex deep learning solutions.