Hash Matching Will Save Content Moderation, Faster Than AI Currently Can On Its Own

Explore how highly advanced yet lightweight hash matching technologies can come together with AI to provide a fully comprehensive solution.

AI and Hash Matching: Refocusing Content Moderation with a Dovetailed Solution

AI is commonly thought of as the preeminent solution to content moderation. While the direction the technology is taking is certainly promising, reports of its limitations have been a consistent point of conversation in tech circles. Currently, consensus from experts behind the scenes is that AI content moderation contains too many holes to be dubbed the all-encompassing solution some public-facing tech moguls tend to paint it as, suggesting it may very well be time to refocus how the tech space thinks about content moderation tools moving forward.

Included here:

How AI content moderation works
AI’s limitations
Hash matching for content moderation
Hash matching in action
Ideating a dovetailed solution

The problem with identifying new/unknown harmful content

When people talk about AI content moderation, they tend to look at only one case: automated image recognition technology put to work assessing novel or newly posted harmful content, commonly referred to as unknown content. Much of the conversation around whether AI will eventually prove to be a bulletproof solution for moderating content hinges on this single perspective, leaving out a second, equally prevalent case: known content, or previously posted content.

To leave the issue of identifying known harmful content out of the conversation is to miss an opportunity to dramatically change content moderation for the better—a change that can be enacted immediately.

While it might seem sexier to talk about a fully automated AI solution that can think and make decisions about unknown/newly reported content the same way a human does, it’s not necessarily the most practical solution for moderators, especially when it comes to identifying known/previously reported content. Not to mention, AI is a long way off from being able to perform in this manner. Even with shortcomings like false positives and programming bias, the Zuckerbergs of the world like to point to this approach to AI and content moderation as the forthcoming solution, seemingly ignoring the fact that powerful, robust, yet extremely lightweight hash matching solutions have already solved the problem of identifying known/previously reported content for moderators.

With these ideas in mind, this article suggests a dovetailed solution for content moderation: AI can be effectively supplemented by hash matching technologies, creating a two-pronged approach that handles known content while freeing up bandwidth for AI to handle unknown/newly reported content. By focusing first on known content using hash matching tools, moderators can cut down their workload while leveraging AI for unknown content.

Inside AI content moderation

How it works

AI content moderation works by analyzing patterns in data and building models from them, usually designed to work in accordance with a service provider's guidelines and policies. Using these models, AI is able to make predictions and decisions, and ultimately flag or remove content it deems harmful.

For example, if your online service has a rule against a specific type of content, such as certain body parts or other types of explicit objects, the AI should be able to accurately determine whether a newly posted image contains these objects. It can then flag and remove the content, or send it to a moderator for a second look to determine whether the image is truly harmful.
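The flag, remove, or second-look decision described above is often driven by a confidence score. Here is a minimal sketch of that routing, assuming the model returns a score between 0.0 and 1.0. The threshold values and the `route` function are hypothetical, chosen only for illustration, and are not taken from any particular platform.

```python
# Hypothetical thresholds; a real platform would tune these against its own policies.
REMOVE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.60


def route(score: float) -> str:
    """Map a classifier confidence score (0.0 to 1.0) to a moderation action."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= REMOVE_THRESHOLD:
        return "remove"          # confident enough to act automatically
    if score >= REVIEW_THRESHOLD:
        return "human_review"    # uncertain: send to a moderator for a second look
    return "allow"


if __name__ == "__main__":
    for s in (0.99, 0.72, 0.10):
        print(s, route(s))
```

Where the two thresholds sit decides how much content lands in front of a human, which is exactly the trade-off the rest of this article is concerned with.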

AI is still too primitive to solve content moderation alone

Understandably, the push for fully automated AI content moderation is a direct result of the sheer amount of harmful UGC that makes its way onto the web every day, but AI is just not powerful enough to handle it yet.

For example, TikTok's 2022 Q3 transparency report shows that of the 111 million videos removed, only 53 million were removed by AI, meaning that the remaining 58 million (about 52%) were taken down manually. This suggests that roughly half of the removed content either posed a special circumstance calling for human review or had slipped past the AI.

Similarly, Meta reports that 90% of the content they remove is found by AI before being brought to human attention, but their 2022 Q3 report shows that 414 million pieces of child sexual exploitation content were submitted for appeal. This means that whether or not the appealed content actually contains abuse, human moderators were forced to review it after AI's initial removal, slowing down workflows and potentially causing trauma.

Both of these examples suggest that even with top-notch AI solutions in place, AI is not quite what its clout suggests it is. At the moment, AI seems unable to provide moderators with a tool that can effectively manage the fever-inducing amount of content inundating their servers. Platforms like Meta and TikTok continually report that they are working to improve their technology, but with no real fix in sight, platforms may need to rethink their moderation technologies.

AI’s major limitations

AI’s major limitations include false positives, false negatives, human-programmed bias, a poor grasp of context, and vulnerability to adversarial attacks. These can lead to the removal of acceptable content or the failure to remove unacceptable content, cause undue discrimination, or accidentally promote harmful/explicit content.

1. False positives: Often due to a lack of accuracy or vaguely specified guidelines, false positives produce a trigger-happy problem where content that a human moderator would never flag gets flagged anyway.

2. False negatives: When AI isn’t trained to be sensitive enough to harmful content, false negatives can occur. Significantly detrimental to the fight against harmful content and online safety in general, false negatives fail to flag and remove content that needs to be taken down. When this happens, harmful content is able to spread across the web, causing harm to users as well as those depicted in certain types of abuse-related content.

3. Programming bias: Because AI is created by humans, it can (and often will) reflect the biases of the data used to train the model. If programmers aren’t careful, this can lead to unfair and highly discriminatory execution of content moderation.

4. Context: This is perhaps the most difficult of these limitations to solve. AI may not be trained on data that allows it to accurately interpret the context of the content it is analyzing. A common example goes like this: If your platform has a rule specifically against female nipples, the AI may not be able to accurately distinguish between male and female, leading to a false positive/negative while forwarding essentially useless feedback to the overseeing moderator.

5. Adversarial attacks: AI can also be used against itself. Adversarial attacks happen when content is deliberately manipulated to fool a model, sometimes with the help of another AI system. In terms of content moderation, this can result in AI producing a false negative or false positive.
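To make the first two limitations concrete, false positives and false negatives are usually summarized as rates. Below is a minimal sketch using the standard definitions of precision, recall, and false positive rate. The counts in the usage example are invented purely for illustration and do not come from any platform's report.

```python
def moderation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summarize a moderation model's errors from its confusion-matrix counts.

    tp: harmful content correctly flagged
    fp: acceptable content wrongly flagged (false positives)
    fn: harmful content missed (false negatives)
    tn: acceptable content correctly left alone
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,                      # share of flags that were correct
        "recall": recall,                            # share of harmful content caught
        "false_positive_rate": false_positive_rate,  # share of acceptable content flagged
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(moderation_metrics(tp=80, fp=20, fn=10, tn=890))
```

Pushing a model to miss less harmful content (higher recall) typically means more false positives, and every one of those is another item for a human to review.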

Using powerful, lightweight hash matching to solve known harmful content

While automated AI content moderation continues to improve, there is another type of content moderation tool whose technical kinks have already been worked out. Hash matching, a method of visual fingerprinting, currently exists as a fully automated solution for identifying known content.

To provide a quick, high-level explanation, platforms can use hash matching technologies in the following way:

Hash matching relies on a database of known images that have already been assigned a hash value. When a ‘new’ image or video is uploaded to a platform, it is assigned a hash which is then compared to the database of known hashes. If a match is detected, the content is flagged for review and (depending on the platform's own internal moderation processes) removed. This allows for quick and efficient removal of known harmful content, without relying solely on human moderators to identify and review every individual piece of content. In fact, this process can oftentimes be completely automated, with little to no intervention needed from a moderator.
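In its simplest form, the lookup described above can be sketched in a few lines. This example assumes an exact cryptographic hash (SHA-256 from Python's standard library) and an in-memory set standing in for the hash database. The names `fingerprint`, `is_known`, and the sample bytes are hypothetical, and production systems generally rely on more robust fingerprints than this.

```python
import hashlib
from typing import Set


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_known(upload: bytes, known_hashes: Set[str]) -> bool:
    """Check an upload's hash against the database of known hashes."""
    return fingerprint(upload) in known_hashes


if __name__ == "__main__":
    # Stand-in for a database of previously reported content.
    known_hashes = {fingerprint(b"previously reported file bytes")}

    print(is_known(b"previously reported file bytes", known_hashes))   # exact copy
    print(is_known(b"previously reported file bytes!", known_hashes))  # altered copy
```

Because a set lookup takes roughly constant time, the check stays fast even as the database grows. The trade-off is that an exact hash only catches byte-for-byte copies, so the altered upload in the example is not matched.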

The benefits of focusing on known/previously reported harmful content

Recent innovations in local descriptor hash matching have made exceptional strides in detecting known harmful images and videos, creating an extremely powerful yet lightweight solution that can detect known abuse videos in less than a second. Platforms that are connected to a comprehensive harmful content database will be able to accurately identify known harmful content that gets reported, including child sexual abuse material (CSAM), terrorist and violent extremist content (TVEC), and non-consensual intimate imagery (NCII), drastically reducing workloads and traumatic exposure for content moderation teams.
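Local descriptor hashing itself is beyond the scope of a short example, but the general idea of matching altered copies can be illustrated with a much simpler, well-known perceptual technique called average hashing. To be clear, this is a swapped-in stand-in and not the local descriptor method discussed above. The sketch assumes the image has already been downscaled to a small grayscale grid, and the `max_distance` tolerance is a hypothetical value.

```python
from typing import List

Grid = List[List[int]]  # small grayscale image, pixel values 0-255


def average_hash(pixels: Grid) -> int:
    """Build a bit string: 1 where a pixel is brighter than the image mean, else 0."""
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("image grid is empty")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as the same content if their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


if __name__ == "__main__":
    original = [[10, 20, 200, 210], [15, 25, 205, 215]] * 2
    brightened = [[p + 10 for p in row] for row in original]  # lightly edited copy
    different = [[200, 210, 10, 20], [205, 215, 15, 25]] * 2  # unrelated image

    print(is_near_duplicate(original, brightened))
    print(is_near_duplicate(original, different))
```

The brightened copy hashes to the same bits as the original because every pixel moves along with the mean, while the unrelated image differs in every bit. An exact hash would treat the brightened copy as entirely new content.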

Here lies the point. With known harmful content moderation handled by hash technologies, an AI-human moderator combo has more bandwidth to work with when addressing the remaining percentages of novel/unknown content.

Not only this, but a dovetailed AI-hash matching solution would also save moderators from what we all need to remember is extremely traumatic work. Every day, moderators are exposed to thousands of pieces of harmful content, which can often lead to post-traumatic psychological conditions. Effectively integrating a joint AI-hash matching solution could open up the space to build better labor standards for this industry while optimizing the moderation process itself.

NCMEC’s hash matching solution

For example, the Videntifier team has seen hash matching video identification solutions work extremely well in conjunction with large CSAM databases like that of the National Center for Missing and Exploited Children (NCMEC), providing a thorough and effective solution for identifying known harmful videos and cutting the workload of moderators (in NCMEC’s case, analysts) in half.

As illustrated below, NCMEC has gone through a few iterations of hash matching technology that have drastically impacted their ability to sort unknown/new and known/previously posted content.

NCMEC initially adopted a basic approach to video and image identification, a method called strict hashing. Using this solution, they were able to identify exact duplicates, reducing the total identified unique videos to 12.9 million. This was a big reduction, but still too much content to review manually. With advanced hash matching, NCMEC was able to more effectively automate known CSAM review, reducing the count of unique videos to 5.1 million.
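To put the second step in proportion, here is a quick calculation using only the two figures quoted above. The total before strict hashing is not given here, so only the drop from 12.9 million to 5.1 million is computed. The helper name `percent_reduction` is ours.

```python
def percent_reduction(before: int, after: int) -> float:
    """Return the percentage by which `after` is smaller than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100


if __name__ == "__main__":
    after_strict_hashing = 12_900_000    # unique videos left after strict hashing
    after_advanced_matching = 5_100_000  # unique videos left after advanced hash matching
    drop = percent_reduction(after_strict_hashing, after_advanced_matching)
    print(f"{drop:.1f}%")
```

That works out to roughly 60% fewer unique videos left for review after the move from strict hashing to advanced hash matching.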

Thinking future forward

Using NCMEC as a prime example, the idea is essentially the same for online platforms: Reduce the amount of known/previously reported harmful content a moderator has to sift through, and you free up bandwidth for AI moderation solutions to tackle whatever percentage of novel content is coming through. Not only do you reduce the workload, you also reduce the negative impact of AI’s mistakes. The combination of these two methods can help improve the speed and accuracy of content moderation, and better protect users from exposure to harmful or prohibited content.
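The dovetailed flow can be sketched as a two-stage check: hash lookup first, AI second. This is a minimal illustration under stated assumptions and not a production design. It uses exact SHA-256 matching as a stand-in for the hash matcher, and `stub_classifier`, `moderate`, and the 0.6 threshold are hypothetical placeholders for a real model and policy.

```python
import hashlib
from typing import Callable, Set


def moderate(
    upload: bytes,
    known_hashes: Set[str],
    classify: Callable[[bytes], float],
    review_threshold: float = 0.6,  # hypothetical cutoff
) -> str:
    """Stage 1: hash lookup for known content. Stage 2: AI for everything else."""
    digest = hashlib.sha256(upload).hexdigest()
    if digest in known_hashes:
        return "blocked_known"  # handled without AI or a human
    score = classify(upload)    # only unknown content reaches the model
    if score >= review_threshold:
        return "human_review"
    return "allow"


def stub_classifier(data: bytes) -> float:
    """Placeholder for a real model: returns a made-up harm score."""
    return 0.9 if b"suspicious" in data else 0.1


if __name__ == "__main__":
    known = {hashlib.sha256(b"known harmful bytes").hexdigest()}
    for item in (b"known harmful bytes", b"new suspicious upload", b"holiday photo"):
        print(item, "->", moderate(item, known, stub_classifier))
```

In this arrangement the classifier never sees content the hash database already recognizes, so its errors, and the human review they trigger, are confined to genuinely new uploads.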

Looking ahead, AI and hash matching technology are valuable tools in the ongoing effort to improve content moderation and ensure a safer online experience for all. By continuing to innovate and refine these technologies, we can work towards a more effective and responsible approach to content moderation in the digital age.