Pictures and videos are an integral part of any social network, but they are also among the main vehicles for fake news and hate speech. To limit the impact of such harmful media, companies like Facebook rely on automated tools.
In a recent blog post, Facebook described a new machine learning-powered system that identifies text in images and videos and transcribes it into a machine-readable format.
Called Rosetta, the system goes a step beyond traditional optical character recognition (OCR) by trying to understand the context of the image and the text together.
The company has used a billion public images and videos from Facebook and Instagram to train its text recognition model. The process has two steps: first detecting rectangular regions that might contain text, and then recognizing the text in each region using a convolutional neural network (CNN).
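To make the two-step idea concrete, here is a minimal Python sketch of a detect-then-recognize pipeline. It is not Facebook's implementation. The detector is a simple connected-component bounding-box search over a tiny binary image, and the recognizer is a template lookup standing in for the CNN. All names here (`detect_regions`, `crop`, `toy_recognizer`, `extract_text`, `TEMPLATES`, `IMAGE`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive
Image = List[List[int]]          # 1 = ink, 0 = background


def detect_regions(image: Image) -> List[Box]:
    """Step 1 (toy stand-in): one bounding box per 4-connected ink blob."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for r in range(height):
        for c in range(width):
            if image[r][c] == 0 or seen[r][c]:
                continue
            # Flood-fill the blob while tracking its extent.
            stack = [(r, c)]
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while stack:
                y, x = stack.pop()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    # Order boxes left to right, as text would be read.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular region out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical 3x3 glyph templates; a real system would use a trained CNN.
TEMPLATES = {
    ((1, 1, 1), (0, 1, 0), (0, 1, 0)): "T",
    ((1, 0, 0), (1, 0, 0), (1, 1, 1)): "L",
}


def toy_recognizer(patch: Image) -> str:
    """Step 2 (toy stand-in): look the cropped patch up in TEMPLATES."""
    return TEMPLATES.get(tuple(tuple(row) for row in patch), "?")


def extract_text(image: Image,
                 recognize: Callable[[Image], str]) -> List[Tuple[Box, str]]:
    """Run detection, then recognition on each detected region."""
    return [(box, recognize(crop(image, box))) for box in detect_regions(image)]


IMAGE = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    for box, text in extract_text(IMAGE, toy_recognizer):
        print(box, text)
```

Because `extract_text` takes the recognizer as a parameter, the detection stage and the recognition stage stay independent, so in principle either one could be swapped for a learned model without touching the other.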
The model isn’t just limited to English text; it also supports different languages and encodings, including Hindi and Arabic.
Rosetta has already been widely integrated into various products within Facebook and Instagram. The company says it is using the system to improve photo search, serve personalized content, and identify hate speech in different languages.