India is witnessing rapid growth in regional language users. A recent Google-KPMG study states that 70 percent of Indians find local language digital content more reliable. There are approximately 500 million Indian language users online. This figure will have a significant impact on the nature of content presented on digital platforms.
According to a report by RedSeer Consulting, Indian language internet users are growing at 13 percent annually, against a mere one percent growth in English-speaking internet users. With the advent of highly affordable internet connectivity and wider network coverage, the Next Billion of the world are slowly coming online, and this underlines the need to provide them with a platform to talk and share, crucially in their native language. Naturally, with such a task at hand, streamlining recommendations and improving the platform becomes a challenge.
With most users coming online for the first time, and with no precedent for their usage or their language, the data and usage patterns that emerge are varied, diverse, and immense. This leaves regional language social media apps with a huge amount of data to process in order to identify how internet behavior varies across different ethnic groups, and how first-time users differ from long-time users. The data science teams at these companies also have the task of capturing finer elements, such as the tone of a statement in a vernacular language, processing data from video consumption, and so on.
Regional language social media apps consider multiple aspects while catering to Indian internet users. For one, the apps provide a social platform for everyone to share content ranging from humor and devotion to politics and regional updates. The idea is for individuals to follow each other, along with personalities on the platform. This not only gives these personalities a stronger connection with their local followers, but also gives them a ground check on local sentiments.
Second, the apps allow individuals who have moved away from their home regions for work or other opportunities to stay connected to updates from their own social circles. For regional social media apps like ShareChat, where the company sees over 800,000 new posts across all languages and regions, processing this staggering amount of information is difficult, and the task falls to the data science team.
With such a tremendous amount of data to compute, access, and decode, two of the most pioneering technologies of this generation are being put to use: Artificial Intelligence (AI) and Machine Learning (ML). The key for these regional language social media apps is to train advanced AI and ML algorithms efficiently, so that they can serve several purposes.
Let’s look more closely at what goes on under the hood in terms of artificial intelligence and machine learning. The eventual goal of regional social media apps is to provide a platform for all to share and communicate in their mother tongue, and that is the biggest challenge in implementing AI. While many recommended ML and AI models are already available on the internet, applying these technologies effectively is the real challenge.
ShareChat’s data science work spans several ML paradigms, such as Computer Vision, Natural Language Processing, and Recommender Systems. All these paradigms have evolved over decades of research, and ShareChat’s data scale and diversity present several new challenges in applying them efficiently.
The right use of technology also depends on the scale of data, which differs from one social media platform to another. For instance, a method like matrix factorization, which is commonly used in several recommendation engines, does not work well at ShareChat.
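For readers unfamiliar with the technique, here is a minimal sketch of matrix factorization on a toy dataset. It is a textbook illustration only, not ShareChat's system: the function names, the tiny user-post score table, and the hyperparameters are all assumptions made for this example. Each user and each post gets a short vector of learned numbers, and the predicted score for a pair is the dot product of the two vectors.

```python
import random

def factorize(interactions, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Learn k-dimensional user and item vectors from (user, item, score)
    triples using stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    # Small random starting vectors for every user (P) and every item (Q).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, score in interactions:
            err = score - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Step toward lower error, with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted score is the dot product of the user and item vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def mse(interactions, P, Q):
    """Mean squared error over the observed interactions."""
    total = sum((s - predict(P, Q, u, i)) ** 2 for u, i, s in interactions)
    return total / len(interactions)

# Hypothetical toy data: (user id, post id, engagement score from 1 to 5).
interactions = [
    (0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 1, 5.0),
    (2, 2, 5.0), (2, 3, 4.0), (3, 2, 4.0), (3, 3, 5.0),
]

P0, Q0 = factorize(interactions, n_users=4, n_items=4, epochs=0)  # untrained
P, Q = factorize(interactions, n_users=4, n_items=4)              # trained
mse_before = mse(interactions, P0, Q0)
mse_after = mse(interactions, P, Q)
```

After training, predict(P, Q, u, i) can be called for a user-post pair that was never observed, which is how such a model produces recommendations. The sketch assumes a small, fixed table of users and posts with numeric scores; it says nothing about how a platform at ShareChat's scale would handle the same problem.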
Machine learning is the obvious technology for regional language social media apps to use in learning about their users and their content. The content processing pipeline is the trickiest part of data-based technology at these companies.
At ShareChat, the content processing pipeline includes large-scale text processing across multiple regional languages, OCR, and large-scale detection of objects, themes, and NSFW posts in images and videos. Computer vision algorithms and contextual text processing enable social media platforms to understand each language in its native form, and hence to recommend content efficiently.
Building the booming regional language social media platforms is evidently not easy. The sheer volume and diversity of our nation’s content give the data teams and their technologies plenty of work to do. This is just a glimpse of the difficulties these platforms encounter and of how they are constantly striving to overcome them and make their products significantly better.
This article was written by Ankush Sachdeva, CEO of ShareChat.