Twenty-one-year-old Orevaoghene Ahia speaks good English and writes tons of code in Python. Her mum, an Isoko woman raised in Nigeria’s Niger Delta area, is fluent in three languages but favours pidgin English for conversations with family, friends and customers. Though Ahia’s pidgin isn’t as heavy as her mum’s waffi dialect, she regards it as an indispensable part of her formative years.
So when Kelechi Ogueji, 23, suggested a pidgin-to-English machine translation project, she didn’t need convincing. “It’s insane that most technologies we build these days are for very high-resource language speakers,” Ahia tells me.
From June to August this year, Ogueji and Ahia, colleagues at the Nigerian office of AI research and solutions firm InstaDeep, developed a model for the task. Their work was peer-reviewed and accepted at this year’s NeurIPS conference, the world’s largest gathering of artificial intelligence researchers and enthusiasts, which took place last week.
Questionable visa decisions by the Canadian embassy denied them the chance to present it in Vancouver. That is a shame, given the real and symbolic importance of this work in showing Africans’ competence to identify and solve local problems.
Braving the odds
To build an English-French translator, for example, you need a parallel corpus: large numbers of sentence pairs that say the same thing in both languages, such as “Goodbye” and “Au revoir”, or “How are you?” and “Comment ça va?”
But little pidgin English is written online (there is no Pidgin Wikipedia), so finding enough data for a decent translation project proved the first major hurdle.
The data shortage led them to train an Unsupervised Neural Machine Translation (UNMT) model, which learns from text in each language separately rather than from matched sentence pairs. In essence, they built a pidgin-English catalogue of word pairs from scratch, drawing on 56,695 pidgin sentences and 32,925 unique words scraped from a couple of websites.
“It was challenging,” Ogueji confirms. He has been interested in working on languages since encountering Google Translate on his first Android device in 2014. But this was the first Natural Language Processing (NLP) project ever done on pidgin English.
It didn’t help that AI development is at an embryonic stage in Nigeria. Quality research data is scarce, and what exists is often too raw to use. Still, existing techniques such as Word2vec word embeddings and Google’s Transformer architecture gave their ambition a solid foundation to build on.
Practically, though, what’s this useful for?
Because AI sits at the intersection of computer science, mathematics and statistics, much of its jargon is dizzying. Ogueji and Ahia’s six-page paper might give you a mild headache.
Natural language processing, however, is about as immediately practical as complex modern technology gets. It is the branch of machine learning behind voice-to-text platforms like Otter and voice assistants like Google Assistant, Siri and Alexa. Almost any area of life or business that depends on communication can benefit from it.
Branding agencies apply NLP “to know what people are saying about a campaign or how they reacted to the campaign,” says Samantha Sam-Inimgba, an associate at Anakle, a digital marketing firm in Lagos.
“We use NLP to detect if people liked or hated your campaign. This is done via sentiment analysis.” The best tools of modern digital marketing combine age-old linguistic concepts with today’s fast-advancing NLP techniques.
Because most sentiment analysis of digital audiences depends on models built for high-resource languages, nuance can be lost when analyzing online chatter. Nigerian Twitter, for example, features lots of ‘Nigerian English’, with its own distinct vocabulary. The question of language came to the fore recently when Genevieve Nnaji’s Lionheart was disqualified from the Oscars’ international feature film category because most of its dialogue is in English.
More representation needed
But for visa delays, Ogueji and Ahia would have presented their work at two NeurIPS 2019 workshops. The trip would also have given them the chance to network with researchers from top universities and big tech firms.
Crucially, they would have been in the room to help ensure African perspectives are represented in the AI advances already reshaping modern life.
“A lot of the top AI conferences in the world are being hosted in visa-restrictive countries,” Ogueji observes, pained. When those countries make it hard for Africans to attend, they risk stifling AI innovation and enthusiasm in developing economies.
Ahia has had Canadian visa applications refused or crippled by delays twice in two years. Next year’s NeurIPS will be in Canada, too.
Opening future gates
But the disappointment will not puncture their drive. Teaching themselves AI since graduating in 2018 has been a fulfilling journey. Ogueji studied industrial engineering at the University of Ibadan, while Ahia studied marine sciences at the University of Lagos.
They’ve made the code for their project available on GitHub so that anyone can build further NLP work on pidgin English. The model has yet to be deployed on a device or platform for practical translation.
Future researchers could use their data to create spellcheckers, chatbots and, wait for it, song-lyric generators, Ogueji says. The translation project was partly born of an idea he and a friend had to combine lyrics from Davido and Wizkid songs and auto-generate a new song for the two artistes to collaborate on, only to realize there was no existing corpus for pidgin English.
Currently, Google Translate doesn’t offer pidgin-to-English translation. Ogueji is confident that, with a supervised learning model backed by more parallel data, their work could help the search giant be more inclusive of the 75 million Nigerians who speak pidgin English. Pidgin English is the closest thing West Africa has to a unifying lingua franca.
This project cost them $2,000, mostly spent training the PidginUNMT model on Amazon Web Services. Building a platform-ready translator would be a far bigger undertaking, one that Google or another industry giant has the war chest for.
Sam-Inimgba hopes this new research can help bridge semantic gaps and improve our understanding of public conversation. She volunteers at AI Saturdays Lagos, a weekly meet-up offering free, hands-on machine learning lessons, which Ogueji, Ahia and other InstaDeep colleagues help facilitate.