Zeerak Ahmed MDE ’18 runs Matnsaz, an initiative to better represent Urdu in technology. Building on his master’s thesis work developing breakthrough Urdu keyboards for modern smartphones, Zeerak now leads a collaboration across continents and disciplines to build infrastructure for software developers around the world who want to support Urdu and other languages written in the Arabic script.
In late 2019, the Matnsaz team released Makhzan, an Urdu text corpus. A text corpus is the fundamental building block for training the artificial intelligence on which language processing capabilities are built. From autocorrect to search to linguistic analysis, Makhzan will support a diverse set of use cases as a high-quality, free-to-use data source.
Drawing on lessons from Makhzan, Zeerak is inching closer to a public beta of his Urdu keyboard. Recent articles in MIT Technology Review Pakistan and Princeton Alumni Weekly go deeper into the technological and cultural implications of this work.
Credit: Michael Raspuzzi