Many artificial intelligence (AI) technologies rely on the human voice. Siri, Amazon Alexa, and Google Assistant are all virtual intelligent agents that users can converse with. However, these systems are built on spoken language technologies and are not accessible to people who cannot speak or hear, such as those who are deaf or hard of hearing.
Bowen Shi, a PhD candidate at TTIC, worked on projects in computer vision and spoken language before starting at TTIC in 2016. He completed his Bachelor of Science at Shanghai Jiao Tong University and his Master of Science at ENSTA Paris, University of Paris 6.
Professor Karen Livescu, Shi’s advisor, is a principal investigator on a sign language recognition and translation project, along with Professor Greg Shakhnarovich of TTIC and Professor Diane Brentari of the University of Chicago. Shi’s previous experience, his motivation to make today’s rapidly progressing language technologies more accessible, and the opportunity to collaborate with Professor Livescu and other students on the project inspired him to focus his research on sign language.
“My PhD project was to make today’s AI technologies for spoken languages work for sign language too,” Shi said. Shi defended his thesis, “Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods,” on October 17, 2022.
“Many Deaf people cannot speak, but they use sign languages,” Shi shared during his thesis defense. “According to the World Federation of the Deaf, there are about 70 million Deaf people worldwide and over 200 different sign languages.”
Sign language uses hand gestures, facial expressions, body movements, postures, and many other types of visual signals to convey meaning, Shi said. American Sign Language (ASL) is distinct from English, with its own grammar and lexicon.
One special component of sign language is fingerspelling, which involves spelling out words letter by letter and makes up 12-35% of ASL, Shi said. Many important content words, such as names, organizations, and words being emphasized, are signed using fingerspelling.
For his PhD project, Shi and his group collected large-scale, real-world (naturally occurring) data sets to establish benchmarks for developing sign language processing techniques. The data, collected collaboratively, have been released publicly as the Chicago Fingerspelling in the Wild Data Sets (ChicagoFSWild).
“Sign language recognition and translation in the real-world environment is still very challenging, but eventually, we will have sign language technologies as good as today’s spoken language technologies accessible to the Deaf community,” Shi said.
The collaborative culture and open research atmosphere have been Shi’s favorite aspects of the TTIC community.
“I have had the chance to talk with people about my research projects,” Shi said. “They give me suggestions that I don’t have the opportunity to get elsewhere, so I can get a lot of different perspectives that are helpful to integrate into my own project. I can also learn about others’ research and give feedback that could help them.”
Shi has started a position at Meta AI Research (previously known as FAIR), where he will continue to do research on language.
“It’s an area that is growing and there are a lot of opportunities,” Shi said. “I’m also very psyched about these different projects in sign language.”