Artificial personal assistants such as Amazon’s Alexa allow users to speak to their phone, their computer or even a small black tower to play music, make purchases or check the weather.
Alexa is hosted inside the Amazon Echo, a “smart speaker” designed to bring voice-recognition technology and personal assistants into the home, according to Amazon’s website.
Shiv Vitaladevuni, senior manager of machine learning at Amazon Alexa, spoke about “Building Alexa” during a presentation Friday to about 50 students and faculty members in Boston University’s Photonics Center.
Vitaladevuni, who previously studied and worked in computer integration and biology, presented the successes and challenges the Amazon team encountered while developing Alexa and other Echo technology over the past three years.
“I still find it fascinating,” Vitaladevuni said. “Every day, there’s something new to learn.”
The presentation was part of the Center for Information and Systems Engineering’s seminar series, which regularly invites prominent figures in the engineering industry to speak on different topics.
Alexa is developed with “homegrown” programming at Amazon’s Cambridge office, which focuses on machine learning and modeling to make Alexa conversational and adaptive to different environments.
Alexa requires a “wake word” to begin operating, a simple-seeming requirement that Vitaladevuni described as having several ongoing challenges.
Background noise, such as another person talking or sounds from another device, makes it difficult for Alexa to recognize the “wake word” and any following commands. Pronunciation differences, such as accents or slang words, also challenge programmers adapting Alexa to different countries or adding modifications and programs.
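The wake-word gate described above can be imagined, in drastically simplified form, as a loop that ignores audio until a keyword detector fires. The sketch below is purely illustrative (all names are hypothetical): Alexa’s real detector scores raw audio with a trained acoustic model, whereas here string matching on transcripts stands in for that step to show only the control flow.

```python
# Toy wake-word gate: transcripts stand in for an acoustic model's output.
# A real system scores raw audio frames; string matching here is a
# placeholder that illustrates the "sleep until wake word" control flow.

WAKE_WORD = "alexa"

def detect_wake_word(transcript: str) -> bool:
    """Stand-in for an acoustic wake-word classifier."""
    return WAKE_WORD in transcript.lower().split()

def process_stream(utterances):
    """Ignore everything until the wake word is heard, then
    capture the command that follows it."""
    awake = False
    commands = []
    for utterance in utterances:
        if awake:
            commands.append(utterance)
            awake = False          # go back to sleep after one command
        elif detect_wake_word(utterance):
            awake = True           # next utterance is treated as the command
    return commands

stream = ["what's on TV", "Alexa", "play some music", "random chatter"]
print(process_stream(stream))  # ['play some music']
```

The background-noise problem the article mentions lives entirely inside `detect_wake_word`: a detector that is too lenient wakes on TV audio, while one that is too strict misses accented pronunciations.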
“As we add more features and more effects, [development] becomes a more complex problem,” Vitaladevuni said.
Vitaladevuni’s team, using feedback from users and tests, programmed a feature to help Alexa recognize a person’s unique “acoustic footprint,” or speech pattern, to filter out other sounds and focus on direct commands.
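The “acoustic footprint” idea, matching incoming speech against a stored pattern for the enrolled speaker, can be sketched as a toy similarity check. In this hypothetical example, short feature vectors stand in for learned speaker embeddings, and the threshold is an illustrative value, not anything Amazon has published:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def is_enrolled_speaker(footprint, sample, threshold=0.9):
    """Accept the sample only if it is close to the stored footprint."""
    return cosine_similarity(footprint, sample) >= threshold

footprint = [0.9, 0.1, 0.3]      # enrolled user's "acoustic footprint"
user_cmd = [0.85, 0.15, 0.28]    # similar voice -> accepted
tv_noise = [0.1, 0.9, 0.5]       # dissimilar background audio -> rejected

print(is_enrolled_speaker(footprint, user_cmd))  # True
print(is_enrolled_speaker(footprint, tv_noise))  # False
```

Filtering this way lets a system attend to the person who woke it and discard other sounds, which is the behavior the feature is described as providing.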
Moving forward, Vitaladevuni said, his team plans to improve their ability to detect errors, clarify them and apply what they learn in tests and studies to production.
When students asked if Alexa came with any kind of manual, Vitaladevuni said that programmers now work toward voice-recognition technology that doesn’t require manuals, believing that Alexa “should adapt to us, not us to her.”
The event’s student host was Andrew Cutler, a Ph.D. candidate in information and data sciences in the College of Engineering. He said he hopes that with the presentation, students can now see how their studies in the classroom transfer to problem-solving in the workplace.
“That was valuable to me,” Cutler said, “to see different resources and techniques in solving problems in academia and industry.”
Alexa does not use artificial intelligence programs commonly taught in classes, Cutler said. Developers opt instead for a kind of neural network, with “simpler models that they have spent years fine-tuning.”
Researchers and developers are working over the long term to create personal assistants with “encyclopedic” knowledge, or ones that process information like humans.
“As the technology progresses and techniques improve, [programs like Alexa] will be increasingly more of an agent in the real world,” said Joseph DellaMorte, a junior in ENG.
DellaMorte said that from his experience and studies, he believes personal assistants and related technology will be designed to “do very specific tasks really well, such as driving a car, and nothing else.”
Feedback is crucial in voice-recognition technology, Vitaladevuni said. User input helps Amazon develop new modifications for Alexa, whether that’s improvements to noise-cancellation technology or add-ons like “MovieBot,” which allows the user to talk to Alexa about movies and find new recommendations.
According to Amazon’s website, MovieBot uses information from IMDb, which is owned by Amazon, to help users “stay in the know about new movies and find out what’s good.”
“It’s an experiment to see if you can make Alexa truly conversational,” Vitaladevuni said, “like a person.”