Stefan Scherer, PhD, CTO at Embodied, Inc.
Date & Time
October 2020, 9:00 AM PST
Moxie is a novel companion for children designed to promote social, emotional, and cognitive development through play-based learning. With Moxie, children can engage in meaningful play every day, with content informed by best practices in child development and early childhood education. As an animate companion, Moxie needs to perceive, understand, and react to the physical world that surrounds it and to the human users that interact with it.

Moxie leverages multimodal information fusion on the edge (i.e., on the robot device itself) to build and track an accurate representation of its surroundings and its users. Using multimodal information, Moxie localizes and prioritizes input from engaged users; recognizes faces, objects, and locations; analyzes users' facial expressions and voices to assess their affect, mood, and level of engagement; and understands each user's intent, desires, and needs holistically.

For an animate companion to gain rapport and trust with its human users, it is crucial that the sense-react loop of the system be as tight and fast as possible. To reduce the time between perceived input and produced output, Moxie runs a combination of computer vision algorithms, lightweight neural networks, and natural language processing on board. Edge computing allows us to stay in control of this sense-react loop and to rely on cloud computing resources only when needed.

Lastly, we entrust Moxie with supporting and engaging one of our most precious and vulnerable demographics: our children. Hence, data security is of the utmost importance to us. Edge computing of video data allows us to ensure the required privacy and security for our users, as raw video data never leaves the robot.
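The multimodal fusion and user-prioritization step described above can be sketched in miniature. The snippet below is an illustrative assumption, not Embodied's actual implementation: it uses hypothetical names (`UserObservation`, `fuse_engagement`, `MODALITY_WEIGHTS`) and a simple weighted late-fusion scheme to rank users by engagement, the kind of lightweight computation that can run entirely on-device.

```python
from dataclasses import dataclass, field

# Hypothetical per-modality weights for late fusion; the real system's
# modalities, weights, and models are not public.
MODALITY_WEIGHTS = {"face": 0.5, "voice": 0.3, "gaze": 0.2}


@dataclass
class UserObservation:
    """One user's per-modality engagement scores, each in [0, 1]."""
    user_id: str
    scores: dict = field(default_factory=dict)


def fuse_engagement(obs: UserObservation) -> float:
    """Weighted late fusion: combine per-modality scores into one value.

    Missing modalities (e.g., a user who is visible but silent)
    contribute zero rather than raising an error.
    """
    return sum(
        weight * obs.scores.get(modality, 0.0)
        for modality, weight in MODALITY_WEIGHTS.items()
    )


def prioritize(observations: list) -> list:
    """Order users by fused engagement, most engaged first."""
    return sorted(observations, key=fuse_engagement, reverse=True)


if __name__ == "__main__":
    users = [
        UserObservation("child_a", {"face": 0.9, "voice": 0.8, "gaze": 0.7}),
        UserObservation("child_b", {"face": 0.2, "voice": 0.1}),
    ]
    for obs in prioritize(users):
        print(obs.user_id, round(fuse_engagement(obs), 2))
```

Late fusion is just one option; a deployed system might instead fuse raw features earlier in the pipeline. The appeal of this shape for edge deployment is that each modality can be scored by its own small model and combined cheaply in the tight sense-react loop.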
Stefan Scherer is the CTO of Embodied, Inc. and formerly a Research Assistant Professor at the University of Southern California, where he led research projects funded by the NSF, NIH, and the Army Research Lab. Stefan received his PhD with distinction from Ulm University in Germany in 2011. His research aims to automatically identify, characterize, model, and synthesize individuals' multimodal verbal and nonverbal behavior within human-machine as well as human-human interaction. His interdisciplinary work focuses on machine learning, multimodal signal processing, and affective computing, with applications in healthcare and education. His research has been featured in the Economist, the Atlantic, Wired, and the Guardian, and has received a number of best paper awards at renowned international conferences.