2021-01: Andres Milioto Defended His PhD Thesis

Summary

Over the last few years, robots have slowly been making their way into our everyday lives. From robotic vacuum cleaners already picking up after us in our homes, to the fleets of robo-taxis and self-driving vehicles looming on the horizon, all of these robots are designed to operate alongside humans, in environments designed for us. This means that, unlike traditional robots in industrial settings, where the world is designed around them, these mobile robots need to acquire an accurate understanding of their surroundings in order to operate safely and reliably. We call this type of knowledge about the robot's surroundings "semantic scene understanding". This understanding serves as the first layer of interpretation of the robot's raw sensor data and provides other tasks with useful and complete information about the state of the environment. These tasks include obstacle avoidance, localization of the robot in the world, mapping of an unknown environment for later use, trajectory planning, and manipulation of objects in the scene, among others.

In this thesis, we focus on semantic scene understanding for mobile robots. Since their mobility usually requires these robots to run on batteries, the key property they demand from perception algorithms is efficiency, both computational and in energy use. Efficient means that an approach exploits all the information available to it to run fast enough for the robot's online operation, even on power- and compute-constrained embedded computers. We approach this goal through three avenues. First, in all of the algorithms presented in this thesis, we exploit background knowledge about the task we are trying to solve to make our algorithms faster to execute and, at the same time, more accurate.
Second, we design the approaches to exploit the peculiarities of the particular sensor used in each application, making the processing more efficient.
Finally, we present a software infrastructure that serves as an example of how to implement such scene understanding approaches on real robots, exploiting commercially available hardware accelerators and allowing for scalability. As a result, every method presented in this thesis runs faster than the frame rate of the sensor, whether that sensor is a camera or a laser scanner.
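To make the idea of exploiting sensor peculiarities more concrete, here is a minimal sketch (not code from the thesis) of a trick commonly used for spinning LiDAR sensors: projecting the unordered 3D point cloud into a dense 2D range image that matches the sensor's scanning pattern, so that fast 2D image operations can be applied. All names and the field-of-view values below are illustrative assumptions, roughly modeled on a 64-beam sensor.

```python
import numpy as np

def spherical_projection(points, fov_up_deg=3.0, fov_down_deg=-25.0, W=1024, H=64):
    """Project a LiDAR point cloud of shape (N, 3) into an (H, W) range image.

    Illustrative only: the vertical field of view (+3 to -25 degrees) mimics a
    64-beam spinning LiDAR; a real sensor needs its own calibration values.
    Pixels that receive no point keep the sentinel value -1.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)                               # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / depth, -1.0, 1.0))     # elevation angle

    fov_up = np.deg2rad(fov_up_deg)
    fov_down = np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    # Map angles to pixel coordinates: azimuth -> column, elevation -> row.
    u = 0.5 * (1.0 - yaw / np.pi) * W
    v = (1.0 - (pitch - fov_down) / fov) * H

    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    range_image = np.full((H, W), -1.0, dtype=np.float32)
    range_image[v, u] = depth                            # last point wins per pixel
    return range_image
```

The payoff of this kind of projection is that a sparse, unordered point cloud becomes a dense, regularly structured image on which convolutions and neighborhood lookups run in constant time per pixel, which is one way sensor structure can be turned into computational efficiency.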

All parts of this thesis have been published in the proceedings of international conferences or as journal articles, after a thorough peer-review process. Furthermore, the work presented in this thesis resulted in the publication of a large-scale dataset and benchmark that the community can use to develop, share, and compare semantic scene understanding approaches, as well as four open-source libraries for this task, covering multiple sensor modalities.