We have recently open-sourced Bonnetal, an easy-to-use deep-learning training and deployment pipeline for a suite of perception tasks, which we developed for our robots’ perception systems.
Bonnetal can pre-train popular CNN backbones on ImageNet for transfer learning (pre-trained weights for popular models are downloaded from our server by default, so training never has to start from scratch), and it provides fast decoders for real-time semantic segmentation. We have more applications in our internal pipeline that we will also open-source within the framework, such as object detection, instance segmentation, keypoint/feature extraction, and more.
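To make the transfer-learning idea concrete, here is a minimal PyTorch sketch. It illustrates the general recipe rather than Bonnetal’s actual API; the class name, channel sizes, and learning rates are assumptions. An ImageNet-pretrained backbone is reused as a feature extractor, a small decoder is attached for dense prediction, and the two parts are fine-tuned at different learning rates.

```python
import torch
import torch.nn as nn
import torchvision

class SegNet(nn.Module):
    """Pre-trained backbone + small decoder for semantic segmentation (illustrative)."""
    def __init__(self, num_classes):
        super().__init__()
        # ImageNet-pretrained feature extractor (weights are downloaded, not trained from scratch)
        self.backbone = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").features
        # Small decoder head: project the features and classify each pixel
        self.decoder = nn.Sequential(
            nn.Conv2d(1280, 256, kernel_size=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, x):
        size = x.shape[-2:]
        feats = self.backbone(x)                    # features at 1/32 of the input resolution
        logits = self.decoder(feats)
        return nn.functional.interpolate(logits, size=size,
                                         mode="bilinear", align_corners=False)

model = SegNet(num_classes=2)                       # e.g. person vs. background
# Fine-tune: small learning rate for the pre-trained backbone, larger one for the new decoder
optimizer = torch.optim.SGD(
    [{"params": model.backbone.parameters(), "lr": 1e-4},
     {"params": model.decoder.parameters(), "lr": 1e-2}],
    lr=1e-2, momentum=0.9)
```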
The key features of Bonnetal are:
- The training interface is easy to use, even for a novice in machine learning,
- The library of models for transfer learning requires significantly less training data and time when adapting to a new task and dataset, exploiting the knowledge about low-level geometry and texture that is already condensed in the pre-trained weights,
- All architectures can be used with our C++ library, which also has a ROS wrapper so that you don’t have to code at all, and
- All of the supported architectures are tested with NVIDIA’s TensorRT so that you can get that extra juice out of your Jetson or GPU, including fast inference tricks such as INT8 quantization and calibration (vs. standard, slower, 32-bit floating point); a minimal calibration sketch follows this list.
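As an illustration of what INT8 deployment involves, the sketch below builds a TensorRT engine from an ONNX export using an entropy calibrator. It uses the TensorRT 8-style Python API (older and newer versions differ slightly); the file names and the calibration data are assumptions, and Bonnetal’s own C++ deployment code may be organized differently.

```python
import numpy as np
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds representative input batches so TensorRT can pick INT8 scale factors."""
    def __init__(self, batches, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = list(batches)            # list of NCHW float32 numpy arrays
        self.index = 0
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(self.batches[0].nbytes)

    def get_batch_size(self):
        return self.batches[0].shape[0]

    def get_batch(self, names):
        if self.index >= len(self.batches):
            return None                          # no more calibration data
        cuda.memcpy_htod(self.device_input,
                         np.ascontiguousarray(self.batches[self.index]))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

# Replace the random arrays with real, representative images of your deployment domain.
calibration_batches = [np.random.rand(1, 3, 480, 640).astype(np.float32) for _ in range(8)]

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("segmentation.onnx", "rb") as f:       # assumed ONNX export of the trained model
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = EntropyCalibrator(calibration_batches)
engine_bytes = builder.build_serialized_network(network, config)
with open("segmentation.engine", "wb") as f:
    f.write(engine_bytes)
```

Calibration is a one-off step over a small set of representative images; the serialized engine is then loaded at runtime for fast INT8 inference.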
This video (https://youtu.be/C7rUycC5Rts) shows a person-vs-background segmentation network using a MobileNetV2 architecture with a small Atrous Spatial Pyramid Pooling (ASPP) module, running quantized to INT8 for fast inference and achieving 200 FPS at VGA resolution on a single GPU.
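For reference, a “small ASPP” head of the kind mentioned above can be written in a few lines. The channel counts and dilation rates here are illustrative assumptions, and Bonnetal’s actual module may differ.

```python
import torch
import torch.nn as nn

class SmallASPP(nn.Module):
    """Minimal Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convs plus a
    1x1 branch and a global-context branch, concatenated and projected back down."""
    def __init__(self, in_ch=1280, out_ch=256, rates=(3, 6, 9)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False),
                           nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))] +
            [nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                           nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)) for r in rates])
        self.global_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = nn.functional.interpolate(self.global_pool(x), size=size,
                                           mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

# Example: MobileNetV2 features of a VGA (640x480) input arrive at 1/32 resolution (20x15).
out = SmallASPP()(torch.randn(1, 1280, 15, 20))
```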
The code is available on our lab’s GitHub: https://github.com/PRBonn/bonnetal