This project was carried out as part of the TechLabs “Digital Shaper Program” in Aachen (Winter Term 2020)
Bird’s Eye View Environment Perception

The upcoming transformation of the automotive industry through the development of autonomous vehicles (AVs) has changed the very purpose of automobiles. Cars are no longer merely a means of transportation but are evolving into multi-purpose platforms driven by software.
Autonomous vehicles typically refer to transport systems that move without the need for human intervention. This is achieved by the AV’s capability to sense its environment and operate independently.
At the highest level of automation, a human passenger is not required to take control of the vehicle at any time, or even to be present in the vehicle. The development of AVs is commonly described in terms of levels of automation, ranging from Level 0 (no automation) up to Level 5 (full vehicle autonomy).
Environment perception is a critical capability for autonomous vehicles, since it provides the vehicle with key details about the driving environment, such as drivable areas, the distances and velocities of surrounding objects, forecasts of future states, and detected obstacles. The vehicle should detect and classify possible stationary and moving objects in its vicinity in a robust manner.
AVs are equipped with cameras and sensors at different positions to support localization. Localization refers to the estimation of the current pose of the vehicle and is important to accurately interpret sensor measurements.
Our project aims to estimate a bird’s eye view of an AV’s environment for safe and efficient path planning. To tackle this problem, we propose an end-to-end deep learning pipeline that uses LIDAR data to project camera images into a 2D bird’s eye view (BEV) coordinate frame. Our method takes camera images of the vehicle’s environment as input and trains a neural network that efficiently estimates the locations of static objects in the vehicle’s path.
We started off by procuring a dataset of camera images from field tests performed by the Formula Student team at RWTH Aachen University. Each frame contains three camera angles (left, front, and right), each shot from a different position in space. The data was provided to us in the .bag file format, a standard format in robotics. We were also given LIDAR captures, each at a different position in space, which were used to normalize these frames. The following steps were performed to produce the inputs and labels required to train the neural network; the images shown represent the output of each step.
1. Evaluate the transform given by rotation matrix R and translation vector T: each 3D LIDAR point P is mapped to R·P + T.
2. Transform the 3D LIDAR points into the 2D image coordinate frame of the camera images. We did not transform all the LIDAR points, only those lying close to the ground.
3. Project the LIDAR points into a bird’s eye view.
4. With the results from Steps 2 and 3, we found matching LIDAR points and computed how the pixels in the camera image had to be warped to match the BEV perspective. This was done for each camera shot (left, front, and right).
5. Next, we merged all the warped images.
6. This merged result served as the input to the neural network.
7. Computing the label for the neural network: we created a black-and-white bitmap with the positions of the cones marked in the BEV. To create it, we filtered all points lying within a certain range above the ground (20–40 cm), where the cones are most likely to be.
8. Using the merged image from Step 5, we applied an RGB filter to extract just the blue and yellow colours of the cones, and later used this as a mask to further improve the label from Step 7.
9. As a result, we generated an image with the distinct cones. The label from Step 7 contains white points marking where the cones are supposed to be; we then used the RGB filter from Step 8 to colour these points so that the neural network can learn to accurately identify the position of the cones on either side of the track.
10. Finally, the inputs and labels to train the neural network were ready.
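The geometric core of the steps above can be sketched as follows. This is a minimal, hypothetical NumPy sketch, not our actual implementation: the calibration values (R, T, the intrinsic matrix K) and all thresholds are illustrative assumptions, not the values from our dataset.

```python
import numpy as np

def lidar_to_image(points, R, T, K):
    """Steps 1-2: project 3D LIDAR points into the 2D image plane.

    points: (N, 3) LIDAR points; R (3x3), T (3,) extrinsics (R @ P + T);
    K (3x3) camera intrinsic matrix (pinhole model).
    """
    cam = points @ R.T + T           # transform into the camera frame
    cam = cam[cam[:, 2] > 0]         # keep only points in front of the camera
    pix = cam @ K.T                  # apply the pinhole projection
    return pix[:, :2] / pix[:, 2:3]  # perspective division -> pixel coords

def cone_label_bitmap(points, x_range, y_range, resolution,
                      z_min=0.20, z_max=0.40):
    """Step 7: black-and-white BEV bitmap marking candidate cone positions.

    Keeps only points 20-40 cm above the ground (z-up LIDAR frame) and
    rasterises their (x, y) positions into a top-down grid.
    """
    keep = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    cones = points[keep]
    h = int((y_range[1] - y_range[0]) / resolution)
    w = int((x_range[1] - x_range[0]) / resolution)
    bitmap = np.zeros((h, w), dtype=np.uint8)
    xs = ((cones[:, 0] - x_range[0]) / resolution).astype(int)
    ys = ((cones[:, 1] - y_range[0]) / resolution).astype(int)
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    bitmap[ys[inside], xs[inside]] = 255  # white pixel = candidate cone
    return bitmap

def cone_colour_mask(image):
    """Step 8: crude RGB filter keeping roughly blue and yellow pixels.

    image: (H, W, 3) uint8 RGB image. The thresholds are guesses for
    illustration, not the values tuned in the project.
    """
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    blue = (b > 120) & (b > r + 40) & (b > g + 40)
    yellow = (r > 120) & (g > 120) & (b < r - 40)
    return blue | yellow
```

The BEV warp itself (Step 4) is not shown; in practice it amounts to estimating a homography from the matched LIDAR/pixel correspondences and warping each camera image with it.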
The figures above illustrate the BEV transformation. In the course of this project we experienced a number of setbacks. Initially, we tried semantic segmentation using DeepLab, which is openly available on GitHub. The goal of semantic segmentation is to label each image pixel with the class of the object it depicts; a network architecture such as DeepLab produces this segmentation map from learned feature maps. In our case, we needed segmentation to identify the cones on the track in the Formula Student image dataset. However, there is no cone label in the Pascal VOC dataset the model was trained on, and hence it did not give us the results we expected.
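The class list of the Pascal VOC segmentation benchmark makes this failure concrete: none of its 21 categories covers traffic cones, so a model trained only on VOC cannot label them. A quick check (the class names below are the standard VOC segmentation classes):

```python
# The 21 semantic classes of the Pascal VOC segmentation benchmark.
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow", "diningtable", "dog",
    "horse", "motorbike", "person", "pottedplant", "sheep", "sofa",
    "train", "tvmonitor",
]

print("cone" in VOC_CLASSES)  # -> False: no cone class to predict
```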
Besides that, we faced issues with the data provided by the Formula Student team, in particular the pictures, which caused our BEV projection to be mildly skewed. Another problem was the amount of data available as input to the neural network: a major part of the images could not be used because of a calibration issue. The camera on the vehicle shifted down a few inches during the test drive, making those images unusable for this project.
In addition, the neural network is not yet fully optimized and therefore does not deliver satisfactory results. However, we believe that investing more time and effort into the network architecture would eventually yield better predictions, even given the limited amount of data.
Aside from the content-related problems we faced, we did not have an optimal team structure toward the end of the project phase. We started with six team members and were reduced to four after a couple of weeks. Given the scope of the project, a larger team would have been able to work more efficiently.
Conclusion and Outlook
Considering the recent developments in the field of autonomous vehicles, this project gave us valuable insight into modern technological progress. All in all, our team is quite content with the outcome. In the course of the project we learned to deal with the problems that arise from working with imperfect data. The results of the BEV transformation in particular were satisfactory and marked a major milestone. Even though the final results do not entirely meet our expectations, there is definitely room for improvement. Future work includes developing the neural network architecture further so that it makes better predictions even with a limited input dataset, which would result in an improved bird’s eye view perception for the autonomous vehicle.
TechLabs Aachen e.V. reserves the right not to be responsible for the topicality, correctness, completeness or quality of the information provided. All references are made to the best of the authors’ knowledge and belief. If, contrary to expectation, a violation of copyright law should occur, please contact email@example.com so that the corresponding item can be removed.