Solar panel soiling detection using deep neural networks
This project was carried out as part of the TechLabs “Digital Shaper Program” in Aachen (Winter Term 2021/2022)
Introduction
In recent years, there has been a significant increase in reliance on renewable energy. According to a report from the Center for Climate and Energy Solutions [1], renewables account for roughly 20% of total energy production; after hydro and wind, solar energy is the third-largest contributor. While solar panels are designed to be as efficient as possible, soiling and other environmental factors reduce their efficiency.
Soiling occurs when particles such as snow, dirt, or sand adhere to the panel surface, reducing the power generation of the solar system by at least 3–4 percent despite regular cleaning. According to the DLR Institute for Solar Research [2], this loss is expected to rise to 4–7 percent by 2023. As solar farms grow to MW and GW capacities, the loss becomes more pronounced. While there are several methods for reducing such losses, we focus here on the predictive maintenance approach: detecting soiling on solar panels so that an informed decision can be made about whether or not to clean them. The cleaning itself can also be automated with the help of a mechatronic system.
According to the NREL soiling report [3], for a system that accumulates soiling blocking approximately 2% of sunlight over the course of a year, one annual cleaning keeps the average loss at around 1.4 percent, while two cleanings per year could reduce it to about 1.3 percent. Using a machine learning algorithm to detect soiling levels makes this process easier, allowing large solar farms to be managed without much manual human intervention. Such maintenance improves the uptime and reliability of these systems.
Goal
The goal of this project is to develop a machine learning model for efficient identification of soiling on solar panels. Methods ranging from basic to advanced were systematically explored, each building on the previous one. This project ultimately serves as a small part of the bigger picture of predictive solar panel cleaning and maintenance.
Methodology
This project addresses the segmentation of soiling on solar panels using both traditional computer vision and modern deep learning approaches. The task can be split into two stages:
- Detect solar panel and draw a bounding box
- Segment the pixels inside the bounding box into soiling and solar panel
Traditional computer vision approach
The advantage of traditional approaches is that you have clear visibility into how the algorithm works under the hood. For the first task, the Canny edge detector and the Hough transform were used to find the bounding box of the solar panels. For simplicity, lines were detected and their intersections computed, which yields a normal or skewed rectangular bounding box. The Hough transform can also find rectangles directly, but this comes at a computational cost due to the way the algorithm works.
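A minimal sketch of this localisation step with OpenCV, assuming a hypothetical input file panel.jpg; the Canny and Hough thresholds below are illustrative, not the project's values:

```python
import cv2
import numpy as np

# Edge detection followed by a probabilistic Hough transform that
# returns line segments; their intersections bound the panel.
img = cv2.imread('panel.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)  # draw each segment
```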
For the second task, three approaches were attempted to solve the segmentation task at hand.
A simple condition
Since the image at this point contains only the solar panel region, a few hundred panel pixels can be extracted, their RGB colour range inferred, and an if-else condition used to classify everything that is not a panel pixel: if a pixel value is within the acceptable blue range, it is a solar panel pixel; otherwise it is a soil pixel. But will this simple condition work? Is the solution truly that simple? The answer is no. Solar panels do not contain only blue pixels: white lines run across their surface, which breaks the colour model, so now we need two conditions. Then daylight variations, camera quality, etc. come into the picture, and the if-else logic becomes a complex mess (a toy version is sketched below). Let us move on to the next approach; each one helped build the foundation for what is to come in the end.
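A toy version of the if-else condition, assuming the "acceptable blue" bounds below were inferred from sampled clean-panel pixels (the numbers are illustrative):

```python
# Classify one RGB pixel by a hand-crafted colour rule; real lighting
# variation quickly makes such rules unmanageable.
def is_panel_pixel(rgb):
    r, g, b = rgb
    return b > 90 and b > r + 20 and b > g + 10  # crude "acceptable blue" test

print('panel' if is_panel_pixel((60, 70, 140)) else 'soil')  # -> panel
```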
K-means
The K-means algorithm finds clusters in the RGB colour space of the image; the number of clusters is defined by the analyst. In the example below, the image is clustered into two regions, one for soil and the other for solar panel pixels.
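A sketch of this clustering step with OpenCV, again assuming a hypothetical panel.jpg; k = 2 reproduces the two-region example:

```python
import cv2
import numpy as np

# Treat every pixel as a point in RGB space and cluster with k-means.
img = cv2.imread('panel.jpg')
pixels = img.reshape(-1, 3).astype(np.float32)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, 2, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)
# Paint each pixel with its cluster centre to visualise the two regions.
segmented = centers[labels.flatten()].reshape(img.shape).astype(np.uint8)
```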
But k-means does not always yield such nice results. Outlier values heavily affect the cluster shapes, and the number of clusters has to be specified for each image. What happens if an image contains only brown soil and panel pixels but three clusters are requested? The algorithm happily finds two shades of either the brown or the blue panel pixels to make up the third cluster. And if we have to inspect each image just to choose the number of clusters, we might as well judge directly whether it is soiled. The number of clusters can be decided automatically by switching to the mean-shift algorithm, but this is skipped here because we want a model that needs few computational resources at run time and can segment hundreds of such images in a reasonable time with good accuracy.
Histogram-based Bayesian classifier
Building on the first attempt, it is understood that we need a condition and some prior knowledge to classify a pixel as solar panel or soiling. The second attempt showed that the RGB space of the images is a good way to store and use that prior knowledge. Here a Bayesian algorithm is therefore used to classify the pixels probabilistically. Bayes' theorem can be understood as a probability of probabilities; the key terms are explained below, followed by the formula that combines them.
- Prior — how strongly we believe, before looking, that a pixel belongs to each class (e.g. 0.5 for solar panel and 0.5 for brown soil). The values depend on the number and type of classes and must sum to one.
- Likelihood — given an observed pixel, how well it matches a particular class (here we have a model of what solar panel pixels look like)
- Normalisation (denominator) — used to normalise across all the classes
- Posterior — how strongly the pixel actually belongs to a class
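In symbols, for a pixel value x and class c, these pieces combine as

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{\sum_{c'} P(x \mid c')\,P(c')}$$

that is, posterior = likelihood × prior / normalisation.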
To build the histogram for the likelihood, clean solar panel pixels were manually scraped from images with varying lighting conditions, and the RGB model was set up from these pixels. For the prior, an equal distribution over the classes was used initially; later, all combinations were explored to find the most suitable one.
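A minimal sketch of such a histogram-based classifier, assuming hypothetical (N, 3) arrays panel_pixels and soil_pixels of scraped RGB values; the bin count is illustrative:

```python
import numpy as np

BINS = 8  # 8 bins per channel keeps the 3-D histogram small

def rgb_histogram(pixels):
    # Normalised 3-D histogram over RGB, i.e. the likelihood P(x | class).
    hist, _ = np.histogramdd(pixels, bins=(BINS,) * 3, range=[(0, 256)] * 3)
    return hist / hist.sum()

def posterior_panel(pixel, h_panel, h_soil, prior_panel=0.5):
    # Bayes' rule for the two-class case: P(panel | x).
    idx = tuple(np.asarray(pixel) // (256 // BINS))
    lik_panel, lik_soil = h_panel[idx], h_soil[idx]
    denom = lik_panel * prior_panel + lik_soil * (1 - prior_panel)
    return lik_panel * prior_panel / denom if denom > 0 else prior_panel
```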
The image shows the result of such a Bayesian classifier. The output is a probabilistic value for each pixel, indicating whether it belongs to the background (the solar panel) or the foreground (brown soiling). Here a two-class classifier is used.
The problem with this approach was that a very thin layer of soiling could not be reliably identified. We also need a probability threshold above which a pixel is confidently assigned to a particular class, which introduces another parameter to tune. Hence this method is often used as a precursor to more sophisticated methods such as graph cuts.
Thus, considering the requirements of bounding box detection, prior knowledge modelling, computational cost, run time, and algorithmic complexity, we finally move on to the deep learning-based method. Executing all of the above at run time becomes intolerable, but we can use these methods to build a dataset once, and then train a deep learning model on it to recognise soiling on solar panels.
Deep learning approach
Deep learning is currently growing at a rapid pace thanks to the available computational power, data, and network architectures. For the current task, we initially had several options, as listed below.
- Image classification, where objects are detected and only the names of the classes are obtained as output
- Object detection, which detects the classes present in the input data and also localises the objects with a bounding box around each of them
- Image segmentation, where pixel-wise classification is performed and clusters of different objects are obtained as output
- Semantic segmentation, where all clusters of the same class are given the same name and colour
- Instance segmentation, where the number of instances of every object occurring in the image is also obtained
For the task of segmentation, we use a special kind of neural network called a convolutional neural network (CNN). In this project we use a pre-trained model, meaning it has already been trained on millions of images spanning many classes, and we can reuse the features extracted by such a network for this task.
When it comes to neural networks, you only get out what you show the network. Image segmentation tasks are achieved with what is called an encoder-decoder architecture: the encoder produces a low-resolution feature map in which only the necessary information is encoded, and the decoder upsamples this compressed representation back to the original size. Skip connections travelling from encoder to decoder carry forward high-resolution information that is helpful during decoding.
Dataset
This project uses the DeepSolarEye dataset, which contains 45,754 images of soiled and clean solar panels along with information such as timestamp, percentage power loss, and irradiance level.
Cleaning and preparation
In the received dataset, every image is named in the following format: Solar_day_month_date_hour_minute_second_year_L_percentageloss_I_irradiancelevel.jpg. Our main task was to retrieve the important information from the image name and classify the images based on the dust they contain. After classifying the original images, we renamed them for ease of handling.
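A minimal sketch of this parsing step, assuming file names follow the pattern above; the example name and the loss threshold used for bucketing are illustrative, not the project's values:

```python
import os

def loss_from_name(fname):
    # The power-loss value follows the 'L' token in the file name.
    parts = os.path.splitext(fname)[0].split('_')
    return float(parts[parts.index('L') + 1])

def soiling_bucket(fname):
    return 'clean' if loss_from_name(fname) < 0.05 else 'soiled'

print(soiling_bucket('Solar_Wed_Jun_28_13_42_27_2017_L_0.0233_I_0.9057.jpg'))
# -> clean
```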
Labelling
The neural network is trained using ground truth data, which contains information about all the objects present in the input videos or images. The process of creating this ground truth data is called labelling. There are multiple ways of labelling a dataset depending on the required output from the neural network; since we have chosen semantic segmentation, pixel-wise labelling of the input data is necessary.
The most common approach for generating ground truth data involves annotating the images by hand using software tools like labelMe, labelImg, CVAT, etc. This is generally time-consuming, and generating labels for 45,000 images would require a lot of human effort. We therefore opted for the traditional algorithms described above, k-means clustering and the histogram-based Bayesian classifier. This pipeline gave us output directly in categorical form (every pixel is associated with the number of its corresponding class), which let us automate the labelling of the ground truth data in an extremely time-effective way.
Image Augmentation
The ability of a neural network to produce accurate outputs on new data depends highly on the data it has been trained on, and the availability of such training data is one of the major obstacles. Datasets are generally monotonous in terms of the perspective, sizes, and colour combinations of the objects present in them. For example, solar panels come in different sizes, and photos of them vary in shot angle, distance from the camera, etc. A model trained on monotonous data is therefore accurate only on that specific data. This is an example of overfitting: the model ends up performing poorly during inference when it faces new data with more variety.
As the picture illustrates, with too little training data the fitted model is overly flexible, which produces a large error on the test data; with enough training data, the model matches the underlying one more closely.
So how can we overcome this problem without having to gather more and more data? The most common and feasible way to get more data is data augmentation. The methods we use fall into two categories: random colour variations (e.g. brightness, contrast, gamma) and random geometric transformations of the image (e.g. rotation, cropping, shear, zoom). The affine (geometric) transformations are applied to both the original images and the labelled categorical images, whereas the colour transformations are applied only to the original images, since applying them to the labels would change the pixel values and hence the class information they represent. TensorFlow provides functions that can be used directly to generate augmented data; the transformations used in this project were random_flip_left_right, random_rotation, random_brightness, random_contrast, random_shear, and random_gamma. Two augmentation policies were chosen at random and applied to every image passed into the augmentation pipeline.
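A minimal sketch of this paired-augmentation rule using tf.image (the parameter values are illustrative, not the project's):

```python
import tensorflow as tf

def augment(image, mask):
    # Geometric op: the same random flip must hit image and mask alike.
    flip = tf.random.uniform(()) > 0.5
    image = tf.cond(flip, lambda: tf.image.flip_left_right(image), lambda: image)
    mask = tf.cond(flip, lambda: tf.image.flip_left_right(mask), lambda: mask)
    # Colour ops: image only, or the class ids in the mask would change.
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image, mask

# train_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
```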
Setting up the network
After the dataset was prepared, the next task was to feed it to the neural network. There are two broad ways of creating one: building a neural network from scratch, or using a pre-trained network. The accuracy of a neural network depends on its weights and biases, and starting from random values requires many training iterations to reach the desired accuracy. It is therefore more practical to use a pre-trained network with already-optimised weights, biases, and other hyper-parameters, which helps the network reach greater accuracy in substantially fewer iterations. A pre-trained MobileNetV2 [4] network was used for this project.
As discussed earlier, we aim to produce pixel-wise classified images as the output of the network. Neural networks decrease the size of the image to detect features (downsampling), so to obtain a segmented image of the same size as the input, we need to upsample again. This is done by a decoder that mirrors the encoder and brings the downsampled representation back to its original size; together they form the encoder-decoder architecture.
During downsampling some information is lost, which makes it hard to produce high-quality output. To preserve this quality, some information is stored during downsampling and reused during upsampling; this is the basic idea behind the U-Net architecture. The stored data fed back into the decoder part of the network travels along what are called skip connections. The figure below shows a schematic of the U-Net architecture [5].
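A minimal sketch of such an encoder-decoder, assuming 128x128 RGB inputs and four output classes; the skip-connection layer names are the usual ones exposed by tf.keras.applications.MobileNetV2:

```python
import tensorflow as tf

def up_block(filters):
    # One decoder step: upsample 2x, then normalise and activate.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2DTranspose(filters, 3, strides=2, padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
    ])

def build_model(num_classes=4, size=128):
    base = tf.keras.applications.MobileNetV2(
        input_shape=(size, size, 3), include_top=False)
    skip_names = ['block_1_expand_relu', 'block_3_expand_relu',
                  'block_6_expand_relu', 'block_13_expand_relu',
                  'block_16_project']
    encoder = tf.keras.Model(
        base.input, [base.get_layer(n).output for n in skip_names])
    encoder.trainable = False  # keep the pre-trained weights frozen

    inputs = tf.keras.Input(shape=(size, size, 3))
    *skips, x = encoder(inputs)
    for filters, skip in zip([512, 256, 128, 64], reversed(skips)):
        x = up_block(filters)(x)                      # upsample 2x
        x = tf.keras.layers.Concatenate()([x, skip])  # skip connection
    # Final upsampling back to input resolution, one logit per class.
    outputs = tf.keras.layers.Conv2DTranspose(
        num_classes, 3, strides=2, padding='same')(x)
    return tf.keras.Model(inputs, outputs)
```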
Undersampling
Upon analysing the dataset, we discovered that all images show the same panel from the same location and pose. Since soiling happens gradually over weeks, the dataset covers the gradual build-up of this phenomenon, which might lead to overfitting even within a single pass through the data. The dataset also has an imbalanced class distribution: nearly 20,000 clean images versus about 8,000 brown-soiled images. Hence we undersampled the dataset to as few as 200 images per class to start with.
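A minimal sketch of this undersampling step, assuming a hypothetical files_by_class mapping from class name to image paths:

```python
import random

def undersample(files_by_class, per_class=200, seed=42):
    # Cap every class at the same number of randomly chosen samples.
    rng = random.Random(seed)
    return {cls: rng.sample(paths, min(per_class, len(paths)))
            for cls, paths in files_by_class.items()}
```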
Compared to a completely untrained network, the pre-trained network already detects some useful features before any training on our data.
Hyper parameter tuning, training, and validation
Since we started with a pre-trained network, even a total of 800 images across the classes clean panel, brown soiling, white soiling, and background gave good results. The number of samples, epochs, and other hyper-parameters were tuned; too many samples and too many epochs led to overfitting. The complete training took just 10 minutes, and the resulting solar panel identification is close to what is required. The images below show the true and predicted labels on the validation dataset.
Evaluation
Even though this looks like a good prediction, the mean intersection over union (MIoU) across all classes does not rise above 14% throughout the training process. MIoU is the standard metric for evaluating semantic segmentation: for each class, it measures the overlap between the predicted and ground truth regions, and these scores are averaged. When training was continued further, the validation accuracy decreased, which means the model was overfitting (memorising the training examples). This is due to the fact that soiled pixels are very scarce compared to clean panel and background pixels.
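A small sketch of this evaluation step with tf.keras.metrics.MeanIoU, assuming four classes; the random tensors below stand in for real labels and model outputs:

```python
import tensorflow as tf

labels = tf.random.uniform((1, 128, 128), maxval=4, dtype=tf.int32)  # dummy truth
logits = tf.random.normal((1, 128, 128, 4))                          # dummy output

miou = tf.keras.metrics.MeanIoU(num_classes=4)
miou.update_state(labels, tf.argmax(logits, axis=-1))  # compare label maps
print(float(miou.result()))  # mean IoU averaged over the 4 classes
```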
Testing and inferencing
While the model may do well on the validation dataset, we want generalisation to unseen, random solar panel images. The image below shows the prediction on a random solar panel image from completely outside the dataset.
Project results
Outcome
The main aim of the project was to develop an artificially intelligent system that detects the amount of soiling deposited on solar panels in large solar farms. The trained network was tested on various videos containing dirt of various types. The network performed well at detecting the solar panels themselves in videos with varied characteristics, with an accuracy of around 90%. However, it was only able to detect small regions of soil on the panels.
Problems
- As we used a CNN, our output depended largely on the quality of the training images; the image quality was not good enough to detect mild soiling layers.
- Class imbalance was another major challenge in the segmentation task. There are many pixels for the solar panel and background classes but comparatively few for the actual brown and white soiled regions, so during training the network rarely sees soil in the images and struggles to learn it.
- In particular, with white dust the network failed to distinguish between the white junction points of the panel and the dust itself, as shown in the figure.
- Another hurdle was that our dataset contained only selected dust types, namely white and brown. There can be other causes of soiling, such as snow or bird droppings, which would also have to be considered to train the network more comprehensively.
- Our dataset contained a single solar panel with a fixed location and angle, which reduced the variation within it. Even after augmentation there is no inherent diversity, which is not good for neural network training.
Conclusion
Going from the traditional computer vision approach to the modern deep learning approach, we learned when, where, and how to use each of the methods above. In the traditional attempts, we hand-crafted conditions and algorithms for the detection of soiling, but such an approach cannot cope with the feature complexity: unlike squares and circles, soiling cannot be detected by hand-crafted rules because its shapes, patterns, and density occur randomly. Hence we moved on to the more sophisticated CNN approach, where these features can be learned. With convolutional neural networks, the better the dataset, the better the training process and the inference. We therefore used all the knowledge collected from the traditional computer vision approaches to build an automatic labelling pipeline for the soiled solar panel images, so that we could use them later for training. We used a pre-trained model because it already has good feature recognition capabilities. Through training, our model learned to recognise solar panels very efficiently and to a high level of accuracy. But accuracy is not the true metric for image segmentation tasks: even though the loss decreases at a rapid pace, the mean intersection over union, the proper metric for evaluating semantic segmentation, remains at 14 percent.
Improvement possibilities
- We are currently using only 200 images per class; the optimal number of images per class is still an open research question.
- Evaluation of the network based on F1 score could yield more insightful results about the network’s efficiency.
- The down-stack (encoder) part of the network was left frozen; training it as well could yield better results.
- Class imbalance can be addressed by using a suitable loss function that gives more weight to the minority classes during training (see the sketch after this list).
- Labelling of mild dust on solar panels could help the network recognise soiling more efficiently.
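A minimal sketch of such a class-weighted loss, assuming integer class ids 0 to 3 with the soiling classes upweighted; the class order and weight values are illustrative:

```python
import tensorflow as tf

# Hypothetical order: background, clean panel, brown soiling, white soiling.
CLASS_WEIGHTS = tf.constant([1.0, 1.0, 5.0, 5.0])

def weighted_scce(y_true, y_pred):
    # Standard sparse cross-entropy, then scale each pixel's loss by the
    # weight of its true class so rare soiling pixels count for more.
    scce = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, y_pred, from_logits=True)
    weights = tf.gather(CLASS_WEIGHTS, tf.cast(y_true, tf.int32))
    return tf.reduce_mean(scce * weights)
```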
Scope of future work
- The dataset contains static images lacking diversity; datasets with varying solar panels and environmental conditions would yield better results.
- The current architecture might be too complex for the task at hand; a simple few-layer convolutional network is a possibility.
- The developed network is to be frozen and deployed on a drone system for real-time inference; TensorRT and other optimisation tools can produce a network suitable for this.
Team members
- Amin Nouri
- Nikhil Jayanth
- Ranjit Roshan
- Pranav Shah
- Soham Bhute
- Yingen Xu
Mentor
- Jöran Rixen
References
[1] Center for Climate and Energy Solutions
[2] DLR — Soiling of solar power plants
[3] Affordable and Accessible Solar for All: Barriers, Solutions, and On-Site Adoption Potential
[4] MobileNetV2: Inverted residuals and linear bottlenecks (DOI: 10.1109/cvpr.2018.00474)
[5] U-net: Convolutional networks for biomedical image segmentation
TechLabs Aachen e.V. reserves the right not to be responsible for the topicality, correctness, completeness or quality of the information provided. All references are made to the best of the authors’ knowledge and belief. If, contrary to expectation, a violation of copyright law should occur, please contact journey.ac@techlabs.org so that the corresponding item can be removed.