Introduction

Topic

The aim of the project was to develop an application that uses a neural network to detect skin cancer, with moderate accuracy, in a picture of a skin lesion.

Description of the problem

The idea behind the application was to give people an easy way to check whether a skin lesion could be dangerous, without a time-consuming and often expensive visit to a dermatologist. Skin changes occur rather frequently. Having every single one of them checked by a medical specialist would be burdensome at the very least, and in many cases impossible. Malignant skin lesions are among the most dangerous types of cancer. If they are not noticed and treated early enough, they can lead to fatal outcomes.

While the proposed application could not detect skin cancer with clinical accuracy, it could help its users feel more secure about their skin lesions. The user would select a picture of the suspicious skin change. The application would then use a neural network, trained on clinical pictures of skin lesions, to inform the user whether the skin change is most likely harmless or whether there is a high risk of it being dangerous. As optional features, the application could also report how certain the prediction is and perform well on all skin colours.

Method

Base Idea

According to the proposed plan, the core of the application would be a neural network. In its base form, the application would only answer whether the presented picture shows a benign or a malignant skin lesion. Had time allowed, the plan also included a feature informing the user about the specific type of skin cancer.

Preparatory Work

One of the team’s first tasks was to acquire the knowledge of neural networks, and of convolutional neural networks in particular, needed to carry out the project. None of the trainees working on the project had any prior experience with neural networks or machine learning, so a large amount of time was committed to studying those subjects.

In parallel, the data for the project needed to be gathered. It was concluded that most of the available skin cancer image data comes from the SIIM-ISIC Melanoma Classification Challenge, which in turn consists of pictures provided by The International Skin Imaging Collaboration. All datasets that were looked into were subsets of the dataset provided for this challenge.

Diversity of the dataset

One of the optional features of the project was that the program would perform well on all skin colours. Currently, most of the pictures used in skin-related studies come from white-skinned patients, so algorithms designed to detect skin cancer tend to work much better on this type of skin than on others. The search for more diverse datasets was unsuccessful. All of the reviewed datasets consisted predominantly, if not exclusively, of pictures of white-skinned patients. This part of the project was therefore discontinued, as there was no data with which to train or test the algorithm in that area.

Development phase

During the development of the neural network, we encountered an issue related to the size of the data. The original dataset for the SIIM-ISIC Melanoma Classification Challenge was too large to be used efficiently. Transferring it to Google Drive for use in Google Colab was problematic, and adjusting the dataset on the trainees’ personal devices before uploading was considered impractical. Instead, a smaller dataset was chosen. For a similar reason, the use of DICOM images was discontinued. The original dataset not only contained a much larger number of files but, more importantly, its individual files were much bigger than those in the chosen dataset because of their higher image quality. DICOM images, being made in professional conditions, generally tended to be of high resolution.

First, a custom convolutional neural network architecture was tested. Next, established architectures, namely AlexNet and VGG-16, were tested. During the course of the work, it was decided to apply the transfer learning approach. For that purpose, pretrained AlexNet and VGG-16 models were chosen and tested in parallel. The last layer of each network was replaced with a layer with two outputs, as the original models were designed to classify images into 1000 categories.

To match the input expected by the neural network models, the pictures were pre-processed. They were resized and then cropped from the middle to 224x224 pixels. The vast majority of the original pictures had the skin lesion directly in their centre, so this approach was deemed better than random cropping. The pictures were also normalised according to the respective models’ requirements. One part of the dataset was divided each time, with a ratio of 7:3, into training and validation sets, while the other part was used as the test set.
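The cropping and splitting steps can be illustrated with a short sketch in plain Python. The intermediate resize to a shorter side of 256 pixels is an assumption (a common convention for these models; the report does not state the value), and the function names and the fixed random seed are hypothetical.

```python
import random


def resize_then_center_crop_box(width, height, shorter_side=256, crop=224):
    """Return the resized size and the (left, top, right, bottom) crop box."""
    # Scale so the shorter side equals `shorter_side`, keeping the aspect ratio.
    scale = shorter_side / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Take the crop from the middle, where the lesion usually sits.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)


def split_train_validation(items, train_fraction=0.7, seed=0):
    """Shuffle `items` reproducibly and split them into two lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


size, box = resize_then_center_crop_box(6000, 4000)
print(size, box)  # (384, 256) (80, 16, 304, 240)

train, val = split_train_validation([f"img_{i}.jpg" for i in range(10)])
print(len(train), len(val))  # 7 3
```

A centre crop is deterministic, so the same picture always yields the same network input, which also suits validation and testing.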

Project results

Outcome

Overall, both AlexNet and VGG-16 achieved moderately high accuracy, which is attributed to the use of pre-trained versions. VGG-16 performed better than AlexNet, reaching a training accuracy of 99% compared to AlexNet’s 92%, and a validation accuracy of 90% compared to 86%. VGG-16’s test accuracy was also marginally higher, at 89% compared to AlexNet’s 86%. This improvement is attributed to VGG-16’s architectural enhancements over AlexNet, particularly the replacement of filters with large kernels by successive filters with 3x3 kernels.

[Figure: Pretrained AlexNet]

However, employing the VGG-16 architecture does come with some notable drawbacks. Training time is considerably longer, and the resulting model is memory-intensive, weighing in at over 500 MB. This is primarily due to the architecture’s depth and the large number of nodes in the fully connected layers. Consequently, in smaller classification tasks, more compact networks like GoogLeNet or SqueezeNet are often preferred despite VGG-16’s impressive expressive capabilities.
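The reported model size can be checked with simple arithmetic on the standard VGG-16 layout. The sketch below assumes the two-output head described earlier and 32-bit floating point weights (4 bytes per parameter); the function name is hypothetical.

```python
# Standard VGG-16 layout: 3x3 convolutions as (in_channels, out_channels).
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def vgg16_parameter_counts(num_classes):
    """Return (convolutional, fully connected) parameter counts."""
    # Each 3x3 filter has 9 weights per input channel, plus one bias
    # per output channel.
    conv = sum(c_in * c_out * 9 + c_out for c_in, c_out in CONV_LAYERS)
    # After five 2x2 poolings a 224x224 input becomes 7x7 with 512 channels.
    fc_layers = [(512 * 7 * 7, 4096), (4096, 4096), (4096, num_classes)]
    fc = sum(n_in * n_out + n_out for n_in, n_out in fc_layers)
    return conv, fc


conv, fc = vgg16_parameter_counts(num_classes=2)
total = conv + fc
size_mb = total * 4 / 1e6  # assumption: 4 bytes per parameter

print(f"convolutional layers:   {conv:,}")
print(f"fully connected layers: {fc:,}")
print(f"approximate size:       {size_mb:.0f} MB")
```

With a two-class head this gives roughly 134 million parameters, about 89% of them in the fully connected layers, or around 537 MB, which is consistent with the size stated above.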

[Figure: Pretrained VGG-16]

Next, a Streamlit application was developed that uses the aforementioned models to predict whether a skin mole is potentially malignant. The sidebar provides a general overview of the classification models used. To obtain accurate predictions, uploaded images of skin moles should be clear and taken at very close range. This is necessary because of the limited training data available.

Issues faced during the project

Aside from time and communication, the biggest issue was the number of people working on the project. Time management was a major problem because the trainees are students and the project took place during the exam period. The project goals were set on the assumption that 6 people would be working on the project, but the team gradually shrank to 2–3 people during its course.

The base level of knowledge of the subject matter was also a significant issue. A large part of the time assigned to the project had to be spent researching, from the ground up, how neural networks (particularly convolutional neural networks) work and how to apply them. As the main goal of the course was to learn rather than simply to finish the project, it was advised that each team member work on their own neural network. These were to be combined into one coherent program at a later stage of the project. While this solution did delay the work a bit, it certainly contributed to a better understanding of the subject and of the skills required to apply it in practice.

As already mentioned, the sheer amount of available data was also an issue. The size of the original dataset was problematic for work in Google Colaboratory, where most of the project was done, and it would have been an issue in any other environment as well. The dataset could have been reduced by decompressing the zip file only partially and writing a script that assigns the extracted pictures to labelled folders based on the index file. This solution was deemed unnecessarily complicated, given the little time left until the project’s deadline. For that reason, a much smaller dataset was chosen instead.
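For reference, the partial extraction that was considered could look roughly like the sketch below. It was not part of the project. The column names `image_name` and `target`, the `.jpg` file naming, the folder names and the function itself are assumptions made for illustration.

```python
import csv
import zipfile
from pathlib import Path


def extract_labelled_subset(zip_path, index_csv, out_dir, limit_per_class=1000):
    """Extract up to `limit_per_class` images per label into labelled folders.

    Assumptions (hypothetical layout): the index CSV has the columns
    `image_name` and `target` (0 = benign, 1 = malignant), and the archive
    stores each image as `<image_name>.jpg`, possibly inside sub-folders.
    """
    label_names = {"0": "benign", "1": "malignant"}
    with open(index_csv, newline="") as f:
        labels = {row["image_name"]: label_names[row["target"]]
                  for row in csv.DictReader(f)}

    counts = {name: 0 for name in label_names.values()}
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            label = labels.get(Path(member).stem)
            if label is None or counts[label] >= limit_per_class:
                continue  # not an indexed image, or this class is full
            target = Path(out_dir) / label / Path(member).name
            target.parent.mkdir(parents=True, exist_ok=True)
            # Only this member is decompressed; the rest stays packed.
            target.write_bytes(archive.read(member))
            counts[label] += 1
    return counts
```

Because `zipfile` reads members individually, the full archive never has to be unpacked to disk, which is the property that would have kept the working dataset small.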

Conclusion

One way to improve the project would be to add the missing optional features. While training the neural network to work properly on all skin types might be too hard, given the current lack of the necessary data, informing the user about how certain the prediction is should be feasible. The application could also give the user more information about the type of the detected skin lesion.
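One possible way to report the certainty of a prediction is to pass the two network outputs through a softmax and show the probability of the predicted class. The sketch below is an assumption about how this could be done, not an implemented feature; the example outputs are made up, and a softmax probability is not a calibrated clinical confidence.

```python
import math


def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe_prediction(logits, class_names=("benign", "malignant")):
    """Return the most likely class and its softmax probability."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities[best]


# Hypothetical outputs of the two-output network for one picture.
label, confidence = describe_prediction([0.3, 2.1])
print(f"{label} ({confidence:.0%})")  # malignant (86%)
```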

A follow-up version could also take different risk factors into account. The DICOM files in the database contain supplementary information about the patient, e.g. their age. It could be tested whether the accuracy of the neural network differs between age groups, genders, etc., and the network could be adjusted accordingly.
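Such a comparison could start from a per-group accuracy table. The sketch below assumes that each test picture comes with a metadata dictionary containing an `age` field; the 20-year brackets, the function names and the sample records are hypothetical and are not project results.

```python
from collections import defaultdict


def accuracy_by_group(records, group_of):
    """Compute accuracy separately for each group.

    `records` is an iterable of (metadata, true_label, predicted_label);
    `group_of` maps a metadata dict to a group name.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for metadata, truth, prediction in records:
        group = group_of(metadata)
        total[group] += 1
        correct[group] += int(truth == prediction)
    return {group: correct[group] / total[group] for group in total}


def age_bracket(metadata, width=20):
    """Hypothetical grouping: 20-year age brackets such as '40-59'."""
    low = (metadata["age"] // width) * width
    return f"{low}-{low + width - 1}"


# Made-up records for illustration only; not project results.
records = [
    ({"age": 35}, "benign", "benign"),
    ({"age": 38}, "malignant", "benign"),
    ({"age": 62}, "malignant", "malignant"),
    ({"age": 70}, "benign", "benign"),
]
print(accuracy_by_group(records, age_bracket))  # {'20-39': 0.5, '60-79': 1.0}
```

Passing a different `group_of` function, for example one that reads a gender field, would give the same breakdown for other risk factors.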

Links

The Streamlit application can be found here. The source code can be found here.