Android skin cancer detection and classification based on MobileNet v2 model

Recent developments in smartphone-based skin cancer diagnosis applications offer a simple, portable way to assess melanoma risk and detect skin cancer early. Because running a machine learning algorithm for image analysis on a smartphone involves a trade-off between time complexity and error rate, most skin cancer diagnosis apps execute the image analysis on a server. In this study, we investigate the performance of skin cancer image detection and classification on Android devices using the MobileNet v2 deep learning model. We compare performance across several aspects: object detection versus classification, computer-based versus Android-based image analysis, image acquisition method, and parameter settings. Actinic Keratosis and Melanoma skin cancers are used to test the proposed method. Accuracy, sensitivity, specificity, and running time are used as the measures. Based on the experimental results, the best parameters for the MobileNet v2 model on Android, using images from the smartphone camera, produce 95% accuracy for object detection and 70% accuracy for classification. The performance of the Android app for object detection and classification was feasible for skin cancer analysis. Android-based image analysis remains within a computing time that is convenient for the user and matches the accuracy of the computer for high-quality images. These findings motivate the development of disease detection on Android using a smartphone camera, with the aim of achieving real-time detection and classification with high accuracy.


Introduction
Cancer is one of the leading causes of human death in the world [1]. One well-known type of cancer is skin cancer. There were 68,130 cases of skin cancer in America, which caused 8,700 deaths in 2010 [2]. Two examples of skin cancer types are Melanoma and Actinic Keratosis. The World Health Organization (WHO) reported 2-3 million cases of non-melanoma skin cancer and 130 thousand cases of melanoma each year [3]. To reduce the number of late diagnoses, technology-assisted cancer detection is needed.
One rapidly developing technology today is the smartphone. Advances in smartphone technology introduce new possibilities for detecting disease using the smartphone alone. The smartphone component with the most significant role in cancer detection is the camera [4]. Images captured by smartphone cameras can be processed on the smartphone to detect objects in the picture. In addition, a smartphone used as a detector offers wireless connectivity, high-resolution photography, and excellent portability. Nowadays, the detection of objects in imagery can be done using deep learning, a technique from the field of machine learning [5]. Deep learning is a novel machine learning method that is growing rapidly. Deep learning has a higher level of sensitivity than other machine learning methods [6]. It has been used to analyze biomedical data such as medical images, biological sequences, and protein structures [5]. Research on skin cancer classification has developed a deep neural network based on Google Inception v3, using a dataset of 2000 images divided into 374 melanoma, 1372 nevus, and 254 seborrheic keratosis images [7]. In 2018, Haenssle et al. [8] used the Google Inception v4 CNN architecture for the classification of Melanoma.
Skin cancer has an average size of 6 mm [9], which is exceedingly small compared to the total skin area captured by the camera. Research on skin cancer segmentation was carried out in 2017 using a Fully Convolutional Network to outline skin cancer objects [10]. One disadvantage of segmentation is that it only marks the object without recognizing it. Therefore, object detection and classification are required to determine the type of skin cancer.
Recent developments in smartphone-based skin cancer diagnosis applications offer a simple, portable way to assess melanoma risk and detect skin cancer early [11]. Several mobile apps are available on the app store, such as SkinVision [12], SpotMole [13], Deep Learning for Melanoma [14], and DermIA [15]. These apps provide a simple way of performing a Melanoma risk assessment on outer skin by taking a photo of the skin spot with a smartphone camera. The risk assessment is generated by checking the similarity between the photo taken and photos of skin cancer. However, several applications (SpotMole, Deep Learning for Melanoma, DermIA) do not seem to be backed by a comprehensive study. Therefore, the accuracy of these applications is unknown.
Moreover, these applications often fail to recognize Melanoma, which is a very high-risk skin problem. Unlike the other apps, the SkinVision app is more trusted for early detection of skin cancer because it is backed by comprehensive studies. Initially, SkinVision used a rule-based fractal algorithm [16], later followed by the analysis of pigmented and non-pigmented lesions [17]. For image analysis, SkinVision requires an internet connection to send images to the server. A client-server architecture was also used in the eSkin [18] application, so that neither the image analysis computation nor algorithm updates burden the smartphone. The main reason for running the image analysis on the server is the time complexity and error rate of performing the analysis on the smartphone. However, Melanoma is more common in rural areas where the internet connection is limited. Hence, performing image analysis on the smartphone itself is required in this situation [19].
MobileNet implements a simple architecture that uses depth-wise separable convolutions to build lightweight deep neural networks [20] for mobile vision applications. MobileNet v2 adds bottleneck layers and shortcut connections [21]. MobileNet is commonly used for object detection. Nevertheless, it is also possible to use MobileNet for classification. In this study, we investigate how MobileNet v2 performs when running on a smartphone for skin cancer detection and classification. Object detection allows images with a full skin background to be analyzed, while classification is applied to objects that have been cropped to the training size. The learning rate and epoch parameters were selected to handle the overfitting problem [22]. The code is publicly available through https://github.com/bowoadi/Melanomax/. We compare performance across several aspects: object detection versus classification, computer-based (Jupyter notebook) versus Android-based image analysis, and image acquisition method. Actinic Keratosis and Melanoma skin cancers are examined to test the performance of the proposed method. Accuracy, sensitivity, specificity, and running time are used as the measures.

[...] exposure to ultraviolet radiation. The characteristics of the skin affected by this cancer are crusty, scaly skin with a brownish color, pink, or a combination of these colors [23]. Fig. 2(a) shows Actinic Keratosis from the dataset used. Melanoma is a type of malignant cancer that affects people aged 25 to 50 years. Melanoma is known to have two causes: exposure to UV light and genetic factors [24]. Melanoma is common in remote areas. The characteristics of Melanoma skin cancer are an irregular shape, more than one color, itching, and bleeding, and it can affect any body part [19]. Fig. 2(b) shows an example of Melanoma skin cancer.
The dataset used in this study consisted of 640 images downloaded from the website https://isicarchive.com, five images downloaded from https://cancer.org, and five images from Google Image, collected without regard to age or other factors. The second step was splitting the dataset. The 640 images downloaded from https://isicarchive.com were divided into training, validation, and test data. The training data contain 200 Melanoma and 200 Actinic Keratosis images, the validation data contain 100 Melanoma and 100 Actinic Keratosis images, and the test data contain 20 Melanoma and 20 Actinic Keratosis images. The training, validation, and test sets do not share any images. For Android testing, Melanoma and Actinic Keratosis images were printed on paper at the size given in their size information. The printed images were the five images downloaded from https://cancer.org and the five from Google Image.
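The per-class split described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the file names, the fixed random seed, and the helper `split_class` are hypothetical, and only the per-class counts (200 training, 100 validation, 20 test) come from the text.

```python
import random

# Per-class split sizes taken from the text: 200 train, 100 validation, 20 test.
SPLIT_SIZES = {"train": 200, "validation": 100, "test": 20}

def split_class(files, sizes, seed=0):
    """Shuffle one class's files and cut them into disjoint, fixed-size subsets."""
    files = sorted(files)                 # deterministic starting order
    random.Random(seed).shuffle(files)    # seeded shuffle (the seed is an assumption)
    subsets, start = {}, 0
    for name, count in sizes.items():
        subsets[name] = files[start:start + count]
        start += count
    return subsets

# Hypothetical file names: 320 images per class, 640 in total.
melanoma = [f"melanoma_{i:03d}.jpg" for i in range(320)]
actinic = [f"actinic_{i:03d}.jpg" for i in range(320)]

splits = {"melanoma": split_class(melanoma, SPLIT_SIZES),
          "actinic_keratosis": split_class(actinic, SPLIT_SIZES)}

for label, subsets in splits.items():
    print(label, {name: len(items) for name, items in subsets.items()})
```

Cutting consecutive slices from one shuffled list guarantees that no image appears in more than one subset.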
The third step was manually determining the region of interest (ROI) used to train MobileNet v2. This process produced an XML file containing the coordinates of the object suspected of being cancer. The ROI was determined using the LabelImg application, which can be downloaded at https://github.com/tzutalin/labelImg. The original images with object coordinates were used for object detection training and validation. The images cropped according to the ROI coordinates were then used for classification training and validation.
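Reading the ROI coordinates and cropping can be sketched as below, assuming the annotations use the Pascal VOC XML layout that LabelImg writes by default. The sample annotation (file name, label, coordinates), the placeholder image, and the helpers `read_rois` and `crop` are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout; all values are made up.
SAMPLE_XML = """
<annotation>
  <filename>lesion_001.jpg</filename>
  <size><width>600</width><height>450</height><depth>3</depth></size>
  <object>
    <name>melanoma</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>340</xmax><ymax>310</ymax></bndbox>
  </object>
</annotation>
"""

def read_rois(xml_text):
    """Return a list of (label, (xmin, ymin, xmax, ymax)) for every annotated object."""
    root = ET.fromstring(xml_text.strip())
    rois = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        box = obj.find("bndbox")
        coords = tuple(int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        rois.append((label, coords))
    return rois

def crop(pixels, box):
    """Crop a row-major image (a list of pixel rows) to the ROI box."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in pixels[ymin:ymax]]

rois = read_rois(SAMPLE_XML)
image = [[0] * 600 for _ in range(450)]   # placeholder 600x450 single-channel image
patch = crop(image, rois[0][1])
print(rois[0][0], len(patch[0]), "x", len(patch))
```

In this sketch the uncropped image plus its box would feed the detection model, while `patch` corresponds to the cropped input for the classification model.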
The fourth step was training, in which MobileNet v2 was trained for object detection and classification. The details of both models are discussed in the next subsection. The next step was parameter optimization: selecting the learning rate, number of epochs, activation function, and batch normalization to reduce the overfitting problem. After optimizing the parameters, the resulting model was tested both on a computer, using Jupyter notebook, and in the Android app. The next step was the MobileNet v2 evaluation. Accuracy, sensitivity, specificity, and running time on the test data were measured to evaluate the object detection and classification models. The last stage was choosing the best-performing model for Android skin cancer detection and classification.
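The three evaluation measures can be computed from confusion-matrix counts as in the following sketch. It assumes Melanoma is treated as the positive class and Actinic Keratosis as the negative class; the specific counts (19, 1, 17, 3) are illustrative values chosen to match 90% accuracy, 95% sensitivity, and 85% specificity on 20 + 20 test images, and are not reported in this form in the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all images classified correctly
        "sensitivity": tp / (tp + fn),     # share of positives (assumed: Melanoma) found
        "specificity": tn / (tn + fp),     # share of negatives (assumed: Actinic Keratosis) found
    }

# Illustrative counts for 20 positive and 20 negative test images.
metrics = binary_metrics(tp=19, fn=1, tn=17, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Because the two classes are the same size, accuracy here is simply the mean of sensitivity and specificity.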

MobileNet
MobileNet is a deep learning model developed for efficiency that can be implemented on embedded or mobile devices such as smartphones without straining their resources [25].

[...] convolution. The 1x1 convolution increases the number of channels to enrich the features. Depthwise separable convolution has two types of convolution layers, depthwise convolution and pointwise convolution, which aim to reduce computing costs. The reduction is achieved by separating feature filtering, performed in the 3x3 depthwise convolution, from feature combination, performed in the pointwise convolution [20]. The 3x3 depthwise convolution separates all channels of the input, and each channel is convolved in order with its own filter in the 3x3 depthwise convolution layer. Fig. 5(a) shows an example in which the red channel of the input is convolved with the first channel of the first filter of the depthwise convolution layer. The 1x1 pointwise convolution then convolves all channels that have passed through the 3x3 depthwise convolution layer with each filter of the pointwise convolution layer, one by one and in sequence. Fig. 5(b) shows the operation of the 1x1 pointwise convolution. All layers are followed by batch normalization and an activation function. Batch normalization can reduce the dependence of gradients on parameter scales. Batch normalization normalizes the values by subtracting the mean and dividing by the standard deviation [26]. In the activation layer, ReLU is the default activation in MobileNet v2. ReLU is an activation function first introduced in 2000 by H. Sebastian Seung and colleagues. The activation function serves to activate and deactivate neurons [27]. Specifically, ReLU6 is used on every layer except the last convolution layer. The equation for the activation function ReLU6 is shown in (1).
$$f(x) = \min(\max(0, x), 6) \qquad (1)$$
where f(x) is the ReLU6 activation result and x is the input value, which is clipped to the range [0, 6].
ReLU6 has a range from 0 to 6. ReLU6 is used in the MobileNet v2 model because it is more robust than the ReLU activation function [28]. Moreover, ReLU6 has the advantage of retaining information from images in low-precision computation [21]. No activation layer is used on the last convolution layer, to avoid eliminating important features. The last feature extraction layer in MobileNet v2 is a conv2D 1x1.
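The clipping behaviour of ReLU6 described above can be sketched in a few lines. The function name `relu6` and the sample inputs are ours, chosen for illustration only.

```python
def relu6(x):
    """ReLU6: negative inputs become 0, inputs above 6 are capped at 6."""
    return min(max(0.0, x), 6.0)

# Sample inputs below, inside, and above the [0, 6] range.
samples = [-3.0, 0.0, 2.5, 6.0, 10.0]
outputs = [relu6(x) for x in samples]
print(outputs)
```

The upper cap is what distinguishes ReLU6 from plain ReLU: activations stay within a fixed, small range, which is the property the text links to low-precision computation.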
The MobileNet v2 models for detection and classification differ in the last layer. The last layer of MobileNet v2 for detection contains SSDLite. SSDLite is a modification of the regular SSD that replaces all regular convolutions with depthwise separable convolutions. The purpose of SSDLite is to make MobileNet v2 more efficient, since it reduces the number of parameters and the computational cost [21]. The last layer of MobileNet v2 for classification contains avgpool, conv2d 1x1, and softmax as the image classifier. We used RGB input images of size 300x300x3 for detection and 224x224x3 for classification.
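As a rough illustration of why replacing regular convolutions with depthwise separable ones reduces parameters and cost, the sketch below counts weights and multiply-accumulate operations for a single layer. The layer sizes (3x3 kernel, 32 input channels, 64 output channels, 112x112 feature map) are assumed for illustration and are not taken from this study; stride 1, output size equal to input size, and no bias terms are also assumed.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """(multiply-accumulates, weights) of a k x k standard convolution."""
    weights = k * k * c_in * c_out
    return weights * h * w, weights

def separable_conv_cost(k, c_in, c_out, h, w):
    """(multiply-accumulates, weights) of a depthwise k x k plus pointwise 1x1 convolution."""
    depthwise = k * k * c_in      # one k x k filter per input channel (feature filtering)
    pointwise = c_in * c_out      # 1x1 filters that combine the channels
    weights = depthwise + pointwise
    return weights * h * w, weights

# Assumed layer sizes, for illustration only.
K, C_IN, C_OUT, H, W = 3, 32, 64, 112, 112

std_macs, std_weights = standard_conv_cost(K, C_IN, C_OUT, H, W)
sep_macs, sep_weights = separable_conv_cost(K, C_IN, C_OUT, H, W)
ratio = sep_macs / std_macs

print(f"standard:  {std_weights} weights, {std_macs} MACs")
print(f"separable: {sep_weights} weights, {sep_macs} MACs")
print(f"cost ratio: {ratio:.3f}")
```

Under these assumptions the cost ratio equals 1/c_out + 1/k^2, so for 3x3 kernels the separable form needs roughly one eighth to one ninth of the computation of a standard convolution.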

The experiment results
The experiments were conducted using the TensorFlow library and MobileNet v2. There were five scenarios in this study. The first scenario aimed to find the best learning rate value using 30,000 epochs; the second scenario aimed to find the best learning rate value using 15,000 epochs. The results of these two scenarios were then compared to obtain the best learning rate value and the best number of epochs. The third scenario compared MobileNet v2 with different activation functions and batch normalization. The fourth scenario was designed to validate the MobileNet v2 model in detecting and classifying skin cancer objects in skin cancer images printed at real size. In the last scenario, MobileNet v2 was tested with static image input: images of skin cancer were loaded from the camera gallery into the app.

Device settings in testing scenario
Testing used 40 test images, 20 of Melanoma and 20 of Actinic Keratosis. Each model was tested on an Android smartphone (Samsung J530G Pro) and in the Jupyter notebook application (running on an NVIDIA GTX 1070Ti). The distance between the smartphone camera and the image to be captured was 10 cm. In scenarios 1 and 2, the Android smartphone captured live photos of images displayed on a screen or LCD monitor. These scenarios are designed to simulate analyzing skin cancer with a digital dermatoscope, where the dermoscopic image of the lesion is displayed on a screen or LCD monitor. Scenario 3 aimed to compare the performance of MobileNet v2 with different activation functions and batch normalization; the Jupyter notebook application was used for this testing. In scenario 4, the Android smartphone captured live photos of printed, actual-size skin cancer images. In scenario 5, images of skin cancer were loaded from the camera gallery into the Android app. A sketch of the parameter testing setup using the Android smartphone is shown in Fig. 6.

Scenario 1
In this experiment, the MobileNet v2 model was used to detect and classify Melanoma and Actinic Keratosis skin cancer objects using 30,000 epochs and four different learning rate values. Each experiment was run on an Android smartphone and on a computer using Jupyter Notebook. The results of this scenario are presented in Table 1 and Table 2, where J denotes experiments in Jupyter Notebook and S denotes experiments on the smartphone. Table 1 shows the results of Scenario 1 for skin cancer detection, and Table 2 shows the results of Scenario 1 for skin cancer classification. Fig. 7(a) and (b) show the display of the Jupyter Notebook and the Android app for skin cancer detection. Fig. 7(c) and (d) show the display of the Jupyter Notebook and the Android app for skin cancer classification. Based on Table 1, the best skin cancer detection performance for both Jupyter Notebook and the Android app was produced with a learning rate of 0.0001. The best accuracy was 97.5% in Jupyter Notebook and 90% in the Android app. The best skin cancer classification performance for both Jupyter Notebook and the Android app was achieved with a learning rate of 0.005 (Table 2). The average computing time was less than 2 seconds for object detection and less than 1 second for classification.

Scenario 2
In this experiment, the MobileNet v2 model was used to detect and classify Melanoma and Actinic Keratosis skin cancer objects using 15,000 epochs and four different learning rate values. Table 3 shows the results of Scenario 2 for skin cancer detection, and Table 4 shows the results of Scenario 2 for skin cancer classification. Based on Table 3, the best skin cancer detection accuracy was 100% in Jupyter Notebook and 95% in the Android app. As we mainly focus on the Android app, we take the best learning rate to be the one for the Android app, 0.0001. Table 4 shows that the best skin cancer classification accuracy was 90% using Jupyter Notebook and 70% using the Android app. In this case, we chose a learning rate of 0.0005 because of its high sensitivity. In this scenario the running time was the same as in the previous scenario: less than 2 seconds for object detection and less than 1 second for classification.

Scenario 3
Scenario 3 aimed to compare the performance of MobileNet v2 with different activation functions and batch normalization. The parameters were based on the results of the previous scenarios: a learning rate of 0.0001 with 15,000 epochs for skin cancer detection, and a learning rate of 0.0005 with 15,000 epochs for skin cancer classification. Scenario 3 used only the Jupyter notebook for testing. The results of scenario 3 can be seen in Table 5 and Table 6. The results show that ReLU6 with batch normalization outperformed the other experimental settings.

Scenario 4
Scenario 4 aimed to test the MobileNet v2 model in detecting and classifying actual-size skin cancer images. The parameters were based on the results of the previous scenarios: a learning rate of 0.0001 with 15,000 epochs for skin cancer detection, and a learning rate of 0.0005 with 15,000 epochs for skin cancer classification. Scenario 4 used a smartphone camera with a zoom feature. The images of skin cancer were printed according to the cancer size. Several zoom settings were used in this scenario: 1x, 2x, 3x, and 4x zoom. The results of scenario 4 can be seen in Table 7 and Table 8. Ten images of skin cancer printed at actual size were used to test the MobileNet v2 model. For object detection, the MobileNet v2 model could not detect all skin cancer objects correctly at 1x, 2x, and 3x zoom. Object detection at 4x zoom achieved 60% accuracy with 80% sensitivity and 40% specificity. Fig. 8 shows the Android application interface when detecting actual-size skin cancer images. For skin cancer classification, on the other hand, the MobileNet v2 model produced 50% accuracy at 1x, 2x, and 3x zoom, with 100% sensitivity but 0% specificity. MobileNet v2 achieved its highest accuracy of 60% for direct classification on printed paper at 4x zoom. Fig. 9 shows the Android application interface for the classification of skin cancer printed at actual size. The running time of detection and classification was the same as in the previous scenarios: less than 2 seconds for object detection and less than 1 second for classification.

Scenario 5
In scenario 5, the classification of skin cancer with MobileNet v2 was tested with static image input. Images of skin cancer were loaded from the camera gallery into the app. Two types of images were loaded: the 40 test images, as shown in Fig. 10(a), and 10 photographs, taken at 4x zoom, of actual-size skin cancer images printed on paper, as shown in Fig. 10(b). The skin cancer classification model was chosen for this scenario because its accuracy and sensitivity were higher than those of object detection. The best setting, a learning rate of 0.0005 with 15,000 epochs, was used; it had produced 60% accuracy with 100% sensitivity.

International Journal of Advances in Intelligent Informatics ISSN 2442-6571 Vol. 6, No. 2, July 2020, pp. 135-148

Based on Table 9, the accuracy produced by the Jupyter notebook is the same as the accuracy produced by the Android app for the 40 test images. The computation time for image analysis in this scenario was higher than in the previous scenarios. The previous scenarios used a live camera to detect and classify skin cancer, whereas in this scenario images are loaded into the system in the same way as in the Jupyter notebook system. On the Samsung J530G Pro, analyzing the skin cancer took 20 seconds on average, whereas on the Samsung Note 9 it took only 2 seconds on average.

Discussion
Using more epochs generally results in higher accuracy, but in scenarios 1 and 2, 30,000 epochs produced lower accuracy than 15,000 epochs for detection. The loss graph from training for 30,000 epochs with a learning rate of 0.0001 can be seen in Fig. 11(a), and the loss graph from training for 15,000 epochs with a learning rate of 0.0001 for detection in Fig. 11(b). Based on Fig. 11(a), overfitting occurred beyond 15,000 epochs, and the loss did not fall below 2%. Fig. 11(b) shows a stable decrease in loss to under 4% and does not indicate an overfitting problem. The accuracy of testing with Jupyter Notebook exceeded that of the Android app because several factors come into play when a smartphone camera is used. These factors include the camera's ability to capture images, the amount of light, and the noise level [29]. Nevertheless, the best accuracy in scenarios 1 and 2 using the Android app was 95% for object detection and 70% for classification. The results show the feasibility of supporting analysis from the monitor of a digital dermatoscope.
In scenario 3, the combination of batch normalization and ReLU6 achieved the highest accuracy in the experiment. Batch normalization increases the stability of the method by normalizing the layer inputs [26]. Moreover, ReLU6 requires less computation than other activation functions, which makes training and convergence faster [30]. In scenario 4, the Android smartphone could not recognize skin cancer objects at 1x, 2x, and 3x zoom in object detection. Skin cancer went undetected because the objects were too small for target detection. Moreover, the limited capability of the camera on Android devices results in insufficient light and added noise in the images, which affected the skin cancer detection process. The Android smartphone could detect skin cancer at 4x zoom, producing 60% accuracy, 80% sensitivity, and 40% specificity for object detection and 60% accuracy, 100% sensitivity, and 20% specificity for classification.
In scenario 5, the accuracy produced by the Jupyter notebook (the computer) is equal to that of Android-based image analysis. The performance on the 40 test images was 90% accuracy, 95% sensitivity, and 85% specificity. The MobileNet v2 models consistently produced high sensitivity, which means they classify Melanoma better than Actinic Keratosis. To find out how the model discriminates between the two classes, we visualized the convolution results of multiple layers of the model using the Matplotlib library, as illustrated in Fig. 12. In Fig. 12, we display only a few image features from particular layers to reveal how the feature extraction proceeds. First, after passing the convolution layer, the convolved input images are processed in the first bottleneck layer. The results of the first bottleneck layer show that the features of the Melanoma image are more visible than those of Actinic Keratosis. The unclear convolution images for Actinic Keratosis are due to its color characteristics. The Melanoma image contrasts strongly with the skin color, while the Actinic Keratosis image is predominantly red, with a less clear boundary between background and cancer. In the middle bottleneck layer, the Melanoma features were also more visible than the Actinic Keratosis features. However, the last bottleneck layer produces significant features that contrast the Melanoma and Actinic Keratosis images. These features can be relied on to improve classification and object detection performance. The conv2D 1x1 layer is the last layer before SSDLite for object detection or Avgpool-Conv2d-Softmax for classification. This layer yields 1,001 features, and this number can grow if the input image is not resized to 224x224 (classification). This process shows that the bottleneck layers produce a small number of features for contrasting the images of the two classes. Moreover, a small number of features is essential for low computing cost, so that the model can run on smartphone devices.

Moderate accuracy was obtained when using photos taken with an Android camera of printed images, as well as of monitor screens. Most problems occur with Actinic Keratosis images. As an illustration, in Fig. 13, image (a) is an Actinic Keratosis image from the validation images, image (b) was obtained from an Android camera on the first shot, and image (c) was obtained from an Android camera on the second shot. Images (a) and (b) were classified as Actinic Keratosis, while image (c) was classified as Melanoma.

Fig. 13. (b) Image from Android camera, shot 1, on printed paper; (c) image from Android camera, shot 2, on printed paper.

Conclusion
This study was conducted to determine the performance of skin cancer image detection and classification on Android devices using the MobileNet v2 deep learning model. Based on the experimental results, the best parameters were a learning rate of 0.0001 with 15,000 epochs for object detection and a learning rate of 0.0005 with 15,000 epochs for classification. The bottleneck layers improved the feature extraction that feeds object detection with SSDLite and classification with softmax. The 95% accuracy for object detection and 70% for classification with the smartphone camera as input show the feasibility of supporting analysis from the monitor of a digital dermatoscope. Equally high accuracy for computer-based and Android-based image analysis was obtained when a high-quality skin cancer image was loaded into the system, with 90% accuracy, 95% sensitivity, and 85% specificity. Android-based image analysis is still within a computing time that is convenient for the user, with less than 2 seconds for live camera mode and 20 seconds for loaded image mode, and matches the accuracy of the computer for images of the same quality. Testing detection and classification with a smartphone on images printed at the actual size of skin cancer gave the best accuracy at 4x zoom. However, image acquisition requires improvement in input normalization and in obtaining high-quality images with good visibility. Several further developments could enhance the results. In the classification model, a dropout layer could be added to reduce overfitting [31]. Adam optimization [32] could also be used to make the learning rate more adaptive, so that the model can identify more high-level features. Integrating skin lesion segmentation as an image preprocessing step could also improve the accuracy of skin cancer classification.
Moreover, standardizing image acquisition could help capture high-quality images with excellent visibility. A compact microscope for smartphones could serve as a dermatoscope by connecting it to the Android camera.