Automatic plant recognition using convolutional neural network on Malaysian medicinal herbs: the value of data augmentation

Introduction
Plants are among the most important elements on earth, serving humans, animals, and the environment. For example, plants provide a home for animals, a source of food for humans and animals, and a supply of oxygen for nature. There are many different plant species in the world, and in general, plant diagnosis is accomplished by skilled visual inspection and biological examination [1]. Herbs, particularly medicinal plants, have been used as traditional treatments by indigenous peoples since ancient times. Herbs are often recognized by practitioners based on years of intimate sensory or olfactory experience. In Malaysia, there are many species of local herbs, and the most well-known are Pegaga, Selom, Cekur Manis, Selasih, Kesum, Kaduk, Salam, Bebuas, Ulam Raja and Beluntas. Nowadays, various automated technologies have been developed to recognize thousands of different plant species. The most prevalent attribute used to construct such automated plant recognition systems is leaf shape. Aside from shape, the leaf can reveal other details such as textures, veins, and colors [2]-[4]. Recent developments in analytical technologies have made botanical identification based on scientific data more feasible, and many individuals benefit from this, especially those who are new to herbal recognition [5].
Object detection has been an important topic in computer vision in recent years. The basic aim of object detection is to properly identify the region of interest in the picture and establish the category of each object [6] [7]. Advances in artificial intelligence have made it possible to diagnose plant leaves from raw pictures automatically. Deep learning is a learning approach based on neural networks. One of its benefits is that it can automatically extract features from photos [8] [9]. Deep learning enables computational models made up of numerous processing layers to learn data representations with varying degrees of abstraction. These technologies have significantly advanced the state of the art in voice recognition, visual object identification, object detection, and a variety of other fields such as drug development and genomics [10]. Deep learning is gradually becoming one of the most significant technologies for image identification, and it is now being used to classify and recognize plants [11] [12]. The convolutional neural network (CNN) is now one of the most prominent architectures in machine learning and deep learning [13] [14]. CNN has recently emerged as the most popular model among plant recognition researchers [15] [16].
In this paper, the CNN model that has been created is tested with two datasets: the Malaysian medicinal herbs real data, and the Malaysian medicinal herbs data that has been augmented. The accuracy of the proposed model on both datasets is compared and discussed. The rest of the paper is organized as follows. In section 2, we describe the literature review of the overall research. In section 3, we explain the materials and methods that have been used, including the datasets and the deep learning model (CNN). Section 4 presents the experimental analysis and goes through the overall results of the CNN model. Section 5 concludes the research.

Method
This section provides a comprehensive description of the research methodology. The discussion begins with details of the proposed convolutional neural network model, followed by data pre-processing. The augmentation process is explained in detail in the data pre-processing section.

Proposed Convolutional Neural Network Model
In this research, a CNN model was developed to recognize Malaysian medicinal herbs. Fig. 1 shows the structure of the CNN model, and Table 1 displays the layers and the detailed information for each layer. The first layer in this model is a convolutional layer that computes 16 feature maps using 3x3 kernels. The second layer is a max-pooling layer with a 2x2 filter. Next, another convolutional layer computes 32 feature maps using 3x3 kernels, followed by another max-pooling layer with a 2x2 filter. A third convolutional layer computes 64 feature maps using 3x3 kernels, again followed by max-pooling with a 2x2 filter. A fourth convolutional layer computes 128 feature maps using 3x3 kernels, followed by a final max-pooling layer with a 2x2 filter. Note that batch normalization was applied after each max-pooling layer, and the rectified linear unit (ReLU) served as the activation function. After these eight layers comes a fully connected layer, to which 50 percent dropout was applied. The fully connected layer combines the features extracted from the whole image. In the last phase, a softmax layer creates a vector with 10 entries from the preceding layer's result vector. These ten entries correspond to the 10 types of medicinal herbs.
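As a rough check on the layer stack described above, the following sketch traces how the spatial size of a feature map changes through the four convolution and max-pooling pairs. It assumes a 224 x 224 input with three color channels (the channel count is an assumption), stride-1 convolutions, and non-overlapping 2x2 pooling. The padding mode is not stated in the text, so both common choices are exposed as a parameter. The function names are hypothetical, and this is not the authors' implementation.

```python
def conv_output(size, kernel=3, padding="valid"):
    """Spatial size after a stride-1 convolution (assumed stride)."""
    if padding == "same":
        return size               # zero padding keeps the size unchanged
    return size - kernel + 1      # "valid": no padding


def pool_output(size, window=2):
    """Spatial size after non-overlapping max-pooling."""
    return size // window


def trace_shapes(input_size=224, channels=3,
                 filters=(16, 32, 64, 128), padding="valid"):
    """Return (layer, height, width, depth) for the eight conv/pool layers."""
    shapes = []
    size, depth = input_size, channels
    for n_filters in filters:
        size = conv_output(size, 3, padding)
        depth = n_filters
        shapes.append(("conv3x3", size, size, depth))
        size = pool_output(size)
        shapes.append(("maxpool2x2", size, size, depth))
    return shapes


def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per filter for a single convolutional layer."""
    return (kernel * kernel * in_channels + 1) * out_channels


if __name__ == "__main__":
    for layer in trace_shapes():
        print(layer)
```

Under these assumptions, the final pooled feature map is 12 x 12 x 128 with unpadded convolutions and 14 x 14 x 128 with "same" padding; this is what the fully connected layer would receive after flattening.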

Data Pre-processing
Medicinal herb images were collected from Taman Herba Bertam, located at Jalan Tun Hamdan Sheikh Tahir, Taman Bertam Indah, 13100 Kepala Batas, Pulau Pinang, Malaysia. Taman Herba Bertam is one of Malaysia's most popular herb gardens. It contains a diverse range of plant species, from shrubs to huge trees. In this research, 10 species of local herbs were chosen: pegaga, selom, cekur manis, selasih, kesum, kaduk, salam, bebuas, ulam raja and beluntas. A total of 3000 images were captured, with 300 images for each herb. The images were captured using a mobile phone camera, and all of the herbs were photographed from different angles to increase image variety. To guarantee that all of the photographs have a standard format, all of the images were acquired using standard settings: 10 megapixels and a dimension of 224 pixels x 224 pixels. Fig. 2 shows samples of the 10 local herb species chosen in this research. There are two datasets in this study. Dataset 1 consists of 3000 images of real data from the 10 local herbs, with 300 images for each herb. Dataset 2 consists of a total of 6000 images after combining the real data with augmented data, with 600 images for each herb. The augmentation techniques used were rotation by 90°, 180°, and 270°, vertical flip, and horizontal flip. The specifics of the augmentation techniques are shown in Table 2. In Dataset 1, 2100 images were used for training, 600 images for validation, and 300 images for testing. In Dataset 2, 4200 images were used for training, 1200 images for validation, and 600 images for testing.
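The five augmentation operations named above can be sketched on a small image stored as a list of pixel rows. This is a minimal illustration only: the function names are hypothetical, rotations are taken to be clockwise (the direction is an assumption), and "horizontal flip" is taken to mean mirroring left to right. How the operations were distributed across the 3000 original images is not specified in the text, so the sketch only shows the operations themselves.

```python
def rotate90(img):
    """Rotate 90 degrees clockwise (direction is an assumption)."""
    return [list(row) for row in zip(*img[::-1])]


def rotate180(img):
    """Rotate 180 degrees."""
    return [row[::-1] for row in img[::-1]]


def rotate270(img):
    """Rotate 270 degrees clockwise (90 degrees counterclockwise)."""
    return [list(row) for row in zip(*img)][::-1]


def flip_horizontal(img):
    """Mirror left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror top to bottom."""
    return [list(row) for row in img[::-1]]


if __name__ == "__main__":
    sample = [[1, 2],
              [3, 4]]
    print(rotate90(sample))         # [[3, 1], [4, 2]]
    print(rotate180(sample))        # [[4, 3], [2, 1]]
    print(rotate270(sample))        # [[2, 4], [1, 3]]
    print(flip_horizontal(sample))  # [[2, 1], [4, 3]]
    print(flip_vertical(sample))    # [[3, 4], [1, 2]]
```

Because the images in this study are square (224 x 224), every one of these operations preserves the image dimensions, so augmented images can be fed to the same network input without resizing.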

Artificial Intelligence in Plant Recognition
Artificial intelligence (AI) is a branch of research that aims to provide machines with the ability to perform activities such as logic, reasoning, planning, learning, and perception [17]. Nowadays, artificial intelligence is widely applied in plant identification and recognition. Thanks to AI advancements, botanists can now quickly recognize numerous plant species. Machine vision technologies based on image processing have been applied effectively in domains such as fast and precise plant recognition and disease monitoring.

Convolutional Neural Network Algorithm
In the 1990s, LeCun et al. [18] successfully applied a gradient-based learning technique with CNNs to the handwritten digit classification problem. Following that, researchers enhanced CNNs even further and reported cutting-edge results in a variety of recognition tasks. CNNs offer various benefits, including being more similar to the human visual processing system, having a structure that is well suited to processing 2D and 3D images, and being good at learning and extracting abstractions of 2D features [19]. According to Alom et al. [19], convolution, max-pooling, and classification are the three types of layers that make up the CNN architecture. Convolutional layers and max-pooling layers are the two types of layers in the network's bottom and middle levels. Fig. 3 shows the architecture of CNN.
Fig. 3. The architecture of CNN [18]
A convolutional neural network is made up of three layers: input, middle, and output [20]. The input layer receives features as input; in other words, photos are supplied as input through this layer. The number of nodes in the middle layer is determined by the application. The output layer generates a result.
Convolutional layer: together with the kernel matrix, this layer performs a convolution operation on the pixel values of its input. Each output value is obtained by sliding the kernel matrix over the pixel matrix. Max-pooling layer: this layer reduces the size of the feature map that is generated as an output, which helps to avoid overfitting. ReLU activation function: the rectified linear unit simply replaces all negative values in the output matrix with zero and keeps all positive values. Fully connected (FC) layer: in this layer, every node from the previous hidden layer is connected to every node in the next hidden layer. The FC layer may include the required number of nodes, and it is also known as a dense layer. Through these connections between the neurons in each layer, all previous and subsequent layers are linked.
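The three operations described above (sliding a kernel over the pixel matrix, zeroing negative values, and shrinking the feature map by taking maxima) can be written out directly. The sketch below is a minimal single-channel illustration with hypothetical function names; it assumes no padding and a stride of 1 for the convolution, and, as in most deep learning libraries, it does not flip the kernel.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; no padding, stride 1 (assumed)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Replace negative values with zero and keep positive values."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, window=2):
    """Keep the maximum of each non-overlapping window x window block."""
    out_h = len(fmap) // window
    out_w = len(fmap[0]) // window
    return [
        [
            max(fmap[i * window + a][j * window + b]
                for a in range(window) for b in range(window))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    image = [[1, 2, 3, 0],
             [0, 1, 2, 3],
             [3, 0, 1, 2],
             [2, 3, 0, 1]]
    # A simple vertical-edge kernel, chosen only for illustration.
    kernel = [[1, 0, -1],
              [1, 0, -1],
              [1, 0, -1]]
    features = conv2d(image, kernel)   # [[-2, -2], [2, -2]]
    activated = relu(features)         # [[0, 0], [2, 0]]
    pooled = max_pool(activated)       # [[2]]
    print(features, activated, pooled)
```

In this toy example a 4 x 4 input becomes a 2 x 2 feature map after the 3x3 convolution and a single value after 2x2 max-pooling, which shows how each convolution and pooling pair reduces the spatial size.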

Herbs Dataset and Data Augmentation
Nowadays, there are more than 56 plant datasets available online, including the PlantVillage dataset, the Pl@ntNet dataset, the Plant_leaves dataset, the LifeCLEF dataset, and many more. Together, these datasets contain millions of images. The PlantVillage database (www.plantvillage.org) contains 3852 colour leaf photos for testing machine learning algorithms for plant disease identification [21]. The LifeCLEF 2015 dataset, which comprises 113204 pictures of various plant parts (e.g. flowers, fruits, leaves, and stems) from 1,000 distinct tree, herb, and fern species, was used to assess the performance of deep neural networks [22].
Overfitting is a severe problem in deep learning, resulting in high performance during training but poor performance during testing. Data augmentation is an effective way to mitigate it, and existing work has established the efficacy of data augmentation in various ways. Random image cropping and patching (RICAP) [23] is a data augmentation technique that randomly crops four photos and patches them together to create a new image, and also mixes the class labels of the four images to make use of soft labels. Reference [24] segments vibration data into samples before recombining them.
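To make the crop-and-patch idea concrete, the sketch below illustrates a RICAP-style operation on four equally sized single-channel images stored as lists of rows. It is an illustration under assumptions rather than the reference implementation of [23]: the boundary position is drawn from a Beta distribution, the default value of the `beta` parameter is a placeholder, and the function name and arguments are hypothetical. The soft label gives each source class a weight equal to the fraction of the output area its crop occupies.

```python
import random


def ricap(images, labels, num_classes, beta=0.3, rng=random):
    """Patch random crops of four images into one image with a soft label.

    images: four H x W images (lists of rows); labels: four class indices.
    """
    height, width = len(images[0]), len(images[0][0])
    # Boundary position splitting the output into four rectangles.
    w = int(round(width * rng.betavariate(beta, beta)))
    h = int(round(height * rng.betavariate(beta, beta)))
    widths = [w, width - w, w, width - w]
    heights = [h, h, height - h, height - h]

    crops = []
    for img, cw, ch in zip(images, widths, heights):
        x = rng.randint(0, width - cw)    # random crop origin
        y = rng.randint(0, height - ch)
        crops.append([row[x:x + cw] for row in img[y:y + ch]])

    # Top-left + top-right, then bottom-left + bottom-right.
    top = [a + b for a, b in zip(crops[0], crops[1])]
    bottom = [a + b for a, b in zip(crops[2], crops[3])]
    patched = top + bottom

    # Mix labels in proportion to the area of each crop.
    soft_label = [0.0] * num_classes
    for label, cw, ch in zip(labels, widths, heights):
        soft_label[label] += cw * ch / (width * height)
    return patched, soft_label


if __name__ == "__main__":
    # Four constant 4 x 4 "images", one per class, purely for illustration.
    imgs = [[[k] * 4 for _ in range(4)] for k in range(4)]
    new_image, new_label = ricap(imgs, [0, 1, 2, 3], num_classes=4,
                                 rng=random.Random(0))
    for row in new_image:
        print(row)
    print(new_label)
```

The output image always has the same size as the inputs, and the soft label entries sum to one, so the result can be used in place of an ordinary training example with a cross-entropy loss that accepts soft targets.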

Related Work
As previously indicated, several studies have used deep learning techniques to investigate plant identification. Deep learning is one of the most extensively used approaches for recognizing plants, since it has outperformed other algorithms in terms of detection [25]. A dataset of Vietnamese plant images was gathered from an online encyclopedia of Vietnamese species and evaluated using a deep learning technique; the results demonstrate that the system not only works well but is also compact in its application procedure [26]. Thai medicinal plants have been valued since the Sukhothai era and are known as "Thai traditional home medicine." Researchers have used deep learning (CNN) to identify Thai herbs and their therapeutic characteristics, and have attained an accuracy of 80 percent [27].
Given the complex backgrounds and diverse herb patterns of plant species in China, experimental results show that deep learning models significantly improve identification accuracy [28]. In [29], the proposed model achieves 91.78 percent accuracy on the test set, suggesting that deep learning is a promising method for large-scale plant classification in the natural world. Deep learning models were employed to categorize photos of plant species in the PlantCLEF 2015 dataset, and the results demonstrate that the accuracy is improving [30]. Hu et al. [31] conducted a study on ripe tomato recognition that combined deep learning with edge contour detection, and this technique yielded several improvements. In particular, when evaluating candidate ripe tomato regions, deep learning takes less time and extracts more features than standard approaches. Plant leaf disease identification was also explored, showing that by adding a CNN to a support vector machine (SVM) classifier, an average classification accuracy of 96.63 percent can be achieved for the classification of leaf diseases [32].
Fuentes et al. [33] conducted a study to see whether a deep-learning-based technique could be used to identify diseases and pests in tomato plants using pictures taken in place by camera devices of varying resolutions. The study considered three types of detectors: Faster Region-based Convolutional Neural Network (Faster R-CNN), Region-based Fully Convolutional Network (R-FCN), and Single Shot Multibox Detector (SSD). Zhu et al. [34] conducted a study on plant identification using very deep convolutional neural networks on multi-organ datasets of 135 photos from eight species, encompassing leaves, flowers, branches, stems, and fruits. The findings show that the approach performs very well, with a 100% accuracy rate. Jeon et al. [35] proposed a new method to classify leaves using a CNN model. Two models were created by adjusting the network depth using GoogleNet, and the performance of each model was evaluated according to the discoloration of, or damage to, leaves, with a recognition rate of greater than 94 percent.

Results and Discussion
Table 3 and Table 4 show the details of loss, accuracy, val_loss and val_accuracy for both the herbs real data and the herbs augmented data. Fig. 4 and Fig. 5 show the overall plots of accuracy and loss against epochs for herbs real data and herbs augmented data. Here, loss and accuracy denote the training loss and training accuracy, while val_loss and val_accuracy denote the validation loss and validation accuracy. The maximum number of epochs used in the research is 50. From Table 2, the herbs real data, which consists of 3000 images, used 2100 images for training and 600 images for validation. Herbs real data achieved a highest training accuracy of 71% and a highest validation accuracy of 70%. The herbs augmented data, which consists of 6000 images, used 4200 images for training and 1200 for validation. Herbs augmented data achieved a highest training accuracy of 84% and a validation accuracy of 84%. Overall, the results show that the herbs augmented data achieved higher accuracy than the herbs real data.
The confusion matrix for herbs real data is shown in Fig. 6, and the confusion matrix for herbs augmented data is shown in Fig. 7. The number labels in the confusion matrix represent the types of herbs: Bebuas: 0, Beluntas: 1, Cekur Manis: 2, Kaduk: 3, Kesum: 4, Pegaga: 5, Salam: 6, Selasih: 7, Selom: 8, and Ulam Raja: 9. The confusion matrix for herbs real data covers 300 images, the 10% of the 3000 images held out for testing, and the confusion matrix for herbs augmented data covers 600 images, the 10% of the 6000 images held out for testing. Table 5 shows the precision, recall and f1-score for herbs real data, and Table 6 shows the precision, recall and f1-score for herbs augmented data. On its 300 test images, herbs real data achieved 79% average precision, 75% average recall and 75% average accuracy. On its 600 test images, herbs augmented data achieved 88% average precision, 88% average recall and 88% average accuracy. The overall average f1-score is 74% for herbs real data and 87% for herbs augmented data. In Table 5, Cekur Manis and Pegaga achieved f1-scores of 45% and 49%. These two results pull down the average accuracy and f1-score for herbs real data. In contrast, Table 6 shows that Cekur Manis and Pegaga earned f1-scores of 83% and 69%, respectively, which are higher than their f1-scores on herbs real data. For the real data in Table 5, Cekur Manis and Pegaga did not have enough data for training the model, resulting in poor accuracy and f1-score.
Meanwhile, in Table 6, the augmentation method resulted in greater accuracy and f1-score due to the larger amount of data and a better training process for the model. From these results we can see that herbs augmented data achieved better performance than herbs real data in terms of precision, recall, accuracy and f1-score. With the larger quantity of data in the herbs augmented data, the model was trained more effectively and produced better test results.
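For readers who want to reproduce per-class precision, recall and f1-score from a confusion matrix such as those discussed above, the following sketch shows the standard calculation. It assumes that rows are true classes and columns are predicted classes, and that the reported averages are unweighted (macro) averages; both are assumptions, since the text does not state them. The function names are hypothetical and the small matrix in the usage example is made up for illustration, not taken from the paper's figures.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) per class.

    cm[i][j] = number of images of true class i predicted as class j
    (this orientation is an assumption).
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, but wrong
        fn = sum(cm[k]) - tp                        # true k, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        results.append((precision, recall, f1))
    return results


def accuracy(cm):
    """Fraction of all test images on the diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def macro_average(results):
    """Unweighted mean of precision, recall and f1 over classes."""
    n = len(results)
    return tuple(sum(r[i] for r in results) / n for i in range(3))


if __name__ == "__main__":
    toy_cm = [[8, 2],
              [1, 9]]   # illustrative two-class matrix, not real data
    print(per_class_metrics(toy_cm))
    print(accuracy(toy_cm))   # 0.85
    print(macro_average(per_class_metrics(toy_cm)))
```

A class with a low f1-score, such as those noted for Cekur Manis and Pegaga on the real data, lowers the macro average directly because every class contributes equally regardless of how many test images it has.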

Conclusion
In this research, a convolutional neural network model was used to examine the accuracy obtained before and after data augmentation on Malaysian medicinal plant datasets. The first dataset, herbs real data, consists of 3000 images, and the second dataset, herbs augmented data, consists of 6000 images. Both datasets were tested using the CNN model: herbs real data achieved an average accuracy of 75%, while herbs augmented data achieved an average accuracy of 88%. These findings indicate that CNN, one of the most well-known deep learning architectures, is capable of high image recognition accuracy. Using augmented data in research is also important, since it can improve accuracy.