Fish species recognition using transfer learning techniques

Article history: Received October 10, 2020; Revised December 5, 2020; Accepted January 6, 2021; Available online July 31, 2021
Marine species recognition is the process of identifying various species, which helps in population estimation and in identifying endangered types so that remedial action can be taken. The superior performance of deep learning for classification stems from its ability to estimate millions of parameters, which must be learned from large annotated datasets. However, many fish species are becoming extinct, which may reduce the number of available samples. The unavailability of a large dataset is a significant hurdle for applying a deep neural network, and it can be overcome using transfer learning. We propose a transfer learning technique that takes underwater fish images as input and detects the fish species using a pre-trained Google Inception-v3 model. We evaluated the proposed method on the Fish4Knowledge (F4K) dataset and obtained an accuracy of 95.37%. The research would help marine biologists identify fish existence and quantity, understand the underwater environment, encourage its preservation, and study the behavior and interactions of marine animals.


Introduction
Underwater object recognition is one of the most active research fields. Fish population estimation and classification of fish species are essential tasks for ocean observation and for the assessment of fish stocks, ecosystems, abundance, and diversity in the sea [1]. These tasks benefit scientific and commercial applications like fish farming [2] and understanding food availability and predator-prey relationships [3]. Fish classification is challenging because underwater videos are of low quality and are recorded in unconstrained environments, with luminosity and illumination changes, fish movements, non-lateral fish views, partially visible fish, and the presence of sediments and organic debris [4], [5]. The appearance of fish at different scales and orientations, their curved body shapes, and the strong visual similarity between species in shape, size, and coloring further complicate the classification process. Recognition of species from underwater video for assessing fish abundance has received limited attention from the research community.

International Journal of Advances in Intelligent Informatics ISSN 2442-6571 Vol. 7, No. 2, July 2021, pp. 188-197
patterns. An efficient method for fish detection, counting, and species classification has been developed using blob counting and shape analysis [1]. An automatic classification system that uses a neural network to extract the geometry, morphology, and texture of images was devised in [12] to classify different species like fish, butterflies, and plants. A sparse representation-based classification (SRC) was proposed by Shiau et al. [13] for fish recognition and verification using a maximum probability of partial ranking method. A robust computational method based on an image-set matching approach was proposed using Graph-Embedding Discriminant Analysis [14].
A hierarchical classification approach was proposed in [15] that uses the Balance Guaranteed Optimized Tree (BGOT) algorithm for fish classification. Rich feature descriptors and separate features from different fish parts were extracted to improve the discriminative power. A new method of image set classification was presented by Shafait et al. [16]. An automatic fish recognition and classification system was proposed by Rodrigues et al. [17] using SIFT and PCA for parameterizing the features. Two immunological algorithms, namely Artificial Immune Network and Adaptive Radius Immune Algorithm, used these features to cluster fish types. An underwater fish recognition framework was proposed using an unsupervised feature learning technique and an error-resilient classifier [18]. A deep architecture was proposed by Qin et al. [19] for fish classification. A special cross-layer pooling approach that combines the features from different layers of a deep CNN model was proposed by Shoaib et al. [20]. Sparse representation-based classification combined with maximum probability was proposed in [21] based on the visual features of Eigenfaces and Fisherfaces. Alsmadi et al. [22] identified the pattern of fishes using a multi-layer feed-forward neural network model with a backpropagation classifier based on color signatures. Vilon et al. [5] proposed two supervised methods for fish classification. In the first method, HOG features were extracted and an SVM was used for classification; in the second method, the GoogLeNet architecture was used. In the 2015 SeaCLEF contest [15], a task on fish classification was conducted, and the best results were achieved through deep learning-based techniques. Siddiqui et al. [20] used a cross-layer pooling algorithm with pre-trained convolutional neural networks. Salman et al. [2] demonstrated the use of CNNs for fish species identification.
Fish classification systems based on image processing techniques have a low accuracy rate since they do not handle environmental changes, differing characteristics, and feature variability. Methods that use machine learning techniques have to undergo segmentation and feature engineering. Due to the unconstrained environment, segmentation may degrade the system performance [23]. Deep learning methods are therefore considered a good choice among these techniques, since they learn automatically from the input images and the system's accuracy depends on the learned parameters. Deep learning methods learn higher-level features automatically by combining several low-level features. These high-level representations carry more information and help to achieve better performance in classifying the fish types. The automatic feature learning capability removes the dependency on human-crafted features but requires many training samples.
To overcome the problem of data insufficiency, several works have applied transfer learning techniques [24], [25]. The transfer learning technique is based on the hypothesis that the training images are independent and identically distributed (i.i.d.) with the target images. We propose a transfer learning technique for fish classification using a pre-trained Google Inception v3 model trained on general images. This model dramatically reduces the need for an extensive training dataset and helps reduce overfitting on a small dataset. In this paper, we propose three variants of the transfer learning technique for fish classification and analyze their performance. To study the performance of the proposed system, we used images obtained from underwater videos captured in an unconstrained environment.

Method
We propose a transfer learning technique for classifying fish species using images obtained from underwater videos. A pre-trained Google Inception v3 [4] model supplies the weights, which are then used to train on the input images and classify the test images. We propose three methods. In the first, transfer learning serves as a feature extractor: the representation vectors of the pre-trained model are used, and the softmax classifier is modified to match the target image classes. In the second, fine-tuning, the first two layers are kept frozen and the other layers are trained with the target images. In the third, the weights obtained using the second method are used to train an SVM classifier that classifies the target images. The performance of all three methods was analyzed.
Transfer learning is an active research area in which the knowledge gained in one or more tasks is applied to different tasks [26]. This can be achieved using a pre-trained model, that is, a model previously trained on a large dataset. The pre-trained model can be customized for a given task in two ways, known as feature extraction and fine-tuning. In feature extraction, the trained model is used directly: only the output layer is modified, and the remaining layers are used to extract features. In fine-tuning, the pre-trained model is partially retrained by freezing and unfreezing layers as required by the application area.
Several pre-trained models, such as Google Inception, AlexNet, and VGGNet, can be used for image classification based on transfer learning. The Google Inception model contains fully connected layers with global average pooling, which averages the channel values across 2D feature maps. As a result, it achieves higher accuracy on the ImageNet dataset with fewer parameters than AlexNet and runs much faster than VGG. We therefore propose a transfer learning-based fish classification system using the Google Inception model.
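To make the global average pooling step concrete, the following minimal sketch averages each channel over its 2D grid. The function name and the toy values are illustrative assumptions, not taken from the authors' implementation; real feature maps would be far larger.

```python
def global_average_pool(feature_maps):
    """Average each channel over its 2D spatial grid.

    feature_maps: list of channels, each a list of rows (H x W) of floats.
    Returns one float per channel.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two toy 2x2 channels (illustrative values, not real activations).
toy_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled_vector = global_average_pool(toy_maps)
```

Each channel collapses to a single number, so the pooled vector has one entry per channel regardless of the spatial size. This is why the pooling step itself adds no trainable parameters.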

Google Inception V3 as Feature Extractor
The Google Inception v3 model [27] contains 42 layers with inception modules, fully connected layers, and batch normalization layers. Each inception layer consists of convolution layers with different filters followed by the average pooling or max-pooling layers. The convolutional layers consist of learnable filters that learn a hierarchical representation of the input in which the initial layers detect the basic patterns like edges and gradients, while top layers learn patterns specific to the input images. The knowledge gained by the lower layers from the source domain may serve as valuable knowledge for the new domain.
In this method, the Google Inception v3 model acts as a feature extractor. The representation vectors of the pre-trained model were obtained, and the softmax layer of v3 was replaced with a 23-node layer to classify the 23 different species of the target images. The new softmax layer was trained on the target images using the representation vectors obtained from the pre-trained model, and a model was built. The model is used to predict the classes of the target images. The architecture of the proposed system is depicted in Fig. 1.
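The idea of training only a new softmax layer on fixed representation vectors can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the toy 3-dimensional "representation vectors", the three classes, the learning rate, and the epoch count are all hypothetical, whereas the paper trains a 23-node layer on Inception v3 representation vectors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_logits(weights, biases, x):
    """One linear score per class for a single representation vector."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def train_softmax_head(features, labels, num_classes, lr=0.5, epochs=200):
    """Full-batch gradient descent on cross-entropy; only the head is trained."""
    dim = len(features[0])
    n = len(features)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(features, labels):
            probs = softmax(class_logits(weights, biases, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                grad_b[c] += err / n
                for j in range(dim):
                    grad_w[c][j] += err * x[j] / n
        for c in range(num_classes):
            biases[c] -= lr * grad_b[c]
            for j in range(dim):
                weights[c][j] -= lr * grad_w[c][j]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    logits = class_logits(weights, biases, x)
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical fixed representation vectors (stand-ins for frozen CNN features).
toy_features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
                [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
toy_labels = [0, 0, 1, 1, 2, 2]

head_w, head_b = train_softmax_head(toy_features, toy_labels, num_classes=3)
toy_predictions = [predict(head_w, head_b, x) for x in toy_features]
```

The feature vectors never change during training, which mirrors the feature extraction setting: only the weights and biases of the new output layer are updated.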

Google Inception v3 as Fine Tuning
In fine-tuning, the top two inception layers of v3 were kept frozen, and the remaining blocks of the model were trained with fish images obtained from underwater videos. The softmax layer, the last layer of Google Inception, was replaced with 23 nodes and trained with the target images. During training, we used categorical loss and the RMSprop optimizer with a learning rate of 0.0001.
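How frozen and trainable parameters behave under an RMSprop update can be sketched as below. Only the learning rate of 0.0001 comes from the text; the decay rate `rho`, the `eps` term, the parameter names, and the scalar "parameters" are assumptions chosen for illustration, and a real network would hold tensors rather than single floats.

```python
import math


def rmsprop_step(params, grads, cache, trainable, lr=1e-4, rho=0.9, eps=1e-7):
    """Apply one RMSprop update in place, skipping frozen parameters.

    cache holds the running average of squared gradients per parameter.
    """
    for name, grad in grads.items():
        if not trainable[name]:
            continue  # frozen: keep the pre-trained value untouched
        cache[name] = rho * cache[name] + (1.0 - rho) * grad * grad
        params[name] -= lr * grad / (math.sqrt(cache[name]) + eps)
    return params, cache


# Hypothetical scalar parameters standing in for whole layers.
params = {"frozen_block": 0.50, "trainable_block": 0.50, "softmax_head": 0.50}
trainable = {"frozen_block": False, "trainable_block": True, "softmax_head": True}
grads = {"frozen_block": 2.0, "trainable_block": 2.0, "softmax_head": -1.0}
cache = {name: 0.0 for name in params}

params, cache = rmsprop_step(params, grads, cache, trainable)
```

Because RMSprop divides each gradient by the root of its running squared average, the first step has roughly the same magnitude for every trainable parameter regardless of gradient scale, while the frozen parameter keeps its pre-trained value.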

Google Inception v3 with SVM
The representation vectors were obtained by fine-tuning all the layers except the top two with the target images. The extracted weights were used to train an SVM, and the trained model was used to classify the fish images during testing. We used a non-linear SVM to handle the large number of features and to maximize the margin between different data samples. A one-vs-all classification strategy was used to train the SVM. We used the LibSVM tool [28] to train and build the multiclass SVM classifier. The architecture of the proposed system is depicted in Fig. 2.
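Two ingredients of this method can be sketched without any SVM library: a non-linear kernel and the one-vs-all relabelling. The sketch assumes an RBF kernel, which the text does not name explicitly (it only mentions a non-linear SVM with a γ parameter). The helper names are hypothetical and this is not the LibSVM API.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def one_vs_all_targets(labels, positive_class):
    """Relabel a multiclass problem as +1 (this class) vs -1 (all others)."""
    return [1 if y == positive_class else -1 for y in labels]


def one_vs_all_predict(decision_values):
    """Pick the class whose binary classifier reports the largest decision value.

    decision_values: dict mapping class id to that classifier's output.
    """
    return max(decision_values, key=decision_values.get)


# gamma = 0.03125 is the value the paper reports from its grid search.
similarity = rbf_kernel([0.0, 0.0], [4.0, 0.0], gamma=0.03125)
targets_for_class_2 = one_vs_all_targets([0, 2, 1, 2], positive_class=2)
winner = one_vs_all_predict({0: -1.2, 1: 0.3, 2: 0.9})
```

With 23 species, the one-vs-all strategy trains 23 binary classifiers, one per relabelled target vector, and the final label is the class with the strongest response.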

Results and Discussion
The Fish4Knowledge (F4K) dataset [22] is used as the benchmark dataset for fish classification. The dataset consists of 27370 fish images obtained from 5824 video clips. These videos were captured using 9 cameras at three different locations in an unconstrained environment. The dataset is annotated with 23 fish types, and the classes are complex and imbalanced. Of the total images, 16430 are training images and 10940 are test images. The intra-class images differ in shape, size, and number of fins. The dataset contains many more samples of the most frequent fish classes than of the infrequent classes. Sample images from the dataset are depicted in Fig. 3.
The performance of the transfer learning approaches proposed in this paper was analyzed in terms of accuracy. Accuracy is the ratio of correctly classified fishes to the total number of fishes. On the test images, we obtained 79.08% accuracy for the feature extraction method and 90.53% for fine-tuning. Fine-tuning improved on the feature extraction technique because the top layers of the pre-trained model were fine-tuned. Since the proposed system has already learned the basic abstract features of the images in the lower layers, the high-level features of the target input images are learned by the top layers, leading to high accuracy.
In the last method, the representation vectors of the training images were obtained through the Google Inception model. The input images in the dataset are divided into training and test sets in the ratio of 60:40. To classify the input images effectively, we tuned the hyperparameters of the SVM, namely C and γ, using grid search. The grid search was performed at two levels. At the first level, a coarse-grained search was made to find the optimal values, and at the next level, a fine-grained search was made locally. We obtained C and γ values of 16.0 and 0.03125, respectively. The result of the grid search, along with the accuracies, is depicted in Fig. 4.
Fig. 4. Search by LibSVM.
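The two-level search over C and γ can be sketched as follows. Everything here is an assumption made for illustration: the power-of-two exponent ranges, the half-step fine grid, and above all `toy_cv_accuracy`, a hypothetical stand-in whose peak is deliberately placed at the values reported in the paper. In practice the score would be the cross-validated accuracy of an SVM trained with each (C, γ) pair.

```python
def toy_cv_accuracy(log2_c, log2_gamma):
    """HYPOTHETICAL stand-in for cross-validated SVM accuracy.

    The peak is placed at log2(C) = 4, log2(gamma) = -5 purely for illustration.
    """
    return 1.0 - 0.01 * ((log2_c - 4.0) ** 2 + (log2_gamma + 5.0) ** 2)


def grid_search(score_fn, log2_c_values, log2_gamma_values):
    """Exhaustively score every pair and return the best (log2 C, log2 gamma)."""
    candidates = [(c, g) for c in log2_c_values for g in log2_gamma_values]
    return max(candidates, key=lambda pair: score_fn(*pair))


def two_level_search(score_fn):
    # Level 1: coarse grid over exponents in steps of 2 (assumed ranges).
    coarse_c = list(range(-5, 16, 2))
    coarse_g = list(range(-15, 4, 2))
    best_c, best_g = grid_search(score_fn, coarse_c, coarse_g)

    # Level 2: fine grid in steps of 0.5 around the coarse optimum.
    fine_c = [best_c + 0.5 * k for k in range(-4, 5)]
    fine_g = [best_g + 0.5 * k for k in range(-4, 5)]
    best_c, best_g = grid_search(score_fn, fine_c, fine_g)

    return 2.0 ** best_c, 2.0 ** best_g


best_C, best_gamma = two_level_search(toy_cv_accuracy)
```

The coarse pass narrows the search to a neighbourhood cheaply, and the fine pass spends its evaluations only there, which keeps the number of SVM trainings far below that of a single dense grid.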

We also performed 10-fold cross-validation to validate our training set and obtained a training accuracy of 97.9%. Using the tuned parameters, a multiclass model was built, and the test images were classified with this model into 23 different classes. With this method, we achieved an accuracy of 95.37% on the test set. The classification results obtained on the Fish4Knowledge dataset for individual species are shown in Table 1. This table shows that the ensemble approach produced improved accuracy because of two levels of training, one using the pre-trained model and the other using the SVM. Table 2 depicts the confusion matrix obtained from the multiclass SVM classifier. The confusion matrix shows that some species, such as Acanthurus Nigrofuscus, Zanclus Cornutus, Zebrasoma Scopas, Neoglyphidodon Nigroris, and Balistapus Undulatus, have high misclassification rates because of low discriminating power and a small number of samples.
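The 10-fold cross-validation mentioned above partitions the training set into ten disjoint validation folds. A minimal sketch of the index bookkeeping is shown below; the function name is hypothetical, and shuffling the indices beforehand (usually advisable, especially with imbalanced classes) is omitted for brevity.

```python
def k_fold_indices(num_samples, k=10):
    """Split range(num_samples) into k (train, validation) index pairs.

    Fold sizes differ by at most one when num_samples is not divisible by k.
    """
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, num_samples))
        folds.append((train, validation))
        start += size
    return folds


# 16430 is the number of training images reported for the dataset.
folds = k_fold_indices(16430, k=10)
```

Each sample is used for validation exactly once and for training in the other nine rounds; the reported cross-validation accuracy would be the average over the ten rounds.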
We also evaluated precision, recall, and F1-score for the Google Inception v3 model with SVM; these are listed in Table 3. Precision measures the exactness of the classifier, whereas recall indicates its completeness. The F1-score measures the balance between precision and recall. With the proposed method, we achieved an F1-score of 70.88%. The proposed method also produced 100% precision for most of the classes, which is better than the existing systems. Among the three proposed methods, the SVM classifier achieved the highest accuracy of 95.37%, and the results are shown in Table 4. The SVM classifier produced higher accuracy than the other methods because an SVM is capable of selecting appropriate discriminating features for classification from a large number of features. Moreover, in this approach, parameters are learned in both the pre-trained model and the SVM training phases.
Google Inception v3 model with SVM.
class  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23
1  4714  35  76  6  4  5  2  1  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
2  30  1034  2  2  0  0  4  0  1
We used three approaches: the first is based on feature extraction, the second uses a fine-tuning technique, and the last uses a support vector classifier for classification. The results of our approaches were compared with the results reported in [2], [20], [29], [30]. Hasija et al. [14] and Chuang et al. [18] used traditional image processing techniques, while Siddiqui et al. [20] and Salman et al. [2] used deep learning techniques. All the approaches reported their performance on fish species classification in unconstrained environments, using data drawn from the Fish4Knowledge dataset. We used all 23 classes for training and testing, while the other approaches listed in Table 4 used only the top 15 species, which contain many samples.
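The accuracy, precision, recall, and F1 definitions used above can all be derived from a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, a convention the text does not state, and uses a small made-up 3-class matrix rather than the paper's 23-class data.

```python
def per_class_metrics(confusion):
    """Return overall accuracy and a (precision, recall, f1) tuple per class.

    confusion[t][p] counts samples of true class t predicted as class p
    (row = true, column = predicted; this orientation is an assumption).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total

    metrics = []
    for c in range(n):
        true_pos = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        precision = true_pos / predicted_c if predicted_c else 0.0
        recall = true_pos / actual_c if actual_c else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return accuracy, metrics


# Made-up 3-class confusion matrix for illustration only.
toy_confusion = [
    [5, 1, 0],
    [2, 3, 0],
    [0, 0, 4],
]
toy_accuracy, toy_metrics = per_class_metrics(toy_confusion)
macro_f1 = sum(f1 for _, _, f1 in toy_metrics) / len(toy_metrics)
```

Averaging F1 over classes without weighting (the macro average) gives rare classes the same influence as frequent ones. On an imbalanced dataset, an averaged F1-score can therefore sit well below the overall accuracy.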
Our proposed approaches produced promising results for the classification of all 23 species. The SVM classifier of our approach produced better results than all other approaches despite there being fewer samples for a few classes. The SVM classifier acts as a good feature selector, choosing prominent features for each species, which helps it produce good accuracy. In our second approach, the hierarchical feature learning capability of the CNN layers in the Google Inception model helps to capture the dependent visual features and produce better classification results. Even though our first approach also used the Google Inception model, it obtained lower accuracy because it used weights pre-trained on the ImageNet data. Table 5 shows the comparison of our proposed methods with the other methods reported in the literature.
Table 5. Comparison with existing works (approaches and accuracy).

Conclusion
We have proposed three different approaches for classifying fish species using the transfer learning technique. The first method acts as a feature extractor: it extracts features using the Google Inception-v3 model and classifies all the test images using the softmax classifier. The second method is based on fine-tuning, which tunes all the layers of the Google Inception model except the first two using the input images. We achieved an accuracy of 90.53% with this method. The third method, an ensemble of Google Inception v3 and an SVM classifier, attains an accuracy of 95.37%. We compared the performance of our approaches with state-of-the-art approaches and observed that our transfer learning-based approach using the SVM classifier performs better than existing work. As future work, we would like to use a deep learning model to classify the fishes.