Mixture gaussian V2 based microscopic movement detection of human spermatozoa

a Department of Air Transportation Management, Aviation Polytechnic of Surabaya, Indonesia b Department of Informatics Engineering, Universitas Pembangunan Nasional “Veteran” Jatim, Indonesia c Department of Computer Engineering, Universitas Maarif Hasyim Latif Sidoarjo, Indonesia 1 rmaryo4u@gmail.com; 2 igsusrama.if@upnjatim.ac.id; 3 moch.hatta@dosen.umaha.ac.id; 4 evapuspaningrum.if@upnjatim.ac.id


Abstract
Healthy, superior sperm is the main requirement for a woman to become pregnant. Determining sperm quality requires several examinations. One of them is a sperm analysis test that looks at the movement of sperm objects; the analysis is observed under a microscope and counted manually. The first step in such an analysis is detecting and separating sperm objects. This research detects and counts sperm movements in video data. To detect moving sperm, the background processing of sperm video data is essential for the success of the subsequent steps. This research aims to apply and compare several background subtraction algorithms for detecting and counting moving sperm in microscopic videos of seminal fluid, in order to identify the background subtraction algorithm best suited to sperm detection and sperm counting. The research methodology begins with the acquisition of sperm video data, followed by preprocessing with a Gaussian filter, background subtraction, and morphological operations that produce foreground masks. These masks are compared with ground truth images of moving sperm to validate the detection results of each background subtraction algorithm. The results also show that the system is able to detect and count moving sperm. The test results show that the MoG (Mixture of Gaussian) V2 (2 Dimension Variable) algorithm has an f-measure value of 0.9449, extracts sperm shapes close to their original form, and is superior to the other methods. To conclude, the sperm analysis process can be done automatically and efficiently in terms of time.

Introduction
Infertility can occur in both women and men. One of the factors that cause infertility in men is poor sperm quality, such as low sperm concentration. Medically, a normal sperm concentration is 20 million or more sperm/ml of semen [1]. A sperm test can determine sperm quality. An expert performs the test using a microscope. As a consequence, the assessment is subjective, because it relies on expert observation of a set of parameters.
Research on sperm analysis continues to develop. One study performed sperm analysis by classifying the shape of the sperm head using threshold segmentation and decision trees [2]. The sperm images came from the WHO standard book [1]. The researchers applied preprocessing (image adjustment), segmentation with the Otsu threshold method, and classification with a decision tree to distinguish between normal and abnormal spermatozoa heads [2]. Similarly, Susrama et al. [3] created the Automated Analysis of Sperm Concentration Counters (A2SC2) system using Otsu threshold segmentation and morphological processes. Other studies [4][5] discussed the analysis of sperm motility by dividing it into three modules: head detection, head tracking, and flagellum tracing. This framework aims to provide both the head trajectories and the flagella beat patterns so that sperm motility can be assessed quantitatively.
To obtain good tracking results, a background subtraction process is needed. It separates moving objects (foreground) from the background in the video, and can therefore detect sperm motility in the video. One example of sperm detection based on background subtraction was conducted by Hidayatullah et al. [6]. They proposed the Gaussian Mixture Model with Hole Filling Algorithm for sperm detection and compared it with six other methods for detecting motility.
Measuring sperm motility in video requires a sperm detection method [7], such as Adaptive Local Threshold (ALT) and Morphology (M). The method works by separating the background from the object to be detected, removing unwanted objects, and then detecting the ellipse, which is assumed to be the sperm head. Sperm detection using Adaptive Local Threshold and Morphology has a detection accuracy of up to 82%. Ghasemian et al. [8] focused on sperm morphology analysis (SMA) using images of sperm cells, detecting and analyzing the various parts of human sperm. The first step of that research is to eliminate background noise. The system then recognizes the head and tail of the sperm and analyzes the size and shape of each part, classifying it as normal or abnormal. That study only analyzes still images and therefore differs from research that analyzes spermatozoa videos.
Maggavi et al. [9] and Keskenler et al. [10] implemented a frame difference algorithm as the first step of background subtraction. This approach is limited by the selection of appropriate threshold values [8]: the precision of the output depends significantly on the threshold value selected, and non-linear diffusion filtering in time was used to eliminate this dependence. Li et al. [11] performed sperm detection based on contours, using the Gaussian Modeling algorithm to segment the foreground. Sperm counts as foreground because it keeps moving across the video frames in the research data. Furthermore, Khachane et al. [12] divided the background subtraction process based on the method used. The divisions are Basic methods, Fuzzy based methods, Statistical methods, Type-2 Fuzzy based methods, Statistical methods using color and texture features, Non-parametric methods, Methods based on eigenvalues and eigenvectors, and Neural and neuro-fuzzy methods.
Some of the studies above process still images, so eliminating background noise requires little computational time. Other studies process spermatozoa video data (moving sperm), but the videos are synthetic or taken from already available data, with a frame rate of 30 fps and a magnification of 20x. Moving objects are detected with statistical methods, but only one or two background subtraction algorithms are used, even though they achieve quite high accuracy. In this research, sperm video data from several volunteers were captured in real time at a frame rate of 50-60 fps, and background subtraction was performed with 23 algorithms drawn from both basic methods and statistical methods. The goal is to identify the best of the 23 algorithms tested. This study thus contributes to further research on the analysis of sperm infertility levels by determining the right algorithm for sperm detection and counting.

Method
This research compared moving object detection methods for the case of human sperm detection. Fig. 1 shows the flowchart, which consists of two parts. The first part is the production of the sperm videos, covering 25 human sperm video recordings, and the second part is the sperm video processing that enables the detection and counting of human sperm.
There were four processes in the second part. The first was preprocessing with a Gaussian filter [13], applied to every frame read from the sperm video. The next was background subtraction [14], whose result was a binary image representing the area of moving objects visible in the frame. A morphological operation consisting of an opening operation and a closing operation [15] followed the subtraction to eliminate noise and refine the shapes of the extracted moving sperm. To validate the detection results of each background subtraction algorithm, the study compared the foreground mask produced by the morphological operations with the ground truth image of moving sperm (the result of manual observation). Moreover, for visualization purposes, every blob area (white object in the binary image) on the foreground mask was given a bounding box on the original frame. At the same time, the number of blobs on the foreground mask was counted. The result showed that the system was able to detect and count moving sperm.

Materials
The sperm data used came from one of the patients who underwent fertility tests in the integrated laboratory of the Politeknik Kesehatan Surabaya, Indonesia. Following the WHO standard sperm test procedure [1], sperm was retrieved directly from the patient after ejaculation. The sample was left to liquefy for 10-20 minutes, dripped onto an object glass, and then observed using a brightfield microscope connected to a Point Gray camera, type FL3-U3-13S2C-CS. An essential early stage in sperm infertility research is the sperm detection phase, that is, the separation of sperm objects from the images/videos obtained by observing semen. The success rate in separating sperm objects from the seminal fluid plays an essential role in the further analysis of sperm objects. As shown in Fig. 2, not all sperm videos taken in real time have a clean background. Fig. 2(a)-(c) were taken through the microscope with the Point Gray type FL3-U3-13S2C-CS camera at 100x magnification.

Preprocessing (Gaussian filter)
Preprocessing is the processing applied to the original data before the sperm video frames are analyzed. This process aims to remove noise, clarify data features, reduce or enlarge the data size, and convert the original data into the form needed. The preprocessing method used in this study was the Gaussian filter. The filter blurs the video frames, which reduces both the noise and the detail in the image [16]; this affects the results of the next process. Equation (1) expresses the one-dimensional Gaussian function.
$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^{2}}{2\sigma^{2}}} \tag{1}$$
where σ is the standard deviation of the distribution, which is assumed to have zero mean. When applied to images, a 2D Gaussian distribution is required. Therefore, two 1D Gaussian distributions were used, one for the x-axis and one for the y-axis [17]. Equation (2) presents the 2D Gaussian distribution.
$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \tag{2}$$
The kernel size used for the Gaussian filter in this research was 5x5, as in Fig. 3.
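As an illustration of the separable construction described above, the following minimal sketch builds a 5x5 kernel from two 1D Gaussians and applies it to a small grayscale frame stored as a list of lists. The value `sigma=1.0` and the border handling (replicating edge pixels) are assumptions; the paper states only the 5x5 kernel size.

```python
import math


def gaussian_1d(size, sigma):
    """Sample a zero-mean 1D Gaussian at integer offsets, normalised to sum to 1."""
    half = size // 2
    values = [math.exp(-(x * x) / (2.0 * sigma * sigma))
              for x in range(-half, half + 1)]
    total = sum(values)
    return [v / total for v in values]


def gaussian_kernel_2d(size=5, sigma=1.0):
    """Build the 2D kernel as the outer product of two 1D Gaussians (y and x axes)."""
    g = gaussian_1d(size, sigma)
    return [[gy * gx for gx in g] for gy in g]


def blur(image, kernel):
    """Filter a 2D list image with the kernel, replicating border pixels (assumed)."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
                    acc += image[yy][xx] * kernel[dy + k][dx + k]
            out[y][x] = acc
    return out
```

A single bright noise pixel on a flat background is pulled toward its neighbours by `blur`, which is the noise-reducing (and detail-reducing) effect the preprocessing step relies on.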

Background subtraction (BS)
BS detects a foreground mask (a binary image containing information about the moving objects in a frame) in a video recording or a camera capture. This technique is commonly used in image processing and computer vision. The foreground mask is obtained by calculating the difference between the current video frame and the background model. According to Sheng-Yi et al. [18], background subtraction can be divided based on the method used. The divisions are Basic methods, Fuzzy based methods, Statistical methods, Non-parametric methods, Methods based on eigenvalues and eigenvectors, and Neural and neuro-fuzzy methods. The ones used in this research were the Basic methods and the Statistical methods [19].
In a study of sperm infertility, the sperm that will eventually fertilize the egg is the sperm that keeps moving. Therefore, the background subtraction process is necessary for detecting moving sperm. In the case of sperm detection, the advantages for background subtraction are that the data has unimodal characteristics, the interval between frames is short, and there are no lighting changes. On the other hand, the challenges are that some background objects move and that newly appearing objects may be treated as background. The input of the background subtraction process is the preprocessed video frames, while the output is binary images that represent the objects (sperm) that move in the video.

Basic background subtraction models
The basic models build the background image through simple arithmetic: they calculate the value differences between frames, obtain the background model by averaging the frame history, or calculate the variance of the frame history. The basic model algorithms used in this research were weighted moving mean (WMM), weighted moving variance (WMV), and frame difference (FD) [20]. The pixels classified as foreground were obtained by calculating the difference between the background image and the current video frame. A thresholding process then converted the foreground image into a binary image, in which the pixels representing moving objects have a value of 1 and the background regions have a value of 0. Frame Difference is expressed by (3) and (4) [21],
where f is the frame (the preceding frame being the one taken a moment before the current frame), B is the background, and F is the foreground.
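The frame difference idea can be sketched as follows. This is a minimal illustration that assumes the common formulation in which the previous frame acts as the background and a pixel is foreground when its absolute difference exceeds a threshold; the function name and the threshold value used in the example are hypothetical and not taken from the paper.

```python
def frame_difference(prev_frame, curr_frame, threshold):
    """Return a binary mask: 1 where |current - previous| exceeds the threshold.

    The previous frame plays the role of the background B (an assumption about
    the form of the paper's equations), and the returned mask is the foreground F.
    """
    h, w = len(curr_frame), len(curr_frame[0])
    return [[1 if abs(curr_frame[y][x] - prev_frame[y][x]) > threshold else 0
             for x in range(w)]
            for y in range(h)]
```

For instance, with `prev = [[10, 10, 10]]`, `curr = [[10, 200, 12]]`, and a threshold of 15, only the middle pixel is marked as moving; the small change from 10 to 12 stays below the threshold and is treated as background.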
Weighted Moving Variance models the background image by computing the moving variance [22]. To calculate the average value (μ), this algorithm uses the formula of the weighted moving mean algorithm; see Equation (6).
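A rough sketch of the weighted moving mean and weighted moving variance is given below. It assumes a three-frame history (current frame, previous frame, and the frame before that) with hypothetical weights 0.5, 0.3, and 0.2, and it thresholds the weighted standard deviation to obtain the mask; the paper does not state the history length, the weights, or the threshold, so all of these are illustrative.

```python
import math

# Assumed weights for (current, previous, before-previous) frames; they sum to 1.
WEIGHTS = (0.5, 0.3, 0.2)


def weighted_mean(values, weights=WEIGHTS):
    """Weighted moving mean of one pixel's recent values."""
    return sum(w * v for w, v in zip(weights, values))


def weighted_variance(values, weights=WEIGHTS):
    """Weighted moving variance around the weighted moving mean."""
    mu = weighted_mean(values, weights)
    return sum(w * (v - mu) ** 2 for w, v in zip(weights, values))


def wmv_foreground(frames, threshold, weights=WEIGHTS):
    """frames = [current, previous, before_previous], each a 2D list.

    A pixel is marked as foreground (1) when its weighted standard deviation
    over the short history exceeds the threshold.
    """
    h, w = len(frames[0]), len(frames[0][0])
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            values = [frame[y][x] for frame in frames]
            std = math.sqrt(weighted_variance(values, weights))
            mask[y][x] = 1 if std > threshold else 0
    return mask
```

A pixel whose value stays constant across the history has zero variance and is left in the background, while a pixel that a sperm has just entered shows a large variance and is marked as foreground.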

 
Adaptive Background Learning uses successive frames to obtain motion differences in the image and to estimate the foreground and background areas [23]. Two classes are used for segmentation: class 1 for the background and class 2 for foreground points. It aims to update the background image regularly based on steady motion and image segmentation.
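One common way to realise this kind of adaptive background is a running average, sketched below. The update rule, the learning rate `alpha`, and the threshold are assumptions chosen for illustration; the paper does not give the exact update used by the Adaptive Background Learning algorithm.

```python
def segment(background, frame, threshold):
    """Class 2 (foreground, 1) where the frame departs from the background, else class 1 (0)."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def update_background(background, frame, alpha):
    """Blend the new frame into the background with learning rate alpha (assumed rule)."""
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]
```

With a small `alpha`, a sperm that passes through a pixel barely changes the background, whereas an object that stops moving is gradually absorbed into it. This matches the challenge noted earlier that new stationary objects end up being treated as background.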

Statistical background subtraction models
The statistical models are appropriate for situations such as automatic camera adjustment, moved background objects, and inserted background objects. These situations correspond to the challenges present in detecting moving sperm. The background subtraction algorithms in this group statistically model every pixel in the frame, which is then classified as either a foreground pixel or a background pixel. The Single Gaussian model is represented as in (7), (8), and (9) [24]. The Gaussian Mixture Model (GMM) is a type of density model consisting of several Gaussian component functions [25]-[27]. The history of each pixel, $\{X_1, \ldots, X_t\}$, is modeled with a mixture of Gaussian distributions. The probability value at each pixel is obtained through (10), (11), and (12).
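For orientation, the Gaussian mixture background model is commonly written in the following form, after Stauffer and Grimson. This is a sketch that assumes the expressions referred to as (10), (11), and (12) follow that standard formulation. The symbols $n$ (dimension of the pixel value) and $T$ (minimum portion of the data attributed to the background) are assumptions and may be named differently in the original equations.

```latex
% Probability of observing the current pixel value X_t under K Gaussians
P(X_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta\!\left(X_t, \mu_{i,t}, \Sigma_{i,t}\right)

% Gaussian probability density function
\eta\!\left(X_t, \mu, \Sigma\right) =
  \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}\,(X_t - \mu)^{T}\,\Sigma^{-1}\,(X_t - \mu)\right)

% Number of leading distributions chosen as the background model
B = \arg\min_{b}\left(\sum_{i=1}^{b} \omega_{i,t} > T\right)
```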
The mathematical model for selecting the first distributions as the background (B) is defined as in (12). Here, $K$ is the number of distributions, $\omega_{i,t}$ is the estimated weight of the $i$-th Gaussian in the mixture at time $t$, $\mu_{i,t}$ is the mean value of that Gaussian at time $t$, $\Sigma_{i,t}$ is its covariance matrix at time $t$, $\eta$ is the probability density function, $|\Sigma_{i,t}|$ is the determinant of the covariance, the superscript $T$ denotes the matrix transpose, the superscript $-1$ denotes the matrix inverse, $\exp$ is the exponential function, and (phi) is the image size for a scalar image as well as for a vector (RGB) image.
Kernel Density Estimation (KDE) is a non-parametric statistical approach to estimating the probability distribution function of a random variable when the shape or distribution model of that random variable is unknown [28]. KDE is defined as in (13).

  
The foreground is detected using the following rule: if $P(x_t) < T$, the pixel belongs to the foreground; otherwise, the pixel belongs to the background. Like GMM, this algorithm can adapt to a multimodal background, but it does not need to estimate Gaussian parameters. A mixture of Gaussians (MoG) [29][30] is a distribution whose pdf (probability density function) has the form given in (14) and (15),
where $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is the pdf of a Gaussian with parameters $\mu_k$, $\Sigma_k$, and the mixture coefficients $\pi_k$ satisfy the constraints given in [30].
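To make the mixture density concrete, the sketch below evaluates a one-dimensional mixture of Gaussians for a grayscale pixel value. It assumes the usual constraints on the mixture coefficients (non-negative and summing to one). The helper `matches_any_component`, with its 2.5 standard deviation test in the style of Stauffer and Grimson, is an illustrative assumption rather than the paper's exact decision rule.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Weighted sum of Gaussian densities; weights must be >= 0 and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture coefficients must be non-negative and sum to 1")
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


def matches_any_component(x, means, variances, k=2.5):
    """True if x lies within k standard deviations of some component (assumed rule)."""
    return any(abs(x - m) <= k * math.sqrt(v) for m, v in zip(means, variances))
```

For a pixel modelled by two components, for example a dark background mode and a brighter mode, a value close to either mean has a high mixture density and matches a component, while a value far from both is a candidate foreground pixel.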

Morphology operation
After the background subtraction process, a foreground mask was obtained in the form of a binary image representing the moving pixels. The resulting foreground image still contained noise, and sometimes the extracted moving sperm objects were not intact; they were divided into two or more parts [31]. For this reason, this research applied morphological operations, namely the opening and closing operations.
The opening operation is a morphological erosion followed by a morphological dilation. The erosion aimed to eliminate the noise that appeared in the foreground image produced by background subtraction. The dilation was then performed to restore the shape of the objects changed by the erosion. The closing operation is a morphological dilation followed by a morphological erosion. The dilation closes small holes in the sperm image and connects separated sperm fragments. The final erosion in the closing operation refines the shapes of the detected moving sperm. The structuring element used in the morphological operations has an ellipse shape with a 5x5 kernel. The shape of the structuring element can be seen in Fig. 4.
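The opening and closing operations described above can be sketched on a binary mask stored as a list of lists. The 5x5 ellipse-like element below is a hypothetical stand-in for the one shown in Fig. 4, and treating pixels outside the image as background during erosion is an assumption of this sketch.

```python
# Hypothetical 5x5 ellipse-like structuring element (1 = part of the element).
ELLIPSE_5X5 = [
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
]


def _offsets(element):
    """Offsets (dy, dx) of the element's active cells relative to its centre."""
    cy, cx = len(element) // 2, len(element[0]) // 2
    return [(y - cy, x - cx)
            for y, row in enumerate(element)
            for x, v in enumerate(row) if v]


def erode(mask, element=ELLIPSE_5X5):
    """Keep a pixel only if the whole element fits inside the foreground."""
    h, w = len(mask), len(mask[0])
    offs = _offsets(element)
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in offs) else 0
             for x in range(w)] for y in range(h)]


def dilate(mask, element=ELLIPSE_5X5):
    """Set a pixel if the element centred on it touches any foreground pixel."""
    h, w = len(mask), len(mask[0])
    offs = _offsets(element)
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                      for dy, dx in offs) else 0
             for x in range(w)] for y in range(h)]


def opening(mask, element=ELLIPSE_5X5):
    """Erosion followed by dilation: removes small isolated noise."""
    return dilate(erode(mask, element), element)


def closing(mask, element=ELLIPSE_5X5):
    """Dilation followed by erosion: fills small holes and joins nearby fragments."""
    return erode(dilate(mask, element), element)


def clean_mask(mask, element=ELLIPSE_5X5):
    """Opening then closing, i.e. erosion-dilation-dilation-erosion."""
    return closing(opening(mask, element), element)
```

Note that an object smaller than the structuring element disappears during opening, so the 5x5 size implicitly sets the smallest sperm fragment that survives the cleaning step.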

Ground truth creation and image
A ground truth image in this research is an image that contains information about the real area of the moving sperm object in the frame of sperm data video. The ground truth images were obtained by manually observing the regions on video frames that have moving sperm objects.
To verify sperm movement, the ten frames before and the ten frames after the frame for which a ground truth image was to be drawn were observed. For example, the ground truth image for the 50th frame was created by observing sperm movement from frame 40 to frame 60 of the video. Areas containing a moving sperm object were marked with a pixel value of 255 (white), and areas without a moving sperm object were marked with a pixel value of 0 (black). In this manner, a ground truth image is formed, which serves as the reference when testing the results of sperm detection and counting. Fig. 5 illustrates the ground truth image for frame 50.

Contour detection and validation
In the previous process, the morphological operation eliminated the noise that arose when separating the foreground from the background and refined the extracted sperm shapes, yielding a binary image that represents the moving sperm. Every blob in the binary image represents a moving sperm object. In the image resulting from the morphological process, each sperm object is detected based on its contour, so that the contour shape, the contour area, and the midpoint of the sperm position in the frame are known. Based on this information, every detected sperm is given a bounding box and counted on the original video frame, so that it can be observed that the system succeeded in detecting sperm.
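A minimal sketch of this blob detection step is shown below. It labels connected regions of the binary mask with a breadth-first search and reports the area, bounding box, and midpoint of each blob. The use of 8-connectivity, the function name `find_blobs`, and the optional `min_area` filter are assumptions for illustration; the paper describes contour-based detection without giving these details.

```python
from collections import deque


def find_blobs(mask, min_area=1):
    """Return one dict per connected blob: area, bbox (x_min, y_min, x_max, y_max), center."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search over the 8-connected neighbourhood.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(pixels) < min_area:
                continue  # ignore blobs smaller than the assumed minimum area
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "center": (sum(xs) / len(xs), sum(ys) / len(ys)),
            })
    return blobs
```

The number of moving sperm in a frame is then `len(find_blobs(mask))`, and each `bbox` can be drawn on the original frame for visual inspection.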
The validation process compared the moving sperm detected by each algorithm with the ground truth images, which resulted from manual observation of the locations of moving sperm in the video frame. The comparison result was then analyzed using the receiver operating characteristic, so that the validation level of each algorithm could be determined.
The detection result of each algorithm was compared with the ground truth image to obtain three values: True Positive (TP), False Positive (FP), and False Negative (FN). A True Positive is an existing sperm that was detected. A False Positive is a sperm that did not exist but was detected. A False Negative is an existing sperm that was not detected. From these values, the precision, recall, and f-measure were calculated with equations (16), (17), and (18) [32].
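The three measures can be computed from the counts as in the sketch below, which uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and f-measure as the harmonic mean of the two. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, and the counts in the usage note are made-up illustrative numbers, not results from the paper.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, f_measure) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, a frame with 8 correctly detected sperm, 2 spurious detections, and 2 missed sperm would be scored with `precision_recall_f(8, 2, 2)`.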

Results and Discussion
The results presented cover preprocessing, background subtraction, the morphological operation, and sperm detection and counting. The testing compares the sperm detection and counting results with the ground truth images of moving sperm objects. The comparison was then analyzed using receiver operating characteristics.

Preprocessing results
The preprocessing is a Gaussian filter with a kernel size of 5x5. The inputs of preprocessing were video frames of the observed seminal fluid. Fig. 6(a) presents an example frame from the sperm video data, while the result of preprocessing is shown in Fig. 6(b). The preprocessing eliminated the white noise introduced by the camera, at the cost of blurring the images and reducing image detail.

Ground truth image results
The ground truth images were obtained by manually observing the regions in video frames that contain moving sperm objects. These ground truth images were used as the reference for the results of sperm detection and counting. Observing the ten frames before and the ten frames after the frame for which a ground truth image was to be drawn confirms the movement. For example, the ground truth image for the 50th frame was created by observing sperm movement from frame 40 to frame 60 of the video. An example of an original frame and its ground truth image for frames 60 and 90 can be seen in Fig. 7. Examples of background subtraction results are shown in Fig. 8 (Frame Difference) and Fig. 9 (Grimson GMM). The red box in the resulting image marks an area where background objects are moving in the video scene, and the yellow box marks an example area containing a detected moving sperm object. In the yellow box, the detected sperm appears divided into several parts, with noise around it.
This also happens to all detected moving sperm, as seen in the foreground masks in Fig. 8 and Fig. 9. One detected moving sperm object can appear divided into several parts: it may be split between the sperm head and tail, and the sperm head area itself may be divided into two. Noise also appears in other areas of the frame, not just around the moving sperm region. In the red box, it can be seen that a moving background object was also detected. The detected background object was also divided into sections, like the moving sperm object. However, what was detected there was the movement of the background object.
218 Vol. 6, No. 2, July 2020, pp.

Results of morphological operation
The morphological operation consisted of an opening operation followed by a closing operation, that is, erosion, dilation, dilation, and erosion in succession. The opening operation aimed to eliminate the noise that appeared in the foreground image produced by background subtraction and to restore the object shape altered by the noise removal process (erosion). The closing operation aimed to cover small holes in objects, connect separated sperm fragments, and refine the shape of the detected moving sperm.
The input of the morphological operation was the foreground mask resulting from background subtraction, which contains noise. Sometimes the moving sperm object is not intact and is divided into two or more parts. The output of the morphological operation was a foreground mask from which the noise had been removed. The detected sperm objects are intact, so that each blob (binary large object) represents a moving sperm object in the video frame. Each background subtraction algorithm produced a different foreground mask, and the morphological operation results for each background subtraction algorithm are explained in the next sections.

Sperm detection and calculation of the test
After the morphological operation, the foreground mask was assumed to be free of noise, with severed sperm objects reconnected. Every blob (binary large object) on the foreground mask represents a moving sperm object. For visualization purposes, each blob was detected based on its contour, so that its contour shape, the total number of contours, and the location of the midpoint of each moving sperm object are known. From this information, a bounding box and a sequence number were drawn for each detected moving sperm on the original video frame, so that it can be observed that the system succeeded in detecting and counting sperm. The details of the sperm detection and counting results and their test results can be seen in Table 1. The detection and counting were tested by manually comparing the results with the ground truth images of moving sperm objects, which were obtained by manual observation. This comparison was made 10 times by taking the detection results every 30 frames of the video, giving the frame sequence 30, 60, 90, 120, 150, 180, 210, 240, 270, and 300. The comparison results were then analyzed using ROC analysis to obtain three values: True Positive (TP) for an existing sperm that was detected, False Positive (FP) for a sperm that did not exist but was detected, and False Negative (FN) for an existing sperm that was not detected. Once these values were obtained, the f-measure, recall, and precision of each algorithm were calculated, so that the background subtraction algorithm most appropriate for detecting and counting moving sperm in a video could be determined.
The basic model algorithms tested for sperm detection and counting were Frame Difference, Adaptive Background Learning, Weighted Moving Mean, Fuzzy Choquet Integral, Weighted Moving Average, and Wren Gaussian Average. Among these six basic model algorithms, Adaptive Background Learning has the highest accuracy, followed by Frame Difference and Weighted Moving Average (Table 1). Meanwhile, Fuzzy Choquet Integral cannot be used for sperm detection and counting because of its low accuracy. Among the statistical models, the highest accuracy is achieved by the MoG V2 algorithm, while MoG V1 has the lowest accuracy. When the best algorithm of each group is compared, MoG V2 has the higher f-measure, 0.9449, compared with 0.9205 for Adaptive Background Learning.

Conclusions
In this research, we have implemented a series of procedures for double-tracking spermatozoa motility based on basic model and statistical model algorithms, and created a complete integrated system from data collection to analysis. The MoG V2 algorithm in the background subtraction process was capable of detecting and counting moving sperm objects in the video. The algorithm produced a foreground with little noise, did not detect moving background objects as foreground objects, and extracted sperm shapes more completely. The test results for detecting and counting motile sperm show that MoG V2 has the highest f-measure value, 0.9449, among the background subtraction algorithms tested in this research. The next best algorithm is Adaptive Background Learning, with an f-measure value of 0.9205. The difference between MoG V2 and Adaptive Background Learning is therefore only 0.0244, which means that both algorithms are promising for detecting and counting motile sperm. Both algorithms successfully exploited the advantages and overcame the challenges that exist in this case.