Brainwaves feature classification by applying K-Means clustering using single-sensor EEG

The use of brainwave signals is a step toward recognizing individual identity with biometric technology based on characteristics of the body. Brainwave signals have unique characteristics that differ for each individual, and because brainwaves cannot be read or copied by other people, one person's signal cannot match another's. Identifying individual characteristics from a brainwave signal requires a pattern of brain activity that is prominent and constant. Cognitive activity testing using a single-sensor EEG (electroencephalogram) was divided into two categories: cognitive activity involving the abilities of the right brain (creativity, imagination, holistic thinking, intuition, arts, rhythms, nonverbal, feelings, visualization, tune of songs, daydreaming) and of the left brain (logic, analysis, sequences, linear, mathematics, language, facts, think in words, word of songs, computation). The two categories gave different clusters in two tests on mathematical activities (the clusters of experiment 1 and experiment 2 did not intersect). The results showed that cognitive activity based on math activity can provide a signal characteristic that can serve as the basis for developing brain-computer interface applications with a single-sensor EEG.
Measuring the activity of brainwaves requires an EEG, which is designed to acquire, process, and edit brain signals as waveforms or in another form [2]. The increasing availability of wireless devices with varying numbers of EEG sensors, and the possibility of using a non-invasive EEG device repeatedly without risks or limitations, make EEG attractive to develop as a medium for self-identity recognition. Recognizing patterns of brain activity automatically requires EEG signal pattern recognition. The choice of EEG signal pattern recognition method determines precisely the information to be obtained and the specific form of the EEG signals [14].

II. Related Work
Research on identity recognition using EEG signals has been widely conducted, both in Indonesia and in other countries. Klonovs et al. conducted a study that aimed to develop a mobile system for feature extraction and classification in biometric authentication using EEG. The study was motivated by the gap between mobile web technology and wireless EEG equipment. Brainwaves were measured with the Emotiv Epoc EEG headset, which uses 14 sensors. The research procedure comprised three test components: a memory-based test, a visualization test, and a stimuli-based test. The signal was processed and separated from artifact-contaminated components using the blind source separation (BSS) method to obtain the main components used for feature extraction. Feature extraction in this study used the zero-crossing rate (ZCR), coherence, power spectral density (PSD), cross-correlation, wavelet transforms, and latencies. The results showed differences between subjects in the biomedical electric potential, based on statistical analysis of the mean and histogram. The method that most clearly indicated the typical features of each individual was ZCR, but its results were not efficient enough, so more complex analysis techniques such as PSD and wavelet analysis were needed. PSD analysis found similarities in the histograms obtained from spectrogram images. Wavelet analysis can be used to measure the latency of the potentials evoked by the visual tests in the occipital lobe area [13].
Hong Bao used the blind source separation (BSS) method for mental condition detection. The study aimed to separate artifacts or noise from EEG data acquired with a multi-sensor (multi-channel) EEG. Other methods were also used to separate the signal from the artifacts, such as event-related potential (ERP), analysis of variance (ANOVA), and the Tukey-Kramer comparison method. Feature extraction in this study obtained the average amplitude, the average frequency power, the frequency ratio, and the correlation coefficient. Classification used linear regression, cross-validation, rank accuracy, significance, false discovery rate, class size imbalance, and resampling. With several tasks given to each individual, the classification reached an accuracy of 31% for adults, 35% for adults and children together, and 24% for children [15].

A. Data Collection
Data collection is conducted through a data acquisition method that retrieves biometric data from several people and processes it into a signal for the subsequent stages. Data were collected using a NeuroSky Mindset EEG headset with a single sensor. The electrode is placed at the FP1 position, over the frontal lobe area of the skull. EEG signal acquisition was carried out gradually and separately for 6 subjects, each recorded twice, at a sampling frequency of 128 Hz, which yields 2,560 samples over 20 seconds. The EEG signals are evoked by stimuli in the form of cognitive tasks. Nine types of cognitive tasks were used in the EEG data collection process: breath, color, face, finger, counting (math), object, pass-thought, song, and sport.
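The sample count quoted above follows directly from the sampling frequency and the recording length. A minimal sketch of that arithmetic, in which the constant and function names are illustrative assumptions rather than part of the original study:

```python
SAMPLING_RATE_HZ = 128  # sampling frequency stated in the text


def samples_per_recording(duration_s, rate_hz=SAMPLING_RATE_HZ):
    """Return the number of samples produced by one recording."""
    return rate_hz * duration_s


# A 20-second task at 128 Hz gives the 2,560 samples reported above.
print(samples_per_recording(20))
```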
The nine types of cognitive task used in this research are based on previous research [14], [16]-[18]. All tasks are based on psychological perception and are intended to elicit imaginative and cognitive responses of the brain. The nine tasks are described below in terms of the instructions given to the subjects.

Breathing Task (Breath)
The subject closes their eyes and focuses on breathing for 20 seconds without the slightest movement of the limbs.

Object Counting Color Task (Color)
The subject views the colors that appear on the screen, remembers them, and then reapplies them to the color column on the screen. The task is performed silently and lasts 20 seconds.

Simulated Finger Movement (Finger)
The subject closes their eyes and focuses on imagining the movement of a finger, without actually moving it, for 20 seconds.

Simulated Facial Reconstruction (Face)
The subject reconstructs a person's face in detail for 20 seconds, without making a sound or moving the limbs.

Simulated Object Reconstruction (Object)
The subject is given time to look at an object in detail, then closes their eyes and focuses on reconstructing the object in detail for 20 seconds without the slightest movement of the limbs.

Mathematical Counting Task (Math)
The subject is given a simple multiplication of two numbers and solves the problem silently, for example 15 x 16. The time given is 20 seconds. Whether the answer is right or wrong is ignored in this task.

Simulated Password Recall Task (Pass-thought)
The subject makes up a sentence consisting of a combination of letters and numbers, then closes their eyes and imagines the sentence for 10 seconds.

Song Recitation Task (Song)
The subject imagines a song or sound with lyrics for 20 seconds, without actually making a sound or moving the limbs.

Simulated Sport Task (Sport)
The subject chooses a favorite sports movement, then closes their eyes and imagines performing the movement without actually moving the limbs. This task is done for 20 seconds.

B. Signal Processing
In signal processing, the biometric data obtained from EEG data collection are grouped by subject, cognitive task, and time of data collection. Feature extraction is then applied to obtain the distinguishing characteristics that represent the main characteristics of the signal. The extracted features are the mean, standard deviation, skewness, kurtosis, and entropy.
The mean measures the center of the data distribution, the standard deviation measures the variation of the data distribution, skewness measures its degree of asymmetry, kurtosis measures how flat or peaked the distribution is relative to a normal distribution, and entropy measures the randomness of the data distribution.
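The five features can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `extract_features`, the use of population (biased) moments, non-excess kurtosis, and Shannon entropy over a 16-bin histogram are all choices made here, because the text does not specify these details.

```python
import math
from collections import Counter


def extract_features(signal, bins=16):
    """Return mean, std, skewness, kurtosis, and histogram entropy."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n  # population variance
    std = math.sqrt(variance)

    if std == 0:
        # A constant signal has no defined shape; report zeros (assumption).
        skewness = 0.0
        kurtosis = 0.0
    else:
        skewness = sum(((x - mean) / std) ** 3 for x in signal) / n
        kurtosis = sum(((x - mean) / std) ** 4 for x in signal) / n

    # Shannon entropy (bits) of the amplitude histogram (assumed estimator).
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant signals
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in signal)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {
        "mean": mean,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "entropy": entropy,
    }


features = extract_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(features["mean"], features["skewness"])
```

Each recording would produce one such feature vector, which then serves as the input to clustering.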
C. Data Matching
Data matching is conducted through K-Means clustering. The results will be grouped into K clusters (Fig. 1).

Fig. 1. K-Means Clustering Flowchart
Data matching using K-Means clustering is performed in the following steps:
a. The first step is to determine the number of clusters to be formed.
b. The second step is to initialize the center point of each cluster (centroid) randomly from the set of data to be grouped.
c. The third step is to calculate the distance from each data point to each centroid using the Euclidean distance.
d. The fourth step is to choose the shortest distance between each data point and the centroids.
e. The fifth step is to determine the position of each new centroid by calculating the average of the data assigned to the same centroid.
f. The sixth step is to check whether the algorithm has converged, that is, whether the data assigned to each new centroid are the same as the data assigned to the previous centroid. If they are the same, the process ends. If not, the process returns to the third step.
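The steps above can be sketched in plain Python. This is an illustrative sketch only: the function names `euclidean` and `k_means`, the fixed random seed, the iteration cap, and the choice to keep the previous centroid when a cluster becomes empty are assumptions that the text does not state.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points given as tuples (step c)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, seed=0, max_iter=100):
    """Cluster points into k groups; return (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step b: random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Steps c and d: assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Step f: stop when cluster membership no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step e: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old centroid (assumption)
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centroids, labels = k_means(data, k=2)
print(sorted(centroids), labels)
```

Because the initial centroids are random, the cluster numbering can differ between seeds even when the grouping itself is the same.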

IV. Results
The preliminary data obtained from the EEG are processed with feature extraction to produce characteristics. The results of the feature extraction are the values of the mean, standard deviation, skewness, kurtosis, and entropy.
The results are grouped by subject, cognitive task, and time of data retrieval. The extracted feature data are then normalized and divided into 4 clusters using K-Means clustering. After the number of clusters is specified, the next step is to choose the initial centroids randomly. The initial centroids can be seen in Table 1. The next step is to determine the distance from each data point to each centroid using the Euclidean method and to select the closest distance (minimum value). The data are then grouped and distinguished by cluster. The steps end when the data assigned to the new centroids equal the data assigned to the old centroids. The clustering results are shown in Table 2. Data in the same cluster can be grouped and summed; the result of this grouping is shown in Table 3. The calculation yields four clusters. The first cluster has its center at 0.04 and contains 24 people with the same data. The second cluster contains two people with the same data, with its center at 0.87. The third cluster contains 24 people with the same data, with its center at 0.12. The fourth cluster has its center at 0.29 and contains four people.
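The text does not state which normalization was applied before clustering. As one common possibility, min-max scaling maps each feature to the range 0 to 1, which would be consistent with cluster centers lying in that range. The sketch below assumes min-max scaling, and the function name `min_max_normalize` is illustrative:

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so that min -> 0 and max -> 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 (assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([2.0, 4.0, 6.0]))
```

Scaling each feature separately keeps a feature with a large numeric range from dominating the Euclidean distance.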
The tests of cognitive activity using a single-sensor EEG in this study are divided into two categories: cognitive activity involving the abilities of the right brain (creativity, imagination, holistic thinking, intuition, arts, rhythms, nonverbal, feelings, visualization, tune of songs, daydreaming) and of the left brain (logic, analysis, sequencing, linear, mathematics, language, facts, think in words, word of songs, computation).
The research, conducted with three subjects in two tests, showed mental activity involving the right brain's cognitive abilities (color, song, sport), as shown in Table 3 and Fig. 2.

Table 2. The result of grouping data in the cluster