Emotion brain-computer interface using wavelet and recurrent neural networks

ABSTRACT


Introduction
Brain-Computer Interfaces (BCI) can drive output devices by acquiring brain signals, synthesizing them, and transforming them without using normal neuromuscular output pathways. This is done by capturing information from the brain as electrical or magnetic signals through an intermediate device such as an Electroencephalograph (EEG). EEG signals reflect several variables, such as focus, emotion, and motor imagery, and these variables can be used as instructions for external devices. By identifying just one variable of the EEG signal, we can use it in a BCI to instruct external devices. BCIs are currently known to be widely used for video games [1] [2], mouse control [3], robots [4] [5], rehabilitation tools [6] [7], assistive technology [8] [9], and other external devices. Current techniques in BCIs have attracted much research from studies, industries, professionals, and many levels of society.
A Brain-Computer Interface (BCI) relies on an intermediate tool that usually obtains its information from EEG signals. This paper proposes a BCI to control a robot simulator based on three emotions by first extracting the signal with a wavelet function and then classifying it with Recurrent Neural Networks (RNN). Emotion is among the brain variables that can be used to move external devices, and a BCI's success depends on its ability to recognize a person's emotions by extracting their EEG signals. One method to appropriately recognize EEG signals as a movement signal is the wavelet transform. The wavelet extracts the EEG signal into theta, alpha, and beta waves, which serve as the input to the RNN; connectivity between sequences is accomplished with Long Short-Term Memory (LSTM). The study also compared frequency-extraction methods using the Fast Fourier Transform (FFT). The results showed that by extracting EEG signals with wavelet transforms, we could achieve an accuracy of 100% on the training data and 70.54% on new data, while the same RNN configuration without pre-processing yielded 39% accuracy; even adding FFT only increased it to 52%. Furthermore, by using frequency-filter features, we can increase the accuracy from 70.54% to 79.3%. These results show the importance of feature selection, because RNNs are sensitive to the ordering of their inputs. The use of emotional variables remains practical for instructing BCI-based external devices, with an average computing time of merely 0.235 seconds. Emotions are one of the EEG signal variables that must be elicited with proper stimulation; therefore, an appropriate data-collection scenario is needed [10]. Emotions are also part of the neuropsychological domain that allows an increase in the capability of BCI [11], and this component is essential in the interaction between humans and machines.
Emotions reflect what is thought more spontaneously than other BCI variables, so they provide higher accuracy [11]. Emotions therefore allow interaction with the computer and can be used for communication [10]. Emotions are part of personal behavior that occurs consciously or unconsciously. Usually, emotions are divided into two areas: positive and negative. Positive emotions include happy, interested, surprised, satisfied, relaxed, and enthusiastic, while negative emotions include sad, angry, upset, scared, shy, disappointed, and anxious. Some studies suggest that this emotional linkage is essential from the points of view of psychology [12] and neuroscience [13], comparing positive and negative emotions after watching a specific video.
The success of a BCI is determined by the accuracy of identifying emotion variables from EEG signals, and the most crucial technique is the extraction method used to separate the required information from surrounding noise. When identifying the EEG signal, what determines accuracy is the extraction of pattern-distinguishing features. Feature-extraction algorithms must be quantified and must contribute toward minimizing variation within the same class and maximizing differences between classes. This stage also reduces the amount of information passed to the classifier of the brain's electrical activity. Besides emotion, other variables have also been used to move external devices, such as SSVEP [2] [8] [14], finger-movement codes [15], concentration [16], brain abnormalities [17], and motor imagery [18] [19]. Nevertheless, out of these many EEG signal variables, emotion identification is the most popular. Its applications include neuropsychological identification of emotions, monitoring of medical rehabilitation [20], and moving external devices [21].
Finding suitable features is an important element in the use of emotions for classification, such as statistical features [10]. An emotional state is related to the presence of specific waves in EEG signals, such as theta, alpha, and beta waves; therefore, frequency extraction from the EEG signal becomes useful. These EEG signal variables can be used to drive several external devices, so a BCI based on emotional variables is more beneficial than one based on other variables, although it needs the right methods, settings, and scenarios. This paper proposes an emotion-based BCI to drive a robot simulator using frequency extraction and Recurrent Neural Networks (RNN). The action is carried out after identifying one of three classes of emotion, "relax", "happy", and "sad", every 10 seconds. We also compared two frequency-extraction methods: the wavelet transform and the Fast Fourier Transform (FFT). The former is suitable for non-stationary signals, while for short durations we can use the FFT.

Brain-Computer Interface
The absence of speech, gestures, or any other muscular activity when trying to move external devices can be overcome by using a BCI. Usually, a BCI obtains its brain information from Electroencephalogram (EEG) recordings. Some variables often used in BCI include motor imagery [19] [22], focus [16], and emotion [21], and a BCI's success depends on the ability to recognize these variables. Emotion is one BCI variable that has a relationship with the frequency components of EEG signals. A previous study separated two states (happy and sad) using a Support Vector Machine on EEG signals extracted with Power Spectral Density, achieving 74% accuracy when tested on the same research subject.
EEG signals have irregular shapes, so their interpretation requires prior extraction related to the variable being reviewed. Emotion, as one such variable, correlates with frequency components, although this is limited to certain types of emotion [23]. Researchers have identified several spectral changes and brain areas that correlate with emotional responses. Theta waves, in the range of 4 to 7 Hz, have a strong relation with emotions of sadness or disappointment. Alpha waves, ranging from 8 to 13 Hz, are the standard definition of relaxation or comfort. Lastly, beta waves (14-30 Hz) occur during electrical excitation, including when a person is happy [21]. To a great extent, prior efforts have focused mainly on the volatility of EEG spectral power in fixed frequency bands as well as across an extensive range of bands from 4 to 50 Hz [24]. Emotional variables have also been used to move objects through a BCI [21]; this approach does not depend on muscle movements [10]. A recorded signal contains frequency components up to half the sampling frequency, while the processed EEG signal contains specific frequencies depending on the variable being reviewed, so frequency-based signal processing becomes very useful.
The BCI is driven by emotional-state information captured from EEG signals. The segmented EEG signal is extracted with the wavelet based on the 5-32 Hz frequency range of the three emotion types; this is done to improve identification accuracy. The extracted EEG signal decreases from 128 points to 56 points per second, given that only the 5-32 Hz band is processed. The extracted signal is first used to train RNNs, producing generalizations in the form of weights that are then used for real-time emotion identification. The captured emotional state triggers the action performed by a Best-Friend robot simulation, with the model shown in Fig. 1.

Wavelet Extraction
The wavelet transform extracts specific frequency components from a signal, separating it from unneeded components or noise without losing important information. On top of that, this method can also be used for non-stationary signals such as EEG. The output of a particular wavelet is set in the time domain, which allows it to be used as a filter and preprocessing stage before the next identification block [25]. In principle, the wavelet convolves the signal with a kernel called the mother wavelet. Various functions are available, one of which is Daubechies, which has an asymmetrical shape like an EEG signal. One type of wavelet transform is WPD, also known as Wavelet Packet Decomposition; wavelet packets are linear combinations of the original wavelet functions [24]. A wavelet function is shown in (1), where j is the scale index, k is the translation index, n is the oscillation variable, and t is the period of the signal.
From the Nyquist signal component, the composition starts with (2) as the scaling function and (3) as the mother wavelet.
Wavelet functions with higher oscillation parameters are shown in (4) and (5).
Both h(k) and g(k) have values related to the scaling function and the mother wavelet, and they also act as quadrature mirror filters [26]. We define the wavelet packet coefficients as the inner product of the wavelet packet functions with f(t), as shown in (6).
For the original signal S, the earlier (left-hand) sub-branches are acquired using the low-pass approximation filter h(k), while the right-hand side uses the high-pass filter g(k) for the details. Furthermore, the inner product in (6) changes the values of the scale index, the translation index, and the oscillation variable. Next, we continue by decomposing the signal to scale level j with a tighter frequency range using (6); in this way, the signal is divided into low- and high-frequency groups. Considering the asymmetric shape of EEG, this research used Symmlet 2, which consists of four low-pass filter coefficients (hn) alongside the high-pass coefficients (gn). The scaling-function coefficients of the low-pass filter and the high-pass filter are given in (7) and (8) [20]. The EEG signal has a 128 Hz sampling frequency, so frequencies of 0-128/2, or 0-64 Hz, are recorded. The desired frequencies relate to the three emotion states, i.e., theta, alpha, and beta waves. The wavelet extraction is shown in Fig. 2.
We conduct five decomposition steps to extract the EEG signal into theta waves, approaching the frequency span from 5 to 8 Hz and returning 8 points. Next, the signal is extracted into alpha waves within the 9-14 Hz span, also using five steps, to obtain 12 points. Lastly, beta-wave extraction is done after four steps (15-32 Hz) and gives 36 points. In total, over a period of 10 seconds, we acquire 560 points.
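The band-splitting cascade above can be sketched with a hand-rolled quadrature-mirror filter bank. This is a minimal illustration, not the authors' implementation: the filter pair is the standard 4-coefficient Daubechies-family pair (db2/sym2 share these values), and the exact output lengths differ slightly from the paper's point counts because of boundary handling.

```python
import numpy as np

# Low-pass filter h and quadrature-mirror high-pass g[k] = (-1)^k * h[3-k],
# in the spirit of Eqs. (7)-(8)
h = np.array([1 + np.sqrt(3), 3 + np.sqrt(3),
              3 - np.sqrt(3), 1 - np.sqrt(3)]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])

def dwt_step(x):
    """One decomposition step: convolve, then downsample by two."""
    a = np.convolve(x, h[::-1])[::2]  # approximation (lower half-band)
    d = np.convolve(x, g[::-1])[::2]  # detail (upper half-band)
    return a, d

# One second of a 6 Hz test tone sampled at 128 Hz (inside the theta range)
t = np.arange(128) / 128.0
x = np.sin(2 * np.pi * 6 * t)

a1, d1 = dwt_step(x)       # a1 ~ 0-32 Hz, d1 ~ 32-64 Hz
a2, beta = dwt_step(a1)    # beta  ~ 16-32 Hz
a3, alpha = dwt_step(a2)   # alpha ~ 8-16 Hz
a4, theta = dwt_step(a3)   # theta ~ 4-8 Hz
```

For the 6 Hz tone, most of the energy ends up in the theta branch of the tree, which is how the extraction isolates band-specific activity before the RNN.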

Recurrent Neural Networks
Deep learning consists of neural networks with deep architectures that have improved machine learning [5] [27]. These methods make feature recognition more advanced owing to the depth of the computing process. From its first appearance, the use of deep learning was constrained by the limited ability of computers, but with the development of GPUs the method has become a trend, including among researchers. Deep learning can increase identification performance through improved and detailed training [24]. Deep learning techniques have shown effective results and have been used to resolve many challenges in EEG-signal processing. Some studies used Convolutional Neural Networks (CNN), which convolve the input with a fixed-size kernel; while CNNs are mainly used for image processing [28], a one-dimensional CNN can also be applied to EEG-based BCI, such as in game control [29]. Another technique is Recurrent Neural Networks (RNN), often used for sequential data over a period of time. RNNs facilitate the connection of subsequent data, efficiently linking the current input with past time steps [30] [31]. Long Short-Term Memory (LSTM) is used within the RNN's standard connections to overcome the vanishing-gradient issue. In previous studies, the use of RNN and LSTM to recognize spontaneous emotions (arousal, valence, and liking) from EEG signals resulted in an accuracy rate of 87% [32]. However, other emotions still need to be tested, such as sadness, joy, enthusiasm, disappointment, and a relaxed state, which are universal across various stimulations. The architecture, shown in Fig. 3, has three layers: input, hidden, and output; the number of hidden layers is allowed to be more than one. The learning process of RNNs uses the Backpropagation Through Time (BPTT) algorithm with an activation function such as tanh or sigmoid.
This architecture is similar to a Multilayer Perceptron, but with connections between the neurons of the hidden layer. The input layer receives sequential data from the feature-extraction results. Each neuron in the hidden layer is interconnected through strengths called weights. The process follows the time sequence at the current time (t) with delays at t-1, t-2, ..., n through the interconnections inside the hidden layer. Fig. 3, if rotated 90° counterclockwise, looks like Fig. 4. The initial hidden layer can be used at both the previous time (t-1) and the current time (t). An RNN can connect an input with the previous and next inputs, as in Fig. 4, so the method suppresses noise carried with the signal. RNNs are not too different from a regular neural network: they follow a chain of neurons from a similar network in which each neuron sends information to the next neuron, connected by a weight.

RNN configuration is shown in
The RNN training process is identical to neural-network training with the Backpropagation algorithm, but with fewer cycles. The parameters are shared equally across every time step, so the gradient for each output depends strictly on the calculation from the current time step and the previous ones. The LSTM unit contains three types of gates; this research used LSTM gates in an attempt to overcome long-term dependence, a phenomenon that often occurs in sequential-data processing (Fig. 5).

Fig. 5. LSTM architecture
The key to LSTM is the cell state, marked in the architecture by the horizontal line that flows from Ct-1 to Ct. LSTM can delete or add information to the cell state through structures called gates. A gate is a method of passing information that consists of a binary sigmoid function (σ) and a pointwise multiplication operation (×). The binary sigmoid is an activation function that maps data to the interval between 0 and 1, as shown in (9). The first step in LSTM decides what will be discarded from the cell state; this forget gate uses (10).
Based on ht-1 and xt, the gate outputs a value in the 0-1 interval for every cell: a state of 1 means the information is kept, while 0 means the information is deleted. The second step decides which new information is stored in the cell, and it is split into two parts: first, the input gate determines the values to be updated using (11); then the candidate cell values are calculated using (12), and the cell state is updated using (13). The last step is the output gate, which determines the output to be produced. It is based on the cell state resulting from (13): the binary sigmoid is calculated as shown in (14) and then multiplied by the activation function of the updated cell state using (15).
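The gate sequence of Eqs. (9)-(15) can be sketched for a single time step as follows. This is a minimal NumPy illustration with hypothetical weight names (Wf, Wi, Wc, Wo and their biases), not the paper's trained network.

```python
import numpy as np

def sigmoid(z):
    """Binary sigmoid activation, Eq. (9)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM step following the gate order of Eqs. (10)-(15).
    p is a dict of weight matrices and biases (names are illustrative)."""
    z = np.concatenate([h_prev, x_t])        # concatenated [h_{t-1}, x_t]
    f = sigmoid(p['Wf'] @ z + p['bf'])       # forget gate, (10)
    i = sigmoid(p['Wi'] @ z + p['bi'])       # input gate, (11)
    C_cand = np.tanh(p['Wc'] @ z + p['bc'])  # candidate cell values, (12)
    C_t = f * C_prev + i * C_cand            # cell-state update, (13)
    o = sigmoid(p['Wo'] @ z + p['bo'])       # output gate, (14)
    h_t = o * np.tanh(C_t)                   # hidden output, (15)
    return h_t, C_t

# Toy dimensions: 4 inputs, 3 hidden units, small random weights
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
p = {k: rng.standard_normal((n_h, n_h + n_in)) * 0.1
     for k in ('Wf', 'Wi', 'Wc', 'Wo')}
p.update({k: np.zeros(n_h) for k in ('bf', 'bi', 'bc', 'bo')})
h, C = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), p)
```

Because the hidden output is a sigmoid gate multiplied by tanh of the cell state, every component of h stays strictly inside (-1, 1).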
The wavelet-extracted signal of each segment is 560 points per channel, making a total of 2,240 data points as RNN inputs (since we use a combination of four channels). The data points are treated as input neurons and interconnected inside the hidden layer, while we perform four steps of calculation. The first step is the first LSTM layer, where the ReLU function activates the input data. The second step is a dropout layer used to reduce the number of input neurons: with a probability of 0.5, the number of neurons passed on to the next step is 1,120. The third step is LSTM layer 2, taking the dropout layer's output as input and using (11)-(15). The fourth step is a dense layer using the sigmoid function, where the result of the previous step is entered into (9) to produce the new weights.
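The sizing above, 4 channels × 560 points flattened to 2,240 inputs with dropout at probability 0.5 passing roughly 1,120 activations onward, can be sketched as follows (random data, purely illustrative of the dimensions, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical segment: 4 channels x 560 wavelet points per 10-s window
segment = rng.standard_normal((4, 560))
x = segment.reshape(-1)                  # 2240 input values for the RNN

# Inverted dropout with p = 0.5: on average about half of the 2240
# activations (~1120) survive to the next layer, the rest are zeroed
keep = rng.random(x.size) < 0.5
dropped = np.where(keep, x / 0.5, 0.0)   # surviving units are rescaled
```

The rescaling by 1/0.5 keeps the expected activation magnitude unchanged between training (with dropout) and inference (without), which is the usual inverted-dropout convention.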

Data Acquisition
Data were recorded from 10 students aged 15-25 years in good health, all of whom affirmed their consent to participate as subjects in this study. This research used four channels of the Emotiv Insight EEG headset (AF3, AF4, T7, and T8) with a sampling frequency of 128 Hz.
Each subject was recorded for four minutes, but only the last two minutes were used, because that is when emotions are triggered by the final minutes of the music or video stimuli. Each recording was conditioned to bring up a specific emotional state so that the results provide valid data for the desired emotion. The psychological conditions are relaxed (relax), happy, and sad. The recording process generated 150 recordings (10 subjects x 3 states x 5 trials), which were segmented every 10 seconds. Hence, we obtained 1,800 data sets (150 recordings x 12 segments).

Results and Discussion
The model was tested offline with training data and with new (validation) data. From the 1,800 data sets, 80%, or 1,440, were used as training data, and 20%, or 360 sets, were used as validation data. The experimental outcomes are divided into the following sections: optimization of the training parameters, the use of wavelet extraction, the influence of channel configuration, the accuracy of each emotion state, and the wavelet frequency-filter configuration.
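The data-set arithmetic described in the acquisition and split above can be checked directly:

```python
# 10 subjects, 3 emotional states, 5 trials per state
subjects, states, trials = 10, 3, 5
recordings = subjects * states * trials       # 150 recordings

# 2 minutes kept per recording, segmented into 10-second windows
segments_per_rec = 120 // 10                  # 12 segments per recording
segments = recordings * segments_per_rec      # 1800 segments in total

train = int(segments * 0.8)                   # 80% -> 1440 training sets
val = segments - train                        # 20% -> 360 validation sets
```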

Optimization of Training Parameters
Before accuracy testing, the model needs to be optimized with respect to the learning rate used in the training step, which can influence the accuracy of the identification results. The influence of the learning rate can be seen in Table 1. The test results showed that a learning rate of 0.04 gives the highest accuracy: 100% on the training data and 70.54% (rounded to 71%) on the validation data. The validation graph for the learning rate of 0.04 is shown in Fig. 6.

Using Wavelet Extraction
Using the learning rate of 0.04, the model's accuracy was tested against the choice of extraction method. This study compared wavelet extraction, the Fast Fourier Transform, and no feature extraction. The results are shown in Table 2.
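A minimal sketch of the FFT baseline follows, assuming band edges matching the wavelet extraction (5-8, 9-14, 15-32 Hz); this is an illustration of band-power extraction via the FFT, not the authors' exact implementation:

```python
import numpy as np

def fft_band_power(x, fs=128):
    """Band power via the FFT over one window; band edges assumed to
    match the paper's theta/alpha/beta wavelet ranges."""
    spec = np.abs(np.fft.rfft(x)) ** 2           # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin frequencies in Hz
    bands = {'theta': (5, 8), 'alpha': (9, 14), 'beta': (15, 32)}
    return {name: float(spec[(freqs >= lo) & (freqs <= hi)].sum())
            for name, (lo, hi) in bands.items()}

# A 10 Hz tone sampled at 128 Hz should land almost entirely in alpha
t = np.arange(128) / 128.0
power = fft_band_power(np.sin(2 * np.pi * 10 * t))
```

Unlike the wavelet, this treats the window as stationary, which is one reason the paper finds the FFT baseline weaker on non-stationary EEG.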
Experiments were performed with all three inputs to the RNNs. When the original signal is used as input, it results in only 50% accuracy on the training data, and the validation data shows even less (around 40%). The accuracy increases when we use wavelet extraction: it becomes 100% on the training data and 70.54% on the validation data. We found that the previous study had lower accuracy than our findings.
Furthermore, that study only addressed two emotions and compared them on the same subject, while this research is evaluated on subjects that were never part of the training and uses three emotions (happy, sad, and relax). We summarize the results of the wavelet extraction in Table 2.

Influence Channel Configuration
The identification of emotional EEG signals may also be influenced by the channels from which the EEG signals are retrieved. Test results for each channel can be seen in Table 3. The results for each single channel (AF3, T7, T8, and AF4) are compared with the two symmetric channel pairs, AF3-AF4 and T7-T8, using the same epoch count and learning rate. The results showed that single channels achieved 65.43-100% accuracy on the training data and 52-74.79% on the validation data, while a pair of symmetric channels gave 78-100% accuracy on the training data and 63.41-74.73% on new data.

The accuracy of Each Emotion State
We analyze each emotional state to find the best accuracy per class. The validation accuracy of each class can be seen in Fig. 7; 254 of the 360 data sets provided were recognized. The identification results show that the Happy class has the highest accuracy at 88%, followed by the Relax class with 84%, and the Sad class with 43%.
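Per-class accuracy of this kind is the row-normalised diagonal of a confusion matrix. The counts below are made up to match the reported percentages for illustration; they are not the paper's actual confusion matrix:

```python
import numpy as np

def per_class_accuracy(cm):
    """Per-class accuracy from a confusion matrix
    (rows = true class, columns = predicted class)."""
    return np.diag(cm) / cm.sum(axis=1)

# Illustrative 3-class matrix (happy, relax, sad); off-diagonal counts
# are invented, rows normalised to 100 samples for readability
cm = np.array([[88,  7,  5],
               [10, 84,  6],
               [30, 27, 43]])
acc = per_class_accuracy(cm)   # -> happy 0.88, relax 0.84, sad 0.43
```

Reading the rows this way makes it easy to see which class (here, Sad) absorbs most of the misclassifications.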

Wavelet Frequency Filter Configuration
We ran the same experiments with the same settings using wavelet filters over the 5-32 Hz range, which turns the wavelet extraction into a frequency filter. This setting increased the accuracy to 79.3%, as shown in Table 5. The result is closely related to the RNN's ability to connect pieces of information sequentially: when the features are of a similar kind, the use of an RNN is more appropriate than with features of different types, given that there is a sequential structure over time.

Conclusion
This research provided experiments and results for an emotion-based BCI model using wavelets and Recurrent Neural Networks. The results showed an accuracy of 100% on the training data and 71% on the validation data. EEG signals are non-stationary; hence, preprocessing is necessary in this BCI research. The use of the wavelet has proven to increase the accuracy from 39.34% to 70.54% on the validation data, while using the same data with the FFT technique only increased it from 39.34% to 52.32%. The research showed that emotional variables can be associated with dominant frequency components, so the wavelet is very useful as an EEG pre-processing method and gave higher accuracy than the Fast Fourier Transform as a preprocessing stage. Further experiments were done on the use of the RNN in this research: since this technique is sensitive to time-sequential information, the use of frequency-filter features can increase the accuracy from 70.54% to 79.3%. In the channel configurations, the use of symmetric channel pairs provides higher accuracy than a single channel at a time. From these overall experiments, the identification model of three emotional conditions using wavelet extraction and RNNs can be used as a Brain-Computer Interface to drive external devices, involving symmetric channel pairs of EEG signals.