Deep learning approaches for MIMO time-series analysis

ABSTRACT

This study compares the performance of deep learning (DL) models for Multi-Input Multi-Output (MIMO) time-series forecasting of stock prices. The best-performing methods are selected using two performance metrics: Mean Absolute Percentage Error (MAPE), which assesses accuracy, and Root Mean Square Error (RMSE), which helps detect outliers in the system. Results show that the LSTM method achieves the best performance, outperforming the other methods with an average MAPE of 8.73%, while Bi-LSTM achieves the best average RMSE of 0.02216. These findings have practical implications for time-series forecasting in stock trading: the superior performance of LSTM highlights its potential as a reliable method for accurately predicting stock prices, and the Bi-LSTM model's ability to detect outliers can aid in identifying abnormal stock market behavior. The results contribute to the field of time-series forecasting and offer guidance for decision-making in stock trading by identifying the most effective methods for predicting stock prices accurately and detecting unusual market behavior.

Introduction
Time series analysis involves the examination and prediction of data collected sequentially over time. This field is crucial in various domains, including finance [1], economics [2], meteorology [3], and sales forecasting [4]. However, time series analysis poses several challenges that must be addressed to ensure accurate and reliable predictions. These challenges can be categorized into two main areas: single-output forecasting and multi-output forecasting.
Multi-Input and Multi-Output (MIMO) time-series forecasting is useful for real-world multivariate data applications. Financial markets use past prices [8], trade volumes [13], market sentiment [14], and macroeconomic data [15] to anticipate stock prices. MIMO time-series forecasting also has other uses. Temperature [16], humidity [17], wind speed [18], and precipitation [19] interact in weather forecasting. MIMO forecasting predicts energy consumption [20], resource availability [21], and production outputs [22] in resource management and optimization. Due to its complexity and real-world applications, the problem of MIMO time-series forecasting has gained attention in recent years [23][24][25][26]. The difficulty lies in modeling the dependencies between the input and output variables, especially when they have different time resolutions and levels of noise. Traditional statistical and machine learning (ML) approaches have limitations in capturing the dynamics and interactions between variables [27][28]. Therefore, developing a robust and efficient solution for MIMO time-series forecasting is crucial for building robust time-series models and obtaining accurate predictions, which have significant implications for decision-making and planning in various fields.
MIMO time-series forecasting techniques struggle to represent long-term dependencies [29], such as trends, cycles, and seasonality. Various solutions have been proposed to address this problem, including autoregressive integrated moving average (ARIMA) models [30], multilayer perceptrons (MLPs) [31], and dynamic Bayesian networks (DBNs) [32][33]. These methods have shown promising results in capturing the nonlinear dependencies between variables and handling missing values and noisy data. However, they still have limitations in modeling long-term dependencies and dealing with high-dimensional data [34]. Therefore, there is a need for more advanced and flexible models that can overcome these limitations and provide accurate and interpretable predictions. Deep learning (DL) models, such as Convolutional Neural Networks (CNNs) [35][36][37], Long Short-Term Memory networks (LSTMs) [38][39][40], and Gated Recurrent Units (GRUs) [41], have been proposed as a promising solution for MIMO time-series forecasting. This problem remains an active area of research, and further investigations are needed to develop more efficient and interpretable models. This paper aims to reveal the performance of five different deep learning approaches: CNN, RNN, LSTM, GRU, and Bidirectional LSTM (Bi-LSTM). Recurrent Neural Networks (RNNs), LSTMs, and GRUs can model long-term dependencies [42]. Recurrent connections and memory cells allow these models to store and learn from prior observations and dependencies, so adding memory mechanisms helps them better anticipate long-term trends. MIMO time-series forecasting uses high-dimensional data to predict multiple outputs from multiple inputs [43]. Computational, modeling, and dimensionality issues arise with high-dimensional data. CNNs can handle high-dimensional data [44].
CNNs find data patterns by extracting local and global characteristics from input sequences using convolutional operations, making them suitable for time-series forecasting. These models can capture the complex temporal patterns and interactions between variables and provide accurate and robust predictions. Moreover, they can handle missing values, noisy data, and high-dimensional input and output variables [45][46]. Developing a DL model for MIMO time-series forecasting requires careful consideration of the model architecture, data preprocessing, and hyperparameter tuning. An MLP, as a deep learning baseline, is used to compare the performance of the developed models for stock prediction. This study uses two attributes as target data: the open attribute and the close attribute. The open attribute is the stock price at the beginning of the trading session, and the close attribute is the price at the end of the period. The open and close prices are beneficial for analyzing trends in stock prices [47], as both attributes shape the patterns generated for stock predictions. The difference between the open and close prices can provide insights into the intraday price movement, such as whether the stock experienced a positive or negative trend during the trading session [48]. These trends often indicate investor sentiment, market momentum, and the trading strategies market participants employ.

Fig. 1. Visualization of Dataset
Incorporating open and close prices as target variables enables the models to learn and capture these patterns, enhancing forecasting accuracy. Selecting open and close prices as the targets of the forecasting task allows the models to leverage the intraday dynamics and patterns exhibited by these attributes. By incorporating this information into the training process, the models learn to capture the trends and fluctuations in stock prices, leading to more accurate and reliable predictions. The comparison of the values of the close and open attributes can be seen in Fig. 2.

Data Preprocessing
Min-max normalization is a common data preprocessing technique used in time-series analysis to scale data values to a specific range, usually between 0 and 1. This is achieved by subtracting the minimum value from each data point and dividing by the range between the minimum and maximum values. The resulting values are then scaled to the desired range. Min-max normalization is well-suited for time-series analysis as it addresses the specific scaling and range normalization requirements of time-dependent data [49]. It preserves temporal relationships, handles seasonal and trend components, mitigates the influence of outliers, provides an interpretable scale, and is compatible with various modeling techniques.
The first step in applying min-max normalization to time-series data is to identify the dataset's minimum and maximum values, which can be done by iterating through each time step and keeping track of the running minimum and maximum [50]. Once these values have been identified, the data can be normalized using equation (1):

x_norm = (x - x_min) / (x_max - x_min)    (1)

where x is the original data point, x_min is the minimum value in the dataset, x_max is the maximum value in the dataset, and x_norm is the normalized data point.
It is important to note that min-max normalization should be applied separately to the training and test datasets. The minimum and maximum values used for normalization should be computed from the training dataset only and then applied to the test dataset using the same formula. This ensures that the test dataset does not inform the normalization process and prevents data leakage.
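The fit-on-train, apply-to-test scheme described above can be sketched as follows, assuming NumPy arrays; the series values and helper names are illustrative, not from the paper:

```python
import numpy as np

def fit_min_max(train):
    """Learn the scaling parameters from the training split only."""
    return train.min(), train.max()

def transform(x, x_min, x_max):
    """Scale values using equation (1); training values land in [0, 1]."""
    return (x - x_min) / (x_max - x_min)

# Toy series split into train/test portions (illustrative values).
train = np.array([10.0, 12.0, 11.0, 15.0, 14.0])
test = np.array([13.0, 16.0])

x_min, x_max = fit_min_max(train)          # fitted on training data only
train_scaled = transform(train, x_min, x_max)
test_scaled = transform(test, x_min, x_max)  # test values may fall outside [0, 1]
```

Note that test values larger than the training maximum scale to values above 1; this is expected and preferable to refitting on the test set, which would leak information.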

MIMO Forecasting
The forecasting model framework in this study can be seen in Fig. 3, which shows that the number of inputs is seven and the number of outputs is two. The forecasting process uses the methods listed in Table 2. The first method is the Convolutional Neural Network (CNN). CNNs are artificial neural networks that can be used for time-series forecasting. While they were initially developed for image recognition tasks, they have also been applied to sequential data, including time-series data [51]. The main idea behind CNNs is to use filters that convolve over the input data to extract relevant features. These filters are typically small and move across the input data, computing dot products at each location. The outputs of the dot products are then passed through a nonlinear activation function, such as ReLU, and pooled to reduce the dimensionality of the data. This process is repeated in multiple layers, with each layer capturing more complex features of the input data. In the context of time-series forecasting, CNNs can be used to extract temporal features from sequential data [52]. For example, the filters can convolve over the input time-series data to capture patterns such as trends, cycles, and spikes.

Fig. 3. MIMO Forecasting Scheme
The resulting feature maps can then be fed into fully connected layers to produce a forecast. One advantage of CNNs for time-series forecasting is their ability to capture local and global dependencies in the data. Additionally, CNNs can handle varying input lengths, making them useful for forecasting time-series data with irregular time intervals. The equations of the CNN can be seen in equations (2) to (5).
where X(t) represents the input at time step t, H(t) is the hidden state at time step t, F(t) is the flattened representation, D(t) is the dense layer, and Y(t) is the output. In the case of MIMO, the outputs are Y(t)1 and Y(t)2, which depend on each other as in (6) and (7).
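Equations (2) to (7) did not survive extraction; a plausible reconstruction, assuming a standard 1D-CNN forward pass with a shared dense representation feeding both output heads (the weight and bias symbols are assumptions, not taken from the paper):

```latex
% Hypothetical reconstruction of equations (2)-(7); W_* and b_* are assumed symbols.
\begin{align}
H(t)   &= f\big(W_c * X(t) + b_c\big)    && (2)\ \text{convolution over the input window}\\
F(t)   &= \mathrm{flatten}\big(H(t)\big) && (3)\ \text{flattening of the feature maps}\\
D(t)   &= g\big(W_d\,F(t) + b_d\big)     && (4)\ \text{dense layer}\\
Y(t)   &= W_y\,D(t) + b_y                && (5)\ \text{output}\\
Y(t)_1 &= W_1\,D(t) + b_1                && (6)\\
Y(t)_2 &= W_2\,D(t) + b_2                && (7)
\end{align}
```

Under this reading, the mutual dependence of Y(t)1 and Y(t)2 arises through the shared representation D(t).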
The second method is the Recurrent Neural Network (RNN). Unlike traditional feedforward neural networks, which have no memory and process each input independently, RNNs have a memory component that allows them to process sequential data [53]. The key idea behind RNNs is the use of recurrent connections between nodes, which allow information to persist over time. In this way, the network can capture temporal dependencies in the data. RNNs use a hidden state that is updated at each time step, and the output at each time step is a function of the current input and the hidden state. The hidden state is passed from one time step to the next, allowing the network to learn a representation of the entire sequence. In time-series forecasting, RNNs can predict the next value in a time series based on the previous values. The network is trained using a supervised learning algorithm, such as backpropagation through time, where the loss between the predicted and actual values of the target variable is minimized [54]. One advantage of RNNs for time-series forecasting is their ability to capture long-term dependencies in the data, making them useful for predicting trends and cycles. The equations of the RNN can be seen in equations (8) and (9).
The output in the case of MIMO is Y(t)1 and Y(t)2, which depend on one another as in (10) and (11).
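Equations (8) to (11) are likewise missing from the text; a standard RNN formulation consistent with the description above would be (symbols other than X(t), H(t), and Y(t) are assumptions):

```latex
% Hypothetical reconstruction of equations (8)-(11).
\begin{align}
H(t)   &= f\big(W_h\,H(t-1) + W_x\,X(t) + b_h\big) && (8)\ \text{hidden-state update}\\
Y(t)   &= g\big(W_y\,H(t) + b_y\big)               && (9)\ \text{output}\\
Y(t)_1 &= g\big(W_1\,H(t) + b_1\big)               && (10)\\
Y(t)_2 &= g\big(W_2\,H(t) + b_2\big)               && (11)
\end{align}
```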
Third, Long Short-Term Memory (LSTM) is a type of RNN commonly used for time-series forecasting. LSTMs were designed to address the limitations of traditional RNNs, such as difficulty in capturing long-term dependencies and vanishing gradients [55]. The key idea behind LSTMs is using a memory cell that can remember information for long periods. Three gates control the memory cell: the input gate, the forget gate, and the output gate. These gates allow the network to update and forget information from the memory cell selectively. In the context of time-series forecasting, LSTMs can predict the next value in a time series based on the previous values [56]. The network is trained using a supervised learning algorithm, such as backpropagation through time, where the loss is minimized between the predicted and actual values of the target variable. One advantage of LSTMs for time-series forecasting is their ability to capture long-term dependencies in the data, making them useful for predicting trends and cycles [57]. Additionally, LSTMs can handle variable-length sequences, making them helpful in forecasting time-series data with irregular time intervals. The equation of LSTM can be seen in equations (12) to (17).
In the MIMO situation, the outputs are the dependent variables Y(t)1 and Y(t)2 as in (18) and (19).
In these equations, C(t), F(t), I(t), and O(t) represent the cell state, forget gate, input gate, and output gate of the LSTM, respectively. The variables W and b denote the learnable weights and biases of the model. The activation functions f and g are the non-linear activation functions applied to the hidden state and output, respectively; σ is the sigmoid activation function, and tanh is the hyperbolic tangent activation function.
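Using the symbols defined above, the standard LSTM equations that (12) to (19) most plausibly correspond to are as follows (a reconstruction; the exact parameterization used in the paper may differ):

```latex
% Hypothetical reconstruction of equations (12)-(19) using the symbols defined in the text.
\begin{align}
F(t)         &= \sigma\big(W_f\,[H(t-1), X(t)] + b_f\big)   && (12)\ \text{forget gate}\\
I(t)         &= \sigma\big(W_i\,[H(t-1), X(t)] + b_i\big)   && (13)\ \text{input gate}\\
\tilde{C}(t) &= \tanh\big(W_c\,[H(t-1), X(t)] + b_c\big)    && (14)\ \text{candidate cell state}\\
C(t)         &= F(t) \odot C(t-1) + I(t) \odot \tilde{C}(t) && (15)\ \text{cell-state update}\\
O(t)         &= \sigma\big(W_o\,[H(t-1), X(t)] + b_o\big)   && (16)\ \text{output gate}\\
H(t)         &= O(t) \odot \tanh\big(C(t)\big)              && (17)\ \text{hidden state}\\
Y(t)_1       &= g\big(W_1\,H(t) + b_1\big)                  && (18)\\
Y(t)_2       &= g\big(W_2\,H(t) + b_2\big)                  && (19)
\end{align}
```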
The fourth method is the Bidirectional Long Short-Term Memory (Bi-LSTM), an extension of the LSTM architecture for time-series forecasting. As the name suggests, Bi-LSTMs process the input sequence in both the forward and backward directions [58]. In a standard LSTM, the output at each time step is a function of the current input and the hidden state from the previous time step. In contrast, in a Bi-LSTM, the output at each time step is a function of the current input and the hidden states from both the forward and backward directions. This allows the network to capture dependencies not only from the past but also from the future. The key advantage of Bi-LSTMs is their ability to capture both past and future dependencies in the data, which can be especially useful for time-series forecasting tasks where future information may be useful for predicting the next value in the sequence [59]. In the context of time-series forecasting, Bi-LSTMs can be used to predict the next value in a time series based on the previous values in both the forward and backward directions. The network is trained using a supervised learning algorithm, such as backpropagation through time, where the loss between the predicted and actual values of the target variable is minimized. The equations of the Bi-LSTM can be seen in (20) to (23).
In the case of MIMO, the results are the dependent variables Y(t)1 and Y(t)2 as in (24) and (25), respectively.
where Hf(t) is the forward LSTM hidden state, Hb(t) is the backward LSTM hidden state, and H(t) is the concatenation of the forward and backward hidden states.
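A plausible reconstruction of equations (20) to (25), assuming the usual bidirectional formulation (the subscripted symbols are assumptions chosen to match the forward/backward description above):

```latex
% Hypothetical reconstruction of equations (20)-(25).
\begin{align}
H_f(t) &= \mathrm{LSTM}_f\big(X(t),\,H_f(t-1)\big) && (20)\ \text{forward pass}\\
H_b(t) &= \mathrm{LSTM}_b\big(X(t),\,H_b(t+1)\big) && (21)\ \text{backward pass}\\
H(t)   &= \big[H_f(t);\,H_b(t)\big]                && (22)\ \text{concatenation}\\
Y(t)   &= g\big(W_y\,H(t) + b_y\big)               && (23)\ \text{output}\\
Y(t)_1 &= g\big(W_1\,H(t) + b_1\big)               && (24)\\
Y(t)_2 &= g\big(W_2\,H(t) + b_2\big)               && (25)
\end{align}
```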
The last method is the Gated Recurrent Unit (GRU). GRUs were designed to address the limitations of traditional RNNs, such as difficulty in capturing long-term dependencies and vanishing gradients [60]. GRUs are similar to LSTMs in that they use a memory cell to remember information for long periods of time. However, unlike LSTMs, GRUs use only two gates: the reset gate and the update gate. The reset gate determines how much of the previous memory to forget, while the update gate determines how much of the current input to remember. GRUs can be used to predict the next value in a time series based on the previous values. The network is trained using a supervised learning algorithm, such as backpropagation through time, where the loss between the predicted and actual values of the target variable is minimized. One advantage of GRUs for time-series forecasting is their ability to capture long-term dependencies in the data while requiring fewer parameters than LSTMs [61]. Additionally, GRUs can handle variable-length sequences, making them useful for forecasting time-series data with irregular time intervals. In the GRU equations, R(t) is the reset gate and Z(t) is the update gate.
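The GRU equations themselves do not appear in the text; a standard formulation consistent with the gate definitions above would be (equation numbering is assumed to continue from (25); weight symbols are assumptions):

```latex
% Hypothetical reconstruction of the GRU equations, assumed to be (26)-(31).
\begin{align}
R(t)         &= \sigma\big(W_r\,[H(t-1), X(t)] + b_r\big)             && (26)\ \text{reset gate}\\
Z(t)         &= \sigma\big(W_z\,[H(t-1), X(t)] + b_z\big)             && (27)\ \text{update gate}\\
\tilde{H}(t) &= \tanh\big(W_h\,[R(t) \odot H(t-1), X(t)] + b_h\big)   && (28)\ \text{candidate state}\\
H(t)         &= \big(1 - Z(t)\big) \odot H(t-1) + Z(t) \odot \tilde{H}(t) && (29)\ \text{hidden state}\\
Y(t)_1       &= g\big(W_1\,H(t) + b_1\big)                            && (30)\\
Y(t)_2       &= g\big(W_2\,H(t) + b_2\big)                            && (31)
\end{align}
```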
These equations capture the MIMO deep learning models' forward pass for time-series forecasting. The models are trained by optimizing the weights and biases to minimize the forecasting error through backpropagation and gradient descent techniques.
According to the background above, this paper establishes a method to create an effective deep learning-based approach for fundamental trading on the Bitcoin price. Six forecasting methods, namely MLP, CNN, RNN, LSTM, Bi-LSTM, and GRU, were selected and analyzed to determine the best method based on accuracy in forecasting the future price. Hyperparameter tuning is used to determine the parameter settings of the various methods [62]; the tuning method used is random search. The parameter settings for all methods can be seen in Table 2. The hyperparameter choices in Table 2 were made after thoroughly assessing their importance and projected influence on model performance; a complete explanation of these parameters follows [63]. The number of hidden layers determines the depth and complexity of the neural network: with more hidden layers the model can capture more intricate interactions, although the risk of overfitting may grow. The neuron count per hidden layer governs the model's ability to learn and represent complicated patterns; more neurons can capture more complex data correlations, although again overfitting may rise. The activation function gives the neural network the non-linearity needed to describe complicated input-output interactions; activation functions differ in saturation behavior, smoothness, and gradient propagation. The batch size determines the number of samples processed before the model's weights are updated during training: a smaller batch size updates the weights more frequently but may be noisy, whereas a bigger batch size may require more memory but updates more smoothly. The number of training epochs determines how many times the model processes the dataset; adding epochs lets the model learn more from the data but risks overfitting if not regularised. Each of these hyperparameters plays a crucial role in determining the model's performance and behavior.
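The random-search procedure can be sketched as below; the search space, trial count, and the stand-in evaluation function are illustrative assumptions, since the actual Table 2 values are not reproduced here:

```python
import random

# Hypothetical search space; the actual ranges in Table 2 may differ.
search_space = {
    "hidden_layers": [1, 2, 3],
    "neurons": [32, 64, 128],
    "activation": ["relu", "tanh"],
    "batch_size": [16, 32, 64],
    "epochs": [50, 100, 200],
}

def sample_config(space, rng):
    """Draw one random combination from the search space."""
    return {name: rng.choice(options) for name, options in space.items()}

def random_search(space, evaluate, n_trials=20, seed=0):
    """Return the sampled configuration with the lowest validation error."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        error = evaluate(config)  # e.g. validation MAPE of a trained model
        if error < best_error:
            best_config, best_error = config, error
    return best_config, best_error

# Stand-in for training and validating a model; a real evaluate() would
# fit the network with `config` and return its validation MAPE.
def dummy_evaluate(config):
    return abs(config["neurons"] - 64) + config["hidden_layers"]

best, err = random_search(search_space, dummy_evaluate)
```

In practice `evaluate` would build and train one of the six networks with the sampled configuration and score it on a held-out validation split.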

Evaluation
The evaluation process uses the forecasting methods above to determine the performance of the Bitcoin price prediction. Performance is evaluated using Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE). MAPE is used to show errors that represent accuracy; its value ranges fall into the interpretation categories shown in Table 3 [64]. RMSE is used to detect outliers in the designed system. The equations of MAPE and RMSE can be seen in (32) and (33).
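Equations (32) and (33) are not reproduced in the text; the standard definitions of MAPE and RMSE, with which the reported percentages and error magnitudes are consistent, can be sketched as follows (the example values are illustrative):

```python
import math

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent:
    MAPE = (100/n) * sum(|(a_i - p_i) / a_i|)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root Mean Square Error:
    RMSE = sqrt((1/n) * sum((a_i - p_i)^2))."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Illustrative actual vs. predicted prices (not values from the paper).
actual = [100.0, 110.0, 120.0]
predicted = [102.0, 108.0, 123.0]

accuracy_error = mape(actual, predicted)   # percentage error
outlier_error = rmse(actual, predicted)    # penalizes large deviations more
```

Because RMSE squares each residual, a single large deviation dominates the score, which is why the paper uses it to flag outlier-prone predictions.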

Results and Discussion
The performance comparison of different methods for MIMO time-series forecasting can be seen in Table 4. The MLP method achieved a relatively low MAPE and RMSE for both the open and close prices. The CNN method achieved a MAPE of 10.61802% and an RMSE of 0.09547 for the Open, and a MAPE of 10.75626% and an RMSE of 0.06094 for the Close, indicating that the CNN method performed better than the ARIMA model but was outperformed by the MLP, RNN, LSTM, and GRU methods. CNN may not perform well on time-series data with long-term dependencies, as the filters may not be able to capture the full range of temporal patterns in the data. CNN also requires more training data to produce good MAPE values; in this study, the 1475 data points processed using CNN could still not produce optimal values. The performance of the CNN method suggests that it may be useful for some time-series forecasting tasks but may not be optimal for all applications.
Based on Table 4, the RNN method achieved a MAPE of 9.14382% and an RMSE of 0.04004 for the open price prediction, and a MAPE of 8.68090% and an RMSE of 0.04764 for the close price prediction. This indicates that the RNN method performed relatively well compared to the other methods in terms of MAPE and RMSE, especially for the open price prediction. Lower MAPE and RMSE values indicate that the RNN model was able to make more accurate predictions of the stock prices based on the previous values in the time series. It is important to note that the performance of the RNN method may vary depending on the specific data and problem being addressed. RNNs may also suffer from vanishing gradients, where the gradients become very small and the network cannot learn long-term dependencies.
The LSTM model achieved a MAPE of 8.79643% for the Open and 8.67726% for the Close. The corresponding RMSE values were 0.05272 for the Open and 0.05097 for the Close. LSTM shows the lowest MAPE for both the opening and closing prices; since a lower MAPE indicates better performance, this demonstrates its superior performance in predicting stock prices. Overall, the LSTM model demonstrated strong performance in forecasting stock prices based on historical data. Its ability to capture long-term dependencies in the time-series data and remember information for long periods likely contributed to its superior performance. Table 4 shows that the Bi-LSTM method achieved a MAPE of 9.62744% and an RMSE of 0.02013 for the Open, and a MAPE of 9.63486% and an RMSE of 0.02419 for the Close. These results suggest that Bi-LSTM performed well compared to the ARIMA method, which had the highest MAPE and RMSE values among all the methods, although its MAPE was slightly higher than that of the RNN and LSTM methods. The Bi-LSTM shows the lowest RMSE for both the open and close prices, indicating its ability to detect outliers in the predictions. Bi-LSTM can capture a sequence's past and future context at each time step. Overall, the Bi-LSTM method showed promise for time-series forecasting tasks, but its performance may vary depending on the specific data.
The GRU model achieved a MAPE of 9.20339% for open prices and 8.73113% for close prices. Additionally, the model achieved an RMSE of 0.02520 for open prices and 0.03806 for close prices. The GRU model showed good performance in predicting both open and close prices, with a MAPE lower than ARIMA and Bi-LSTM for open prices and lower than ARIMA for close prices. The GRU model is known for capturing long-term dependencies in time-series data while requiring fewer parameters than LSTMs, and it handles the vanishing gradient problem better than traditional RNNs. The results in the table suggest that the GRU model is effective for predicting stock prices and could be a useful tool for traders and investors. MIMO and ensembling are two different approaches in time-series forecasting. In MIMO, multiple time series are used as inputs to the model to predict the values of all the output series simultaneously; this can be useful in situations where multiple related variables influence each other. In this research, the overall attributes could be used to predict the open and close prices. An ensemble, on the other hand, uses multiple models to make predictions and then combines their results into a final prediction; this can be useful when it is uncertain which model is best, or where different models have different strengths and weaknesses. For example, an ensemble could include a combination of MLP, LSTM, and CNN models, with the final prediction being a weighted average of the predictions made by each model [31][45][65].
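The weighted-average ensemble described above can be sketched as follows; the model outputs and weights are illustrative assumptions, not results from the paper:

```python
# Hypothetical per-step predictions from three models for the same horizon.
mlp_pred = [101.0, 103.0]
lstm_pred = [100.0, 102.0]
cnn_pred = [104.0, 106.0]

# Illustrative weights, e.g. derived from each model's validation accuracy.
weights = [0.2, 0.5, 0.3]

def weighted_ensemble(predictions, weights):
    """Combine model outputs as a weighted average at each time step."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should sum to 1"
    return [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(len(predictions[0]))
    ]

combined = weighted_ensemble([mlp_pred, lstm_pred, cnn_pred], weights)
```

Giving the strongest validation performer (here LSTM) the largest weight lets the ensemble lean on its accuracy while the other models temper its individual errors.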
Overall, the results provide insight into the performance of different DL methods for time-series forecasting and highlight the trade-offs between accuracy and complexity of the methods. The MAPE values of all methods at open and close fall into the "very good" forecasting-ability category (<10%). Depending on the specific problem and the characteristics of the methods, one may choose an appropriate method that balances the trade-offs and meets the desired performance criteria.
The results obtained in this study have significant practical implications, particularly in the context of stock trading and investment. Accurate stock price predictions can greatly benefit investors, traders, and financial institutions by providing valuable insights for decision-making. By leveraging MIMO time-series forecasting models, market participants can gain a competitive edge by identifying potential price movements, trends, and patterns in stock markets. This can help optimize trading strategies, improve risk management, and enhance portfolio performance. Moreover, the ability to accurately forecast stock prices can aid in identifying opportunities for arbitrage, hedging, and market timing, leading to increased profitability and reduced financial risk. Additionally, the performance of the different methods in this study may reveal specific patterns or trends. For example, certain deep learning models with specific architectures or hyperparameters might better capture complex market dynamics, long-term dependencies, or nonlinear relationships. Identifying such patterns can guide practitioners in selecting appropriate models and techniques for stock price prediction tasks.
In the context of real-time or large-scale applications, it is essential to evaluate the efficiency of the models in processing and forecasting stock price data. This includes assessing their ability to handle high-dimensional data, adapt to evolving market conditions, and provide timely predictions. Discussing the computational aspects of the models can help stakeholders assess the trade-offs between accuracy and computational efficiency, enabling them to make informed decisions when selecting models for real-world deployment. Future research should evaluate the performance of MIMO time-series forecasting models on multiple datasets from different stocks, various market conditions, and diverse periods. This will help establish the robustness and reliability of the proposed methods and provide a more comprehensive understanding of their performance across different contexts. Ultimately, choosing a suitable method for time-series forecasting tasks depends on the specific problem, the desired performance criteria, and the trade-offs between accuracy and complexity. This study provides valuable insights into the performance of different deep learning methods for time-series forecasting and can help guide the selection of appropriate methods for various applications.

Conclusion
In conclusion, this study compared the performance of various deep learning methods, including MLP, CNN, RNN, LSTM, Bi-LSTM, and GRU, in predicting open and close prices for time-series data. Overall, the LSTM method demonstrated the best performance in terms of MAPE, indicating its superior ability to predict stock prices. The Bi-LSTM method achieved the lowest RMSE values, highlighting its effectiveness in detecting prediction outliers. The GRU model also performed well and is known for handling the vanishing gradient problem better than traditional RNNs. While the CNN method performed better than the ARIMA model, it was outperformed by the other deep learning methods, suggesting that it may not be optimal for all time-series forecasting tasks. The MIMO and ensemble approaches provide alternative ways to improve forecasting performance by leveraging multiple time series or combining the strengths of different models. The study focused on evaluating deep learning models for MIMO time-series forecasting, but a comprehensive benchmarking and comparative analysis against conventional approaches was limited. Future research should evaluate a more extensive range of conventional forecasting methodologies to better understand their strengths and drawbacks, and further experimentation is recommended to confirm these findings and test the models on different datasets and periods.

Declarations
Conflict of interest. The authors declare no conflict of interest.
Additional information. No additional information is available for this paper.