Imputation of missing microclimate data of coffee-pine agroforestry with machine learning

(1) * Heru Nurwarsito (University of Brawijaya, Indonesia)
(2) Didik Suprayogo (University of Brawijaya, Indonesia)
(3) Setyawan Purnomo Sakti (University of Brawijaya, Indonesia)
(4) Cahyo Prayogo (University of Brawijaya, Indonesia)
(5) Novanto Yudistira (University of Brawijaya, Indonesia)
(6) Muhammad Rifqi Fauzi (University of Brawijaya, Indonesia)
(7) Simon Oakley (Lancaster Environment Centre, United Kingdom)
(8) Wayan Firdaus Mahmudy (University of Brawijaya, Indonesia)
*corresponding author

Abstract


This research compares imputation methods for missing microclimate data collected on coffee-pine agroforestry land in UB Forest. Using big data and machine learning methods, it evaluates how well Interpolation, Shifted Interpolation, K-Nearest Neighbors (KNN), and Linear Regression impute missing microclimate data across four time frames: 6 hours, daily, weekly, and monthly. Performance is assessed with four evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results indicate that Linear Regression outperforms the other methods in every time frame, with the lowest MAE, MSE, RMSE, and MAPE. This finding underscores the robustness and precision of Linear Regression in handling the variability inherent in microclimate data within agroforestry systems. The research highlights the importance of accurate data imputation in agroforestry research and points to the potential of machine learning techniques in environmental data analysis. It offers a reliable methodological approach for improving the accuracy of microclimate models in agroforestry, supporting informed decision-making for sustainable ecosystem management.
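As an illustration of the evaluation described in the abstract, the following is a minimal sketch, not the authors' code. It assumes a short hypothetical temperature series, hides two known values, fills them by plain linear interpolation, and scores the fill with the four metrics. The function names `linear_interpolate` and `error_metrics` and all sample values are assumptions made for this example.

```python
import math


def linear_interpolate(series):
    """Fill None gaps with straight lines between the nearest known values.

    Gaps at the very start or end are left as None (a choice made for this sketch).
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for paired values.

    MAPE divides by the actual value, so actual values must be non-zero.
    """
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical air temperatures in degrees Celsius; two readings are hidden.
truth = [20.0, 21.0, 23.0, 24.0, 22.0]
observed = [20.0, None, None, 24.0, 22.0]

filled = linear_interpolate(observed)
hidden = [i for i, v in enumerate(observed) if v is None]
scores = error_metrics([truth[i] for i in hidden], [filled[i] for i in hidden])
print(scores)
```

Scoring only the artificially hidden positions keeps the comparison fair, since the remaining points are reproduced exactly by any of the methods. The same `error_metrics` call could be reused to compare other imputers, such as KNN or a regression model, on identical hidden positions.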

Keywords


Microclimate Data; Interpolation; Shifted Interpolation; K-Nearest Neighbors (KNN); Linear Regression

   

DOI

https://doi.org/10.26555/ijain.v10i1.1439
      




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
