Imputation of missing microclimate data of coffee-pine agroforestry with machine learning

Heru Nurwarsito; Didik Suprayogo; Setyawan Purnomo Sakti; Cahyo Prayogo; Novanto Yudistira; Muhammad Rifqi Fauzi; Simon Oakley; Wayan Firdaus Mahmudy

doi:10.26555/ijain.v10i1.1439



⁽¹⁾* Heru Nurwarsito (University of Brawijaya, Indonesia)
⁽²⁾ Didik Suprayogo (University of Brawijaya, Indonesia)
⁽³⁾ Setyawan Purnomo Sakti (University of Brawijaya, Indonesia)
⁽⁴⁾ Cahyo Prayogo (University of Brawijaya, Indonesia)
⁽⁵⁾ Novanto Yudistira (University of Brawijaya, Indonesia)
⁽⁶⁾ Muhammad Rifqi Fauzi (University of Brawijaya, Indonesia)
⁽⁷⁾ Simon Oakley (Lancaster Environment Centre, United Kingdom)
⁽⁸⁾ Wayan Firdaus Mahmudy (University of Brawijaya, Indonesia)

* corresponding author

Abstract

This research compares imputation methods for missing microclimate data from coffee-pine agroforestry land in UB Forest. Using big data and machine learning methods, it evaluates how well Interpolation, Shifted Interpolation, K-Nearest Neighbors (KNN), and Linear Regression impute missing microclimate data across four time frames: 6 hours, daily, weekly, and monthly. Performance is assessed with four evaluation metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results indicate that Linear Regression outperforms the other methods in every time frame, with the lowest MAE, MSE, RMSE, and MAPE. This finding suggests that Linear Regression handles the variability of microclimate data in agroforestry systems robustly and precisely. The research highlights the importance of accurate data imputation in agroforestry research and points to the potential of machine learning techniques for environmental data analysis. It offers a reliable methodological approach for improving the accuracy of microclimate models in agroforestry, which in turn supports informed decision-making for sustainable ecosystem management.
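The evaluation described in the abstract (hide known values, impute them, score the result with MAE, MSE, RMSE, and MAPE) can be illustrated on a toy series. The following is a minimal sketch, not the authors' pipeline. It assumes that plain linear interpolation between neighbouring readings is a fair stand-in for the "Interpolation" method. The temperature values are invented for illustration, and the function names `linear_interpolate` and `error_metrics` are hypothetical. The metric formulas are the standard textbook definitions.

```python
import math


def linear_interpolate(series):
    """Fill None gaps with a straight line between the nearest known values.

    Gaps before the first or after the last known value are left as None.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, and MAPE (in percent) for paired values.

    MAPE divides by the actual value, so actual values must be non-zero.
    """
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mape = 100.0 * sum(abs(e / a) for a, e in zip(actual, errors)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Invented air-temperature readings in degrees Celsius (not study data).
truth = [20.0, 21.0, 23.0, 24.0, 22.0, 20.0]

# Hide some known values to simulate sensor gaps, then impute them.
hidden = [1, 2, 4]
observed = [None if i in hidden else v for i, v in enumerate(truth)]
imputed = linear_interpolate(observed)

# Score only the positions that were hidden.
scores = error_metrics(
    [truth[i] for i in hidden],
    [imputed[i] for i in hidden],
)
print(scores)
```

Scoring only the hidden positions keeps the untouched readings from diluting the error. Any other imputer could be swapped in for `linear_interpolate` and compared with the same four metrics.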

Keywords

Microclimate Data; Interpolation; Shifted Interpolation; K-Nearest Neighbors (KNN); Linear Regression

DOI

https://doi.org/10.26555/ijain.v10i1.1439



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
