Analyzing computer vision models for detecting customers: a practical experience in a Mexican retail

(1) Alvaro Fernández Del Carpio* (Department of Software Engineering, Universidad La Salle, Arequipa, Peru)
*corresponding author

Abstract


Computer vision has become an important technology for extracting meaningful data from visual content, providing information that improves security controls, marketing, and logistics strategies across industrial and business sectors. The retail sector constitutes an important part of the worldwide economy, and analyzing customer data and shopping behavior has become essential to deliver the right products to customers, maximize profits, and increase competitiveness. In-person shopping remains a predominant form of retail despite the rise of online outlets, and physical retailers are therefore adopting computer vision models to monitor store products and customers. This research paper presents the development of a computer vision solution by Lytica Company to detect customers in Steren’s physical retail stores in Mexico. Current computer vision models, namely SSD Mobilenet V2, YOLO-FastestV2, YOLOv5, and YOLOXn, were analyzed to find the most accurate system given the conditions and characteristics of the available devices. The challenges addressed during the analysis of videos included obstruction and proximity of customers, lighting conditions, the position and distance of the camera relative to customers entering the store, image quality, and scalability of the process. Models were evaluated with the F1-score metric: 0.64 with YOLO-FastestV2, 0.74 with SSD Mobilenet V2, 0.86 with YOLOv5n, 0.86 with YOLOv5xs, and 0.74 with YOLOXn. Although YOLOv5 achieved the best F1-score, YOLOXn presented the best balance between detection performance and frame rate (frames per second, FPS), considering the limited hardware and computing power available.
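As a reading aid for the F1-score figures above, the following minimal Python sketch shows how an F1-score is usually derived from detection counts and how the reported scores compare. The function name `f1_score` and the example counts are assumptions made for illustration; the abstract does not describe the authors' evaluation code or their true-positive, false-positive, and false-negative counts.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall)."""
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)  # share of detections that are real customers
    recall = tp / (tp + fn)     # share of real customers that were detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the paper).
example = f1_score(tp=80, fp=10, fn=20)

# F1-scores exactly as reported in the abstract.
reported_f1 = {
    "YOLO-FastestV2": 0.64,
    "SSD Mobilenet V2": 0.74,
    "YOLOv5n": 0.86,
    "YOLOv5xs": 0.86,
    "YOLOXn": 0.74,
}

# Models that share the highest reported F1-score.
best = max(reported_f1.values())
top_models = sorted(m for m, s in reported_f1.items() if s == best)
print(f"example F1 = {example:.3f}; top by F1: {top_models}")
```

Ranking by F1-score alone reflects only detection quality; the abstract's preference for YOLOXn also weighs frame rate, for which no figures are given here.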

Keywords


Computer Vision; Object Detection; Physical Retail; YOLO; SSD Mobilenet

   

DOI

https://doi.org/10.26555/ijain.v10i1.1112
      




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
