Hand–object interaction recognition based on visual attention using multiscopic cyber-physical-social system

(1) * Adnan Rachmat Anom Besari (Politeknik Elektronika Negeri Surabaya (PENS), Indonesia, and Graduate School of Systems Design, Tokyo Metropolitan University, Japan)
(2) Azhar Aulia Saputra (Graduate School of Systems Design, Tokyo Metropolitan University, Japan)
(3) Wei Hong Chin (Graduate School of Systems Design, Tokyo Metropolitan University, Japan)
(4) Kurnianingsih Kurnianingsih (Department of Electrical Engineering, Politeknik Negeri Semarang, Indonesia)
(5) Naoyuki Kubota (Graduate School of Systems Design, Tokyo Metropolitan University, Japan)
*corresponding author


Computer vision-based cyber-physical-social systems (CPSS) are predicted to be the future of independent hand rehabilitation. However, this technology does not yet adequately account for the link between hand function and cognition in the elderly. To investigate this issue, this paper proposes a multiscopic CPSS framework that recognizes hand–object interaction (HOI) based on visual attention. First, at the microscopic level, we use egocentric vision to extract features from hand posture. A three-layer graph neural network (GNN) operating on hand skeletal features classifies 16 grasp postures with 94.87% testing accuracy. Second, at the mesoscopic level, we use an active perception ability to validate the HOI with eye tracking in the task-specific reach-to-grasp cycle. The distances between the fingertips and the center of an object are used as input to a multi-layer gated recurrent unit (GRU), a recurrent neural network architecture, which reaches 90.75% testing accuracy. Third, at the macroscopic level, we incorporate visual attention into the cognitive ability for classifying multiple objects. In two scenarios with four activities, we use a GNN with three convolutional layers to categorize the objects. The results demonstrate that the system can successfully separate objects according to their related activities. Further research and development are expected to support the application of CPSS in independent rehabilitation.
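
The mesoscopic step feeds fingertip-to-object distances to a recurrent network. As a rough illustration of what such an input could look like, the sketch below builds a per-frame feature vector of five distances. It is not the authors' implementation: the 21-landmark hand layout with fingertips at indices 4, 8, 12, 16, and 20 (the MediaPipe Hands ordering), the function names, and the toy data are all assumptions made for this example, since the abstract does not specify them.

```python
import math

# Assumption: 21-landmark hand skeleton in the MediaPipe Hands ordering,
# where indices 4, 8, 12, 16, 20 are the thumb, index, middle, ring,
# and pinky fingertips. The abstract does not state the layout used.
FINGERTIP_IDS = (4, 8, 12, 16, 20)


def fingertip_distances(landmarks, object_center):
    """Return the Euclidean distance from each fingertip to the object center."""
    return [math.dist(landmarks[i], object_center) for i in FINGERTIP_IDS]


def distance_sequence(frames, object_centers):
    """Build a (time steps x 5) feature sequence, one row per video frame."""
    if len(frames) != len(object_centers):
        raise ValueError("need one object center per frame")
    return [fingertip_distances(lm, c) for lm, c in zip(frames, object_centers)]


def toy_hand(offset):
    """Hypothetical hand: 21 points spaced 1 cm apart along the x axis."""
    return [(offset + 0.01 * i, 0.0, 0.0) for i in range(21)]


# Toy reach-to-grasp clip: the hand moves toward an object at the origin.
frames = [toy_hand(0.30), toy_hand(0.15), toy_hand(0.00)]
centers = [(0.0, 0.0, 0.0)] * len(frames)
sequence = distance_sequence(frames, centers)

for row in sequence:
    print([round(d, 2) for d in row])
```

In this toy clip every distance shrinks from frame to frame as the hand approaches the object, which is the kind of temporal pattern a recurrent model such as a GRU could learn over the reach-to-grasp cycle.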


Telemedicine; First-person vision; Hand-eye coordination; Independent rehabilitation; Occupational therapy








Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
