Alignment control using visual servoing and MobileNet single-shot multi-box detection (SSD): a review

(1) * Jayson Rogelio (Department of Science and Technology - MIRDC, Philippines)
(2) Elmer Dadios (De La Salle University, Philippines)
(3) Argel Bandala (De La Salle University, Philippines)
(4) Ryan Rhay Vicerra (De La Salle University, Philippines)
(5) Edwin Sybingco (De La Salle University, Philippines)
*corresponding author


Alignment control is highly critical for robotic technologies that rely on visual feedback. Without it, robot systems tend to be unresponsive because they depend on pre-programmed trajectories and paths, and therefore cannot react to a change in the environment or the absence of an object. This review paper provides a comprehensive survey of recent applications of visual servoing and deep neural networks (DNNs). Position-based visual servoing (PBVS) and MobileNet-SSD were the algorithms chosen for alignment control of the film handler mechanism of a portable x-ray system. The paper also discusses the theoretical framework: feature extraction and description, visual servoing, and MobileNet-SSD. Likewise, the latest applications of visual servoing and DNNs are summarized, including a comparison of MobileNet-SSD with other sophisticated models. The previous studies presented show that visual servoing and MobileNet-SSD provide reliable tools and models for manipulating robotic systems, including cases where occlusion is present. Furthermore, effective alignment control relies significantly on the reliability of the visual servoing and deep neural network components, which is shaped by parameters such as the type of visual servoing, the feature extraction and description method, and the DNN used to construct a robust state estimator. Therefore, visual servoing and MobileNet-SSD are parameterized concepts that require careful optimization to achieve a specific purpose with distinct tools.
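To make the PBVS idea concrete, below is a minimal sketch of the classical proportional PBVS control law, assuming the pose error between the current and desired camera frames has already been estimated by an upstream perception stage (e.g., a MobileNet-SSD detector feeding a pose estimator). The function name, gain `lam`, and the error representation are illustrative, not taken from the reviewed paper.

```python
import numpy as np

def pbvs_velocity_command(t_err, R_err, lam=0.5):
    """Proportional PBVS law: map a 3D translation error and a rotation-
    matrix error to a 6-DoF camera velocity twist [v, w]."""
    # Recover the axis-angle (theta * u) form of the rotation error.
    theta = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        w = np.zeros(3)  # no rotation error -> no angular velocity
    else:
        w = theta / (2.0 * np.sin(theta)) * np.array([
            R_err[2, 1] - R_err[1, 2],
            R_err[0, 2] - R_err[2, 0],
            R_err[1, 0] - R_err[0, 1],
        ])
    # Exponential decrease of both errors with a single gain lam.
    v = -lam * np.asarray(t_err, dtype=float)
    return np.concatenate([v, -lam * w])

# Example: 10 cm translation error along x, no rotation error.
cmd = pbvs_velocity_command([0.1, 0.0, 0.0], np.eye(3))
# -> [-0.05, 0, 0, 0, 0, 0]: drive back along x, no angular motion.
```

With this structure, the detector only has to supply the pose error; swapping MobileNet-SSD for another model leaves the control law unchanged, which is one reason PBVS pairs naturally with learned perception front ends.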








Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan

