Self-supervised few-shot learning for real-time traffic sign classification

(1) Anh-Khoa Tho Nguyen (Department of Computer Science, Vietnamese German University, Viet Nam)
(2) Tin Tran (AI Graduate School, Gwangju Institute of Science and Technology, Korea, Republic of)
(3) Phuc Hong Nguyen (Department of Software Engineering, Eastern International University, Viet Nam)
(4) * Vinh Quang Dinh (Department of Computer Science, Vietnamese German University, Viet Nam)
*corresponding author

Abstract


Although supervised approaches to traffic sign classification achieve excellent performance, they are limited to the traffic sign categories defined in their training dataset, which prevents them from being applied to different domains, e.g., different countries. Herein, we propose a self-supervised approach to few-shot traffic sign classification. A center-awareness similarity network is designed for the traffic sign problem and trained on an optical flow dataset. Unlike existing supervised traffic sign classification methods, the proposed method does not depend on the traffic sign categories defined by the training dataset and applies to traffic signs from any country. We construct a Korean traffic sign classification (KTSC) dataset comprising 6,000 traffic sign samples across 59 categories, and we evaluate the proposed method against baseline methods on the KTSC, German, and Belgian traffic sign classification datasets. Experimental results show that the proposed method extends the ability of existing supervised methods and can classify any traffic sign regardless of region or country. Furthermore, the proposed approach significantly outperforms baseline patch-similarity methods, providing a flexible and robust solution for classifying traffic signs across regional and national differences.
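To make the few-shot setup concrete, the sketch below illustrates (under stated assumptions, not as the authors' implementation) how a learned embedding/similarity network can classify a query sign by comparing it against a handful of labelled template patches, e.g. one per KTSC category, using cosine similarity. The EmbeddingNet backbone, its layer sizes, and the random inputs are hypothetical placeholders; the paper's center-awareness similarity network and its self-supervised optical-flow pretraining are not reproduced here.

# Minimal sketch of few-shot traffic sign classification by patch similarity.
# NOT the paper's implementation: EmbeddingNet is a hypothetical stand-in for
# the center-awareness similarity network described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Hypothetical patch-embedding backbone producing unit-norm descriptors."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, dim)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return F.normalize(self.fc(z), dim=1)  # unit-norm embeddings

@torch.no_grad()
def classify_few_shot(net, support_imgs, support_labels, query_imgs):
    """Assign each query sign to the support class with the highest cosine similarity.

    support_imgs:   (N, 3, H, W) few labelled templates (any country's sign set)
    support_labels: list of N class names
    query_imgs:     (M, 3, H, W) signs to classify
    """
    s = net(support_imgs)          # (N, D) template embeddings
    q = net(query_imgs)            # (M, D) query embeddings
    sims = q @ s.t()               # cosine similarities (embeddings are unit-norm)
    idx = sims.argmax(dim=1)       # nearest template per query
    return [support_labels[i] for i in idx]

if __name__ == "__main__":
    net = EmbeddingNet().eval()
    support = torch.randn(59, 3, 64, 64)          # e.g. one template per KTSC category
    labels = [f"class_{i}" for i in range(59)]
    queries = torch.randn(4, 3, 64, 64)
    print(classify_few_shot(net, support, labels, queries))

Because classification reduces to nearest-template lookup in embedding space, swapping in a different country's sign templates requires no retraining, which is the property the abstract emphasizes.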

Keywords


Traffic sign classification; One-shot learning; Few-shot learning; Self-supervised learning; CLIP-based approach

   

DOI

https://doi.org/10.26555/ijain.v10i1.1522
      





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
