Human action recognition using support vector machines and 3D convolutional neural networks

(1) * Majd Latah (Ege University - Izmir - Turkey)
*corresponding author


Abstract


Deep learning has recently been widely used to improve recognition accuracy in a range of application areas. In this paper, deep convolutional neural networks (CNNs) and support vector machines (SVMs) are combined for the task of human action recognition. First, a 3D CNN extracts spatial and temporal features from adjacent video frames. Then, an SVM classifies each instance based on the extracted features. Both the number of CNN layers and the resolution of the input frames were reduced to fit within limited memory. The proposed architecture was trained and evaluated on the KTH action recognition dataset and achieved good performance.
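The two-stage idea in the abstract (3D convolution over adjacent frames for feature extraction, followed by an SVM for classification) can be illustrated with a small Python sketch. This is not the architecture from the paper. It assumes a single hand-set temporal-difference kernel in place of learned CNN filters, toy 4x4 videos in place of KTH clips, and a minimal linear SVM trained by subgradient descent on the hinge loss. All function names and parameter values are hypothetical and chosen only for illustration.

```python
def make_video(num_frames, size, path):
    """Build a (T, H, W) nested-list video with one bright pixel per frame."""
    video = [[[0.0] * size for _ in range(size)] for _ in range(num_frames)]
    for t, (y, x) in enumerate(path):
        video[t][y][x] = 1.0
    return video


def conv3d_valid(video, kernel):
    """'Valid' 3D cross-correlation over time, height and width."""
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    T, H, W = len(video), len(video[0]), len(video[0][0])
    return [[[sum(video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                  for dt in range(kt) for dy in range(kh) for dx in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for t in range(T - kt + 1)]


def extract_features(video, kernel):
    """ReLU followed by global max pooling, giving a one-element feature vector."""
    fmap = conv3d_valid(video, kernel)
    return [max(max(v, 0.0) for plane in fmap for row in plane for v in row)]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM fitted by subgradient descent on the regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin: hinge term contributes
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # outside the margin: only weight decay applies
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Assumed kernel of shape (2, 1, 1): next frame minus current frame.
KERNEL = [[[-1.0]], [[1.0]]]

# Toy training clips: label -1 = static pixel, +1 = pixel moving left to right.
static_train = make_video(4, 4, [(1, 1)] * 4)
moving_train = make_video(4, 4, [(1, 0), (1, 1), (1, 2), (1, 3)])
X_train = [extract_features(v, KERNEL) for v in (static_train, moving_train)]
w, b = train_linear_svm(X_train, [-1, 1])

# Held-out clips: a static pixel elsewhere and a pixel moving top to bottom.
static_test = make_video(4, 4, [(3, 3)] * 4)
moving_test = make_video(4, 4, [(0, 2), (1, 2), (2, 2), (3, 2)])
predictions = [predict(w, b, extract_features(v, KERNEL))
               for v in (static_test, moving_test)]
print(predictions)
```

Because the kernel spans two adjacent frames, its response depends on change over time as well as on position, which is what separates a 3D convolution from a per-frame 2D one. In a real system the kernels would be learned and stacked in several layers, and the SVM would typically come from an established library rather than a hand-written loop.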

Keywords


3D Convolutional Neural Network (CNN); Human Action Recognition; Support Vector Machines (SVM)

   

DOI

https://doi.org/10.26555/ijain.v3i1.89
   






Copyright (c) 2017 International Journal of Advances in Intelligent Informatics

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by Informatics Department - Universitas Ahmad Dahlan , and UTM Big Data Centre - Universiti Teknologi Malaysia
Published by Universitas Ahmad Dahlan
