Human action recognition using support vector machines and 3D convolutional neural networks

Majd Latah

doi:10.26555/ijain.v3i1.89


Majd Latah*

Ege University, Izmir, Turkey
*Corresponding author

Abstract

Deep learning has recently been widely used to improve recognition accuracy in a range of application areas. In this paper, deep convolutional neural networks (CNNs) and support vector machines (SVMs) are combined for the task of human action recognition. First, a 3D CNN extracts spatial and temporal features from adjacent video frames. An SVM then classifies each instance based on the extracted features. Both the number of CNN layers and the resolution of the input frames were reduced to fit within limited memory. The proposed architecture was trained and evaluated on the KTH action recognition dataset and achieved good performance.
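
The two-stage idea described above (3D convolution over adjacent frames, then an SVM on the extracted features) can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration and not the architecture of the paper: the toy clips, the single hand-set temporal-difference kernel (a real 3D CNN learns many kernels), the per-slice sum pooling, and the linear SVM trained by sub-gradient descent on the hinge loss. All function names are hypothetical.

```python
def make_clip(positions, size=4):
    """Toy clip[t][y][x] with one bright pixel per frame (illustrative data)."""
    clip = []
    for (py, px) in positions:
        frame = [[0.0] * size for _ in range(size)]
        frame[py][px] = 1.0
        clip.append(frame)
    return clip


def conv3d_valid(clip, kernel):
    """'Valid' 3D cross-correlation of clip[t][y][x] with kernel[dt][dy][dx]."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


# Assumed kernel of shape (2, 1, 1): responds to frame[t+1] - frame[t].
MOTION_KERNEL = [[[-1.0]], [[1.0]]]


def extract_features(clip, kernel=MOTION_KERNEL):
    """ReLU on the 3D feature map, then sum-pool each temporal slice."""
    fmap = conv3d_valid(clip, kernel)
    return [sum(max(0.0, v) for row in plane for v in row) for plane in fmap]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Linear SVM via sub-gradient descent on the regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin: hinge term contributes
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:           # only the regularizer contributes
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (moving) or -1 (static) from the sign of the decision score."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Two "moving" clips (pixel shifts each frame) and two "static" clips.
moving = [make_clip([(1, 0), (1, 1), (1, 2)]), make_clip([(2, 1), (2, 2), (2, 3)])]
static = [make_clip([(1, 1)] * 3), make_clip([(3, 0)] * 3)]
X = [extract_features(c) for c in moving + static]
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)

# Unseen clips: vertical motion and a pixel that never moves.
new_moving = make_clip([(0, 3), (1, 3), (2, 3)])
new_static = make_clip([(2, 2)] * 3)
print(predict(w, b, extract_features(new_moving)))
print(predict(w, b, extract_features(new_static)))
```

In this sketch the convolution kernel spans two adjacent frames, so each feature mixes spatial and temporal information, and the SVM only ever sees the pooled responses. Pooling over space is what lets the toy classifier handle motion at positions it was not trained on.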

Keywords

3D Convolutional Neural Network (CNN); Human Action Recognition; Support Vector Machines (SVM)

DOI

https://doi.org/10.26555/ijain.v3i1.89

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
