Who danced better? Ranked TikTok dance video dataset and pairwise action quality assessment method

Irwandi Hipiny; Hamimah Ujir; Aidil Azli Alias; Musdi Shanat; Mohamad Khairi Ishak

doi:10.26555/ijain.v9i1.919



⁽¹⁾* Irwandi Hipiny

(Universiti Malaysia Sarawak, Malaysia)
⁽²⁾ Hamimah Ujir

(Universiti Malaysia Sarawak, Malaysia)
⁽³⁾ Aidil Azli Alias

(Universiti Malaysia Sarawak, Malaysia)
⁽⁴⁾ Musdi Shanat

(Universiti Malaysia Sarawak, Malaysia)
⁽⁵⁾ Mohamad Khairi Ishak

(Universiti Sains Malaysia, Malaysia)
* Corresponding author

Abstract

Video-based action quality assessment (AQA) is a non-trivial task because of the subtle visual differences between data produced by experts and non-experts. Current methods are extended from the action recognition domain, where most are based on temporal pattern matching. AQA has additional requirements: the order and tempo of actions matter when rating the quality of an action. We present a novel dataset of ranked TikTok dance videos and a pairwise AQA method for predicting which video of a same-label pair was sourced from the better dancer. Exhaustive pairings of same-label videos were randomly assigned to 100 human annotators, ultimately producing a ranked list per label category. Our method relies on successful detection of the subject's 2D pose in successive query frames; the order and tempo of actions are encoded in the resulting String sequence. For each frame, the detected 2D pose retrieves its top-matching Visual word from a Codebook, which then represents that frame. Given a same-label pair, we generate a String value of concatenated Visual words for each video. We then compute the edit distance between each String value and that of the Gold Standard (i.e., the top-ranked video(s) for that label category) and declare the video with the lower score the winner. The pairwise AQA method is implemented using two schemes, i.e., with and without text compression. Although the average precision of both schemes over 12 label categories is low, at 0.45 with text compression and 0.48 without, the precision values for several label categories are comparable to those of past methods (median: 0.47, max: 0.66).
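The pairwise comparison described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: pose estimation is taken as already done (each frame is a flattened tuple of 2D keypoint coordinates), the Codebook is a hypothetical dictionary mapping single-character Visual words to reference poses, matching uses squared Euclidean distance, several Gold Standard videos are combined by taking the minimum distance, ties go to the first video, and "text compression" is assumed to mean collapsing runs of identical consecutive Visual words. All helper names (`nearest_visual_word`, `encode_video`, `edit_distance`, `score`, `predict_winner`) are hypothetical.

```python
"""Hypothetical sketch of pairwise AQA by string matching (not the authors' code)."""
from typing import Dict, List, Sequence, Tuple

# Assumption: a pose is a flattened, pre-normalized tuple of 2D keypoints (x1, y1, x2, y2, ...).
Pose = Tuple[float, ...]


def nearest_visual_word(pose: Pose, codebook: Dict[str, Pose]) -> str:
    """Return the Codebook symbol whose reference pose is closest (squared Euclidean)."""
    return min(codebook, key=lambda w: sum((a - b) ** 2 for a, b in zip(pose, codebook[w])))


def encode_video(poses: Sequence[Pose], codebook: Dict[str, Pose], compress: bool = False) -> str:
    """Concatenate one Visual word per frame into a String.

    Codebook keys are assumed to be single characters, so each character is one frame.
    """
    words = [nearest_visual_word(p, codebook) for p in poses]
    if compress:
        # Assumed "text compression": drop consecutive duplicates (runs collapse to one word).
        words = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
    return "".join(words)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def score(video: str, gold: List[str]) -> int:
    """Distance to the closest Gold Standard String (assumed aggregation over top-ranked videos)."""
    return min(edit_distance(video, g) for g in gold)


def predict_winner(a: str, b: str, gold: List[str]) -> int:
    """Return 0 if video a is predicted to be the better dancer, else 1 (ties assumed to go to a)."""
    return 0 if score(a, gold) <= score(b, gold) else 1


if __name__ == "__main__":
    # Toy 2D "poses" and a three-word Codebook, purely illustrative.
    codebook = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
    gold = [encode_video([(0, 0), (0, 0), (1, 0), (0, 1)], codebook)]
    video_1 = encode_video([(0.1, 0.0), (0.0, 0.1), (0.9, 0.1), (0.1, 0.9)], codebook)
    video_2 = encode_video([(0, 1), (1, 0), (1, 0), (0, 0)], codebook)
    print(gold, video_1, video_2, predict_winner(video_1, video_2, gold))
    # prints: ['AABC'] AABC CBBA 0
```

Collapsing repeated Visual words discards how long each pose is held, so under this assumption the compressed scheme keeps the order of poses but loses most of the tempo information; whether this matches the paper's compression step is not stated in the abstract.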

Keywords

Action Quality Assessment; Dance Video Dataset; Human Activity Analysis; String Matching; Visual Codebook





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
