Who danced better? Ranked TikTok dance video dataset and pairwise action quality assessment method

(1) Irwandi Hipiny* (Universiti Malaysia Sarawak, Malaysia)
(2) Hamimah Ujir (Universiti Malaysia Sarawak, Malaysia)
(3) Aidil Azli Alias (Universiti Malaysia Sarawak, Malaysia)
(4) Musdi Shanat (Universiti Malaysia Sarawak, Malaysia)
(5) Mohamad Khairi Ishak (Universiti Sains Malaysia, Malaysia)
*corresponding author


Video-based action quality assessment (AQA) is a non-trivial task due to the subtle visual differences between footage produced by experts and non-experts. Current methods extend work from the action-recognition domain, where most approaches are based on temporal pattern matching. AQA imposes additional requirements: both the order and the tempo of actions matter when rating quality. We present a novel dataset of ranked TikTok dance videos and a pairwise AQA method for predicting which video of a same-label pair was sourced from the better dancer. Exhaustive pairings of same-label videos were randomly assigned to 100 human annotators, ultimately producing a ranked list per label category. Our method relies on successful detection of the subject's 2D pose in successive query frames; the order and tempo of actions are encoded in the resulting string sequence. Each detected 2D pose returns the top-matching visual word from a codebook to represent the current frame. Given a same-label pair, we generate a string of concatenated visual words for each video. By computing the edit-distance score between each string and the gold standard's (i.e., the top-ranked video(s) for that label category), we declare the video with the lower score the winner. The pairwise AQA method is implemented in two schemes, with and without text compression. Although the average precision over 12 label categories is low for both schemes, at 0.45 with text compression and 0.48 without, precision values for several label categories are comparable to those of past methods (median: 0.47, max: 0.66).
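As a rough illustration of the pipeline described above, the sketch below encodes each frame's 2D pose as its nearest codebook symbol, concatenates the symbols into a per-video string, and declares the winner of a same-label pair by edit distance to the gold-standard string. The codebook, symbols, and pose vectors here are hypothetical stand-ins; the paper's actual pose estimator and codebook construction are not reproduced.

```python
def nearest_visual_word(pose, codebook):
    """Return the symbol of the codebook entry closest to the 2D pose vector."""
    best_symbol, best_d = None, float("inf")
    for symbol, centroid in codebook.items():
        d = sum((a - b) ** 2 for a, b in zip(pose, centroid))
        if d < best_d:
            best_symbol, best_d = symbol, d
    return best_symbol

def encode_video(poses, codebook):
    """Concatenate per-frame visual words into a single string."""
    return "".join(nearest_visual_word(p, codebook) for p in poses)

def edit_distance(s, t):
    """Levenshtein distance via dynamic programming (two-row variant)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def better_dancer(string_a, string_b, gold):
    """Lower edit distance to the gold-standard string wins."""
    da, db = edit_distance(string_a, gold), edit_distance(string_b, gold)
    return "A" if da < db else "B" if db < da else "tie"
```

A toy run: with `codebook = {"a": (0.0, 0.0), "b": (1.0, 1.0)}`, a video whose frames yield poses near those two centroids encodes to `"ab"`, and `better_dancer` then simply compares the two encodings' distances to the top-ranked video's string. The text-compression scheme mentioned above would compress the strings before comparison, which is not shown here.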


Action Quality Assessment; Dance Video Dataset; Human Activity Analysis; String Matching; Visual Codebook












This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | ISSN 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)

