An automatic lip-reading system for short sentences using deep learning nets

(1) * Maha A. Rajab (University of Baghdad, Iraq)
(2) Kadhim M. Hashim (College of Information Technology, Imam Ja'afar Al-Sadiq University, Iraq)
*corresponding author

Abstract


Lip reading is a field whose importance has grown considerably in recent years, particularly with the widespread use of deep learning techniques. It refers to recognizing spoken sentences from visual information acquired from lip movements, and it is essential for speech recognition in noisy environments and for people with hearing impairments. The lip region also presents several challenges, especially for male speakers, whose mustache and beard may cover part of it. This paper proposes an automatic lip-reading system that recognizes and classifies short English sentences using deep learning networks. Frames are extracted from the input video, and each frame is passed to the Viola-Jones detector to locate the face region. Next, 68 facial landmarks are determined; landmarks 48 to 68, which delimit the lip region, are used to build a binary mask from which the lip image is extracted. The contrast of the lip image is then enhanced by applying contrast adjustment. Finally, the sentences are classified using two deep learning models: AlexNet and VGG-16 Net. The database consists of 39 participants (32 males and 7 females), each repeating the short sentences five times. The results show an accuracy of 90.00% for AlexNet and 82.34% for VGG-16 Net, from which we conclude that AlexNet classifies short sentences better than VGG-16 Net.
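The per-frame preprocessing described above (Viola-Jones face detection, 68-landmark lip localization, binary masking, contrast adjustment) can be sketched in a few lines of Python. The sketch below is a reconstruction under assumptions, not the authors' code: OpenCV's bundled Haar cascade plays the role of the Viola-Jones detector, dlib's public shape_predictor_68_face_landmarks.dat model supplies the 68 landmarks (the mouth occupies indices 48-67 in its 0-indexed scheme), and histogram equalization stands in for the paper's unspecified contrast adjustment.

    import cv2
    import dlib
    import numpy as np

    # Viola-Jones face detector (OpenCV's bundled Haar cascade).
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # Assumed landmark model; the paper does not name its landmark tool.
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def lip_frames(video_path, size=(227, 227)):
        """Yield one contrast-enhanced lip crop per frame of the input video."""
        cap = cv2.VideoCapture(video_path)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                                  minNeighbors=5)
            if len(faces) == 0:
                continue                      # no face found in this frame
            x, y, w, h = map(int, faces[0])
            shape = predictor(gray, dlib.rectangle(x, y, x + w, y + h))
            # Mouth landmarks are indices 48-67 of the 68-point model.
            pts = np.array([(shape.part(i).x, shape.part(i).y)
                            for i in range(48, 68)], dtype=np.int32)
            mask = np.zeros_like(gray)
            cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)  # binary lip mask
            bx, by, bw, bh = cv2.boundingRect(pts)
            lip = cv2.bitwise_and(gray, mask)[by:by + bh, bx:bx + bw]
            lip = cv2.equalizeHist(lip)   # stand-in for contrast adjustment
            yield cv2.resize(lip, size)   # 227x227 matches AlexNet's input
        cap.release()

For the classification stage, fine-tuning pretrained networks is one plausible reading of "classified using AlexNet and VGG-16 Net"; with torchvision this amounts to replacing the final fully connected layer (num_sentences is hypothetical, one class per short sentence; the grayscale crops would be replicated to three channels before being fed to the network):

    import torch
    import torchvision

    num_sentences = 10  # hypothetical; set to the number of sentence classes
    model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
    model.classifier[6] = torch.nn.Linear(4096, num_sentences)
    # For VGG-16, swap in torchvision.models.vgg16(weights="IMAGENET1K_V1").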

Keywords


Lip Reading; CNN; AlexNet; VGG-16 Net; Short Sentences

   

DOI

https://doi.org/10.26555/ijain.v9i1.920
      





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.