Bottom-up visual attention model for still image: a preliminary study

(1) * Adhi Prahara (Universitas Ahmad Dahlan, Indonesia)
(2) Murinto Murinto (Universitas Ahmad Dahlan, Indonesia)
(3) Dewi Pramudi Ismi (Universitas Ahmad Dahlan, Indonesia)
*corresponding author

Abstract


Human visual attention is explained scientifically in cognitive psychology and neuroscience and modeled computationally in computer science and engineering. Visual attention models have been applied to computer vision tasks such as object detection, object recognition, image segmentation, image and video compression, action recognition, and visual tracking. This work studies bottom-up visual attention, specifically human fixation prediction and salient object detection models. The preliminary study briefly covers the biological perspective of visual attention, including the visual pathway and theories of visual attention, and then the computational models of bottom-up visual attention that generate a saliency map. The study compares several models at each stage and observes whether that stage is inspired by the biological architecture, concepts, or behavior of human visual attention. It finds that low-level features, the center-surround mechanism, sparse representation, and higher-level guidance with intrinsic cues dominate bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.

Keywords


Visual attention; Bottom-up attention; Saliency map; Computer vision; Curiosity

   

DOI

https://doi.org/10.26555/ijain.v6i1.469
      

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by Informatics Department - Universitas Ahmad Dahlan, UTM Big Data Centre - Universiti Teknologi Malaysia, and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W : http://ijain.org
E : info@ijain.org, andri.pranolo@tif.uad.ac.id (paper handling issues)
     ijain@uad.ac.id, andri.pranolo.id@ieee.org (publication issues)
