Scene text detection and recognition: a survey

Scene text detection and recognition have received considerable attention in recent years and are used in many vision-based applications. The field poses a variety of challenges, including wavy text, rotated and arbitrarily oriented text, variation in scale and font, noisy images, and cluttered in-the-wild backgrounds, all of which make detecting and recognizing text in images more difficult. In this article, we first present a comprehensive review of recent advances in text detection and recognition and describe their advantages and disadvantages. We then introduce the common datasets, compare recent methods, and analyze text detection and recognition systems. According to studies from the past decade, one of the most important challenges in this field is the detection of curved and vertical text. We outline approaches for developing detection and recognition systems and describe methods that are robust in detecting and recognizing curved and vertical text. Finally, we present some directions for developing text detection and recognition systems as future work.

Author information

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran: Fatemeh Naiemi & Vahid Ghods
  2. Department of Electrical and Computer Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran: Hassan Khalesi