Scene text detection and recognition: a survey

Scene text detection and recognition have received considerable attention in recent years and are used in many vision-based applications. The field faces a range of challenges, including wavy text, rotated or arbitrarily oriented text, variation in scale and font, image noise, and cluttered in-the-wild backgrounds, all of which make detecting and recognizing text in images more difficult. In this article, we first present a comprehensive review of recent advances in text detection and recognition and describe their advantages and disadvantages. We then introduce the common datasets, compare recent methods, and analyze text detection and recognition systems. According to studies from the past decade, one of the most important challenges in this field is the detection of curved and vertical text. We outline approaches for developing detection and recognition systems and describe methods that are robust in detecting and recognizing curved and vertical text. Finally, we present some directions for developing text detection and recognition systems as future work.
Author information
Authors and Affiliations
- Department of Electrical and Computer Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran: Fatemeh Naiemi & Vahid Ghods
- Department of Electrical and Computer Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran: Hassan Khalesi
- Fatemeh Naiemi