Scene text detection and recognition: a survey

Scene text detection and recognition have received considerable attention in recent years and are used in many vision-based applications. The field faces a range of challenges, including wavy text, rotated and arbitrarily oriented text, variation in text scale and font, image noise, and complex, uncontrolled backgrounds, all of which make detecting and recognizing text in images more difficult. In this article, we first present a comprehensive review of recent advances in text detection and recognition and describe their advantages and disadvantages. We then introduce the common datasets, compare recent methods, and analyze text detection and recognition systems. According to studies from the last decade, one of the most important challenges in this field is the detection of curved and vertical text. We outline approaches for developing detection and recognition systems and describe methods that are robust in detecting and recognizing curved and vertical text. Finally, we present some approaches for developing text detection and recognition systems as future work.

Author information

Authors and Affiliations

Fatemeh Naiemi & Vahid Ghods: Department of Electrical and Computer Engineering, Semnan Branch, Islamic Azad University, Semnan, Iran
Hassan Khalesi: Department of Electrical and Computer Engineering, Garmsar Branch, Islamic Azad University, Garmsar, Iran