In response to the growing use of text extraction from digital images and video sequences, this study proposes an integrated framework for the automatic detection and extraction of text from natural scene photographs. At present, such a process depends on accurate text recognition to ensure the quality of the subsequent text segmentation. The study designs a fast connected-component algorithm to produce individual connected components. Simple geometric features are then used to filter out the large number of connected components that are unlikely to be text. Wavelet and texture features are extracted from the remaining components and fed to an AdaBoost classifier as its input. Experiments combining image processing with texture features show that text extraction reaches an accuracy of 94.65% or higher. In addition, the fast convergence of the AdaBoost algorithm reduces the computational cost.
Journal of Convergence Information Technology, 7(7):233-241.
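The first two stages named in the abstract, connected-component generation and geometric filtering, can be illustrated with a minimal sketch. This is not the study's fast algorithm: it uses a plain breadth-first flood fill with 4-connectivity on an already binarized image, and the function names, the feature set (area, aspect ratio, fill ratio), and all thresholds are hypothetical choices made only for illustration.

```python
from collections import deque

def connected_components(grid):
    """Return the 4-connected foreground regions (cells equal to 1) as pixel lists."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

def geometric_features(pixels):
    """Compute simple bounding-box features for one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "area": len(pixels),
        "aspect": width / height,
        "fill": len(pixels) / (width * height),
    }

def filter_candidates(components, min_area=3, max_aspect=5.0, min_fill=0.3):
    """Keep components whose geometry is plausible for a character.

    The thresholds are illustrative assumptions, not values from the study.
    """
    kept = []
    for pixels in components:
        f = geometric_features(pixels)
        if (f["area"] >= min_area
                and 1.0 / max_aspect <= f["aspect"] <= max_aspect
                and f["fill"] >= min_fill):
            kept.append(pixels)
    return kept

# Toy binary image: a compact blob, an isolated noise pixel, and a long thin line.
EXAMPLE_GRID = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

candidates = filter_candidates(connected_components(EXAMPLE_GRID))
```

In this toy image the noise pixel fails the area test and the thin line fails the aspect-ratio test, so only the compact blob survives as a text candidate. In a full pipeline, the surviving components would then go on to feature extraction and classification.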