Segmentation of Unstructured Newspaper Documents |
(Vol-4, Issue-5, May 2017) OPEN ACCESS |
Author(s): |
Santosh Naik, R. Dinesh, Prabhanjan S. |
Keywords: |
Document Layout Analysis, data extraction, document page segmentation, unstructured document. |
Abstract: |
Document layout analysis is an important step in automated document recognition systems. It retrieves meaningful information from document images by identifying, categorizing, and labeling the semantics of their text blocks. In this paper, we present a simple top-down approach to document page segmentation. We have tested the proposed method on unstructured documents such as newspapers, which have complex layouts with no fixed structure, including multiple titles and multiple columns. The proposed method identifies the white gap areas that separate titles, columns of text, lines of text, and words within lines, and uses them to divide the document into segments. The proposed algorithm has been implemented and applied to a large number of Indian newspapers, and the results have been evaluated by the number of blocks detected, taking their correct ordering into account. |
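The abstract does not give the algorithm's details, so the following is only a minimal sketch of one common way to do top-down, white-gap-based segmentation: a recursive X-Y cut over projection profiles. It is not the authors' implementation. The function names (`find_gaps`, `segment`), the gap thresholds (`min_row_gap`, `min_col_gap`), and the assumption that the page is a binarized NumPy array with nonzero pixels as ink are all illustrative choices.

```python
import numpy as np

def find_gaps(profile, min_gap):
    """Return (start, end) index pairs of zero-valued runs at least min_gap long."""
    gaps, start = [], None
    for i, value in enumerate(profile):
        if value == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    if start is not None and len(profile) - start >= min_gap:
        gaps.append((start, len(profile)))
    return gaps

def segment(page, min_row_gap=2, min_col_gap=2,
            _top=0, _left=0, _cut_rows=True, _failed=False):
    """Recursively split a binarized page (nonzero = ink) along white gaps.

    Returns (top, left, bottom, right) boxes, ordered top-to-bottom and
    left-to-right within each cut. Names and thresholds are hypothetical.
    """
    ink = np.asarray(page) > 0
    rows = np.flatnonzero(ink.any(axis=1))
    if rows.size == 0:
        return []  # an all-white region contains no block
    cols = np.flatnonzero(ink.any(axis=0))
    r0, c0 = int(rows[0]), int(cols[0])
    # Crop to the bounding box of the ink so only interior gaps remain.
    block = ink[r0:int(rows[-1]) + 1, c0:int(cols[-1]) + 1]
    top, left = _top + r0, _left + c0

    # Projection profile: ink count per row (or per column).
    profile = block.sum(axis=1 if _cut_rows else 0)
    gaps = find_gaps(profile, min_row_gap if _cut_rows else min_col_gap)

    # Pieces are the ink-bearing stretches between consecutive gaps.
    pieces, pos = [], 0
    for g0, g1 in gaps:
        if g0 > pos:
            pieces.append((pos, g0))
        pos = g1
    if pos < len(profile):
        pieces.append((pos, len(profile)))

    if len(pieces) <= 1:
        if _failed:  # neither direction splits: this is a leaf block
            return [(top, left, top + block.shape[0], left + block.shape[1])]
        # Try the other direction once before giving up.
        return segment(block, min_row_gap, min_col_gap,
                       top, left, not _cut_rows, True)

    boxes = []
    for a, b in pieces:
        sub = block[a:b, :] if _cut_rows else block[:, a:b]
        boxes += segment(sub, min_row_gap, min_col_gap,
                         top + (a if _cut_rows else 0),
                         left + (0 if _cut_rows else a),
                         not _cut_rows, False)
    return boxes
```

Under these assumptions, larger thresholds isolate coarse units such as titles and columns, while smaller ones also split individual text lines and words. On a real scan, the thresholds would have to be tuned to the image resolution and noise level.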
Advanced Engineering Research and Science