SwePub
Sök i LIBRIS databas

  Utökad sökning

WFRF:(Abbas Cheddad)
 

Sökning: WFRF:(Abbas Cheddad) > Comparative Study o...

Comparative Study of Layout Analysis of Tabulated Historical Documents

Liang, Xusheng (författare)
Blekinge Tekniska Högskola,student
Cheddad, Abbas (författare)
Blekinge Tekniska Högskola,Institutionen för datavetenskap
Hall, Johan (författare)
ArkivDigital AB, SWE
 (creator_code:org_t)
Elsevier Inc. 2021
2021
Engelska.
Ingår i: Big Data Research. - : Elsevier Inc.. - 2214-5796 .- 2214-580X. ; 24
  • Tidskriftsartikel (refereegranskat)
Abstract Ämnesord
Stäng  
  • Nowadays, the field of multimedia retrieval system has earned a lot of attention as it helps retrieve information more efficiently and accelerates daily tasks. Within this context, image processing techniques such as layout analysis and word recognition play an important role in transcribing content in printed or handwritten documents into digital data that can be further processed. This transcription procedure is called document digitization. This work stems from an industrial need, namely, a Swedish company (ArkivDigital AB) has scanned more than 80 million pages of Swedish historical documents from all over the country and there is a high demand to transcribe the contents into digital data. Such process starts by figuring out text location which, seen from another angle, is merely table layout analysis. In this study, the aim is to reveal the most effective solution to extract document layout w.r.t Swedish handwritten historical documents that are featured by their tabular forms. In short, outcome of public tools (i.e., Breuel's OCRopus method), traditional image processing techniques (e.g., Hessian/Gabor filters, Hough transform, Histograms of oriented gradients -HOG- features), machine learning techniques (e.g., support vector machines, transfer learning) are studied and compared. Results show that the existing OCR tool cannot carry layout analysis task on our Swedish historical handwritten documents. Traditional image processing techniques are mildly capable of extracting the general table layout in these documents, but the accuracy is enhanced by introducing machine learning techniques. The best performing approach will be used in our future document mining research to allow for the development of scalable resource-efficient systems for big data analytics. © 2021 Elsevier Inc.

Ämnesord

NATURVETENSKAP  -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
NATURVETENSKAP  -- Data- och informationsvetenskap -- Datorseende och robotik (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Vision and Robotics (hsv//eng)

Nyckelord

Feature extraction
Historical handwritten documents
Image processing
Layout analysis
Machine learning

Publikations- och innehållstyp

ref (ämneskategori)
art (ämneskategori)

Hitta via bibliotek

Till lärosätets databas

Sök utanför SwePub

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy