Cognitive-inspired Post-processing of optical character recognition for Swedish addresses



Andersson, Moa (author): KTH, Skolan för elektroteknik och datavetenskap (EECS), Royal Institute of Technology, Stockholm, Sweden

Kanwal, Summrina (author): Högskolan i Halmstad, KTH, Skolan för elektroteknik och datavetenskap (EECS), Akademin för informationsteknologi, Royal Institute of Technology, Stockholm, Sweden

Khan, Faiza (author): Riphah International University, Faculty of Computing, Islamabad, Pakistan


Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), 2022
English.
Part of: Proceedings of 2022 IEEE 21st International Conference on Cognitive Informatics and Cognitive Computing, ICCI*CC 2022. - Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE), pp. 248-257

Related links: https://urn.kb.se/re...; https://doi.org/10.1...; https://urn.kb.se/re...

Conference paper (peer-reviewed)

Abstract


Optical character recognition (OCR) has many applications, such as digitizing historical documents, automating processes, and helping visually impaired people read. However, extracting text from images into a digital format is not an easy problem, and the output of OCR frameworks often contains errors. The complexity comes from the many variations in (digital) fonts, handwriting, lighting, etc. To tackle this problem, this thesis investigates two methods for correcting errors in OCR output. The dataset used consists of Swedish addresses, so the methods are applied to postal automation: the aim is to investigate whether they can further automate postal work by automatically reading addresses on parcels using OCR. The first method, the lexical implementation, uses a dataset of Swedish addresses in which any valid address should appear (hence there is a known and limited vocabulary); misspelled addresses are corrected to the address in the lexicon with the smallest Levenshtein distance. The second approach uses the same dataset, but with artificial errors (artificial noise) added. The noisy addresses are then paired with their correct spellings to train a machine learning model based on neural machine translation (NMT) to automatically correct errors in OCR-read addresses. The results of this study could help define the direction of future work on OCR and postal addresses. The lexical implementation outperformed the NMT model. However, more experiments including real data would be required to draw definitive conclusions about how the methods would work in real-life applications.
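The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`levenshtein`, `correct_with_lexicon`, `add_noise`), the example addresses, the noise alphabet, and the uniform choice between substitution, deletion, and insertion are all assumptions made for illustration. The first part picks the lexicon entry with the smallest Levenshtein distance to an OCR string; the second part generates noisy/clean pairs of the kind that could serve as training data for a sequence-to-sequence corrector. The NMT model itself is not shown.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca from a
                current[j - 1] + 1,      # insert cb into a
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def correct_with_lexicon(ocr_text: str, lexicon: list[str]) -> str:
    """Return the lexicon entry closest to ocr_text; ties go to the first entry."""
    return min(lexicon, key=lambda entry: levenshtein(ocr_text, entry))


# Assumed noise alphabet; a real setup would likely model OCR confusions.
NOISE_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö0123456789"


def add_noise(address: str, error_rate: float, rng: random.Random) -> str:
    """Corrupt each character with probability error_rate (assumed noise model)."""
    out = []
    for ch in address:
        if rng.random() < error_rate:
            op = rng.choice(("substitute", "delete", "insert"))
            if op == "substitute":
                out.append(rng.choice(NOISE_ALPHABET))
            elif op == "insert":
                out.append(ch)
                out.append(rng.choice(NOISE_ALPHABET))
            # "delete": the character is simply dropped
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical addresses, not taken from the paper's dataset.
    lexicon = [
        "Storgatan 12, 111 22 Stockholm",
        "Storgatan 72, 111 22 Stockholm",
        "Kungsgatan 5, 411 19 Göteborg",
    ]
    print(correct_with_lexicon("St0rgatan 12, 111 22 Stockho1m", lexicon))

    rng = random.Random(0)
    pairs = [(add_noise(addr, 0.1, rng), addr) for addr in lexicon]
    for noisy, clean in pairs:
        print(repr(noisy), "->", repr(clean))
```

The linear scan over the lexicon is the simplest possible search; for a lexicon covering every valid address, an index or a pruned search would probably be needed, which this sketch does not attempt.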


Subject

Natural Sciences; Computer and Information ...; Language Technology

Natural Sciences; Computer and Information ...; Computer Science
