SwePub
Sök i LIBRIS databas

  Utökad sökning

id:"swepub:oai:DiVA.org:lnu-88769"
 

Sökning: id:"swepub:oai:DiVA.org:lnu-88769" > The impact of deep ...

The impact of deep learning on document classification using semantically rich representations

Kastrati, Zenun, 1984- (författare)
Norwegian University of Science and Technology, Norway
Imran, Ali Shariq (författare)
Norwegian University of Science and Technology, Norway
Yayilgan, Sule Yildirim (författare)
Norwegian University of Science and Technology, Norway
 (creator_code:org_t)
Elsevier, 2019
2019
Engelska.
Ingår i: Information Processing & Management. - : Elsevier. - 0306-4573 .- 1873-5371. ; 56:5, s. 1618-1632
  • Tidskriftsartikel (refereegranskat)
Abstract Ämnesord
Stäng  
  • This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories utilizing deep learning. The model architecture consists of two main modules including document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Acquisition of terminology integrated to the ontology extends the capabilities of semantically rich document representations with an in depth-coverage of concepts, thereby capturing the whole conceptualization involved in documents. Semantically rich representations obtained from the first module will serve as input to the document classification module which aims at finding the most appropriate category for that document through deep learning. Three different deep learning networks each belonging to a different category of machine learning techniques for ontological document classification using a real-life ontology are used. Multiple simulations are carried out with various deep neural networks configurations, and our findings reveal that a three hidden layer feedforward network with 1024 neurons obtain the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points to 78.10% for the same network configuration when the relevant terminology integrated to the ontology is applied to enrich document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques including shallow and conventional machine learning classifiers.

Ämnesord

NATURVETENSKAP  -- Data- och informationsvetenskap -- Datavetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences -- Computer Sciences (hsv//eng)

Nyckelord

Document representation
Document classification
Deep learning
Ontology
Machine learning
Informatik
Information Systems

Publikations- och innehållstyp

ref (ämneskategori)
art (ämneskategori)

Hitta via bibliotek

Till lärosätets databas

Sök utanför SwePub

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.

 
pil uppåt Stäng

Kopiera och spara länken för att återkomma till aktuell vy