SwePub

Semi-supervised medical entity recognition : A study on Spanish and Swedish clinical corpora

Pérez, Alicia (author)
Weegar, Rebecka (author)
Stockholms universitet, Institutionen för data- och systemvetenskap
Casillas, Arantza (author)
Gojenola, Koldo (author)
Oronoz, Maite (author)
Dalianis, Hercules (author)
Stockholms universitet, Institutionen för data- och systemvetenskap
Elsevier BV, 2017
English.
In: Journal of Biomedical Informatics. Elsevier BV. ISSN 1532-0464, 1532-0480. Vol. 71, pp. 16-30
  • Journal article (peer-reviewed)
Abstract
  • Objective: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs) focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations.
  • Methods: The significance of this work stands on its experimental layout. The experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. In order to mitigate sparsity of data, without a significant increase in the dimension of the decision space, we propose the use of clustered approaches of the hierarchical Brown clustering represented by trees and vector quantization for each semantic space.
  • Results: The results showed that the semi-supervised approaches significantly improved standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature-space two orders of magnitude lower than when directly using the semantic spaces.
  • Conclusions: The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.
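
The abstract mentions two ways of turning unsupervised word representations into compact features: prefixes of Brown-tree paths and vector quantization of a semantic space. The sketch below is only an illustration of that general idea, not the authors' implementation. It assumes plain k-means as the quantizer (the abstract does not name the algorithm), and every function name, prefix length, and toy vector in it is hypothetical.

```python
import math
import random


def nearest(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(vector, centroids[i]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means, used here as an assumed vector quantizer."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids


def brown_prefix_features(bit_path, lengths=(2, 4, 6)):
    """Prefixes of a Brown-tree bit path; shorter prefixes are coarser clusters."""
    return {f"brown_{n}": bit_path[:n] for n in lengths if len(bit_path) >= n}


def token_features(word, space, centroids, brown_paths):
    """One discrete cluster id per word instead of the full dense vector."""
    features = {}
    if word in space:
        features["vq_cluster"] = nearest(space[word], centroids)
    if word in brown_paths:
        features.update(brown_prefix_features(brown_paths[word]))
    return features


# Invented toy data: two symptom-like and two drug-like words.
toy_space = {
    "fiebre": [0.9, 0.1],
    "tos": [0.8, 0.2],
    "ibuprofeno": [0.1, 0.9],
    "paracetamol": [0.2, 0.8],
}
toy_brown_paths = {"fiebre": "110100", "ibuprofeno": "0111"}

toy_centroids = kmeans(list(toy_space.values()), k=2)
print(token_features("fiebre", toy_space, toy_centroids, toy_brown_paths))
```

Replacing each dense vector with a single cluster id is what keeps the feature space small: the classifier sees one categorical feature per semantic space rather than one feature per vector dimension.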

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)

Keywords

Medical entity recognition
Supervised and unsupervised learning
Health records
Computer and Systems Sciences

Publication and content type

ref (subject category)
art (subject category)
