A Semi-supervised Approach for De-identification of Swedish Clinical Text
- Berg, Hanna (author)
- Stockholms universitet, Institutionen för data- och systemvetenskap
- Dalianis, Hercules (author)
- Stockholms universitet, Institutionen för data- och systemvetenskap
- European Language Resources Association, 2020
- 2020
- English.
- In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). European Language Resources Association. ISBN 9791095546344, pp. 4444-4450
- Related links:
- https://www.aclweb.o...
- https://su.diva-port... (primary) (Raw object)
- https://urn.kb.se/re...
Abstract
- An abundance of electronic health records (EHRs) is produced every day within healthcare. The records contain valuable information for research and for the future improvement of healthcare. Multiple efforts have been made to protect the privacy of patients while making electronic health records usable for research by removing personally identifiable information from patient records. Supervised machine learning approaches for de-identification of EHRs need annotated data for training, and such annotations are costly in time and human resources. Annotating clinical text is even more costly, as the process must be carried out in a protected environment with a limited number of annotators who must have signed confidentiality agreements. This paper therefore proposes a semi-supervised method for automatically creating high-quality training data. The study shows that the method can be used to improve recall from 84.75% to 89.20% without sacrificing precision to the same extent, as precision drops from 95.73% to 94.20%. The model's recall is arguably more important for de-identification than precision.
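The trade-off quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the four percentages stated above; the F1 score is not reported in this record and is computed here purely as an illustration, and the names `f1_score`, `baseline`, and `semi_supervised` are hypothetical labels chosen for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, in percent.
baseline = {"precision": 95.73, "recall": 84.75}
semi_supervised = {"precision": 94.20, "recall": 89.20}

# Change in each metric, in percentage points.
recall_gain = semi_supervised["recall"] - baseline["recall"]
precision_loss = baseline["precision"] - semi_supervised["precision"]

# Derived illustration only: F1 is not stated in the abstract.
f1_baseline = f1_score(**baseline)
f1_semi = f1_score(**semi_supervised)

print(f"recall gain:    {recall_gain:.2f} points")
print(f"precision loss: {precision_loss:.2f} points")
print(f"F1 (derived):   {f1_baseline:.2f} -> {f1_semi:.2f}")
```

Under these figures the recall gain is larger than the precision loss, which is consistent with the abstract's claim that precision is not sacrificed to the same extent.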
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)
Keywords
- semi-supervised learning
- self training
- de-identification
- Swedish clinical text
- Computer and Systems Sciences
Publication and content type
- ref (subject category)
- kon (subject category)