De-Identifying Swedish EHR Text Using Public Resources in the General Domain
- Chomutare, Taridzo (author)
- Norwegian Centre for E-health Research, Norway
- Yigzaw, Kassaye Yitbarek (author)
- Norwegian Centre for E-health Research, Norway
- Budrionis, Andrius (author)
- Norwegian Centre for E-health Research, Norway
- Makhlysheva, Alexandra (author)
- Norwegian Centre for E-health Research, Norway
- Godtliebsen, Fred (author)
- Norwegian Centre for E-health Research, Norway; UiT - The Arctic University of Norway, Norway
- Dalianis, Hercules (author)
- Stockholms universitet, Institutionen för data- och systemvetenskap; Norwegian Centre for E-health Research, Norway
- Amsterdam : IOS Press, 2020
- English.
In: Digital Personalized Health and Medicine. - Amsterdam : IOS Press. - ISBN 9781643680828, 9781643680835 ; pp. 148-152
- Related links:
- https://doi.org/10.3...
- https://su.diva-port... (primary) (Raw object)
- https://urn.kb.se/re...
- https://doi.org/10.3...
Abstract
- Developing rule-based models or training machine-learning models for de-identifying electronic health record (EHR) clinical notes normally requires sensitive data, which poses serious problems for patient privacy. In this study, we add non-sensitive public datasets to the EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, are used to train a deep learning model based on recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02%, respectively, with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors were added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful when the data are both sensitive and in a low-resource language.
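The precision and recall figures quoted in the abstract can be read more easily with a small worked example. The sketch below is only an illustration and is not taken from the paper: it assumes a token-level scoring scheme in which "O" marks a non-sensitive token, and the label names (NAME, DATE, LOCATION) and function names are hypothetical. The F1 values it prints are not reported in the abstract; they are derived here from the quoted precision and recall using the standard harmonic-mean formula, and come out at roughly 65.6% and 86.1%.

```python
from typing import Sequence, Tuple


def precision_recall(gold: Sequence[str], pred: Sequence[str],
                     outside: str = "O") -> Tuple[float, float]:
    """Token-level precision and recall for sensitive-entity labels.

    A token counts as a true positive only if it is predicted as a
    sensitive class and the predicted class matches the gold class.
    The scoring scheme is an assumption made for illustration.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = sum(1 for g, p in zip(gold, pred) if p != outside and p == g)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical five-token note with made-up labels.
gold_labels = ["O", "NAME", "NAME", "O", "DATE"]
pred_labels = ["O", "NAME", "O", "LOCATION", "DATE"]
p, r = precision_recall(gold_labels, pred_labels)
print(f"toy example: precision={p:.2f}, recall={r:.2f}")

# F1 derived from the precision/recall percentages quoted in the abstract.
f1_base = f1_score(55.62, 80.02)   # base EHR embedding layer
f1_wiki = f1_score(85.01, 87.15)   # with Wikipedia word vectors added
print(f"derived F1, base EHR embeddings: {f1_base:.2f}%")
print(f"derived F1, with Wikipedia vectors: {f1_wiki:.2f}%")
```

For de-identification, recall is usually the more safety-critical number, because a missed identifier leaks patient information, whereas low precision only over-redacts the text.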
Subject headings
- NATURVETENSKAP -- Data- och informationsvetenskap (hsv//swe)
- NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)
Keyword
- EHR
- clinical text
- de-identification
- deep learning
- wiki word vectors
- data- och systemvetenskap
- Computer and Systems Sciences
Publication and Content Type
- ref (content type: peer-reviewed)
- kon (publication type: conference paper)