SwePub

Result list for search "WFRF:(Dalianis Hercules)"

  • Results 1-50 of 144
1.
  • Ahltorp, Magnus, et al. (author)
  • Using text prediction for facilitating input and improving readability of clinical text
  • 2013
  • In: Studies in Health Technology and Informatics. - : IOS Press. - 9781614992882 - 9781614992899 ; , pp. 1149-
  • Conference paper (peer-reviewed). Abstract:
    • Text prediction has the potential for facilitating and speeding up the documentation work within health care, making it possible for health personnel to allocate less time to documentation and more time to patient care. It also offers a way to produce clinical text with fewer misspellings and abbreviations, increasing readability. We have explored how text prediction can be used for input of clinical text, and how the specific challenges of text prediction in this domain can be addressed. A text prediction prototype was constructed using data from a medical journal and from medical terminologies. This prototype achieved keystroke savings of 26% when evaluated on texts mimicking authentic clinical text. The results are encouraging, indicating that there are feasible methods for text prediction in the clinical domain.
2.
  • Alam, Mahbub Ul, et al. (author)
  • Deep Learning from Heterogeneous Sequences of Sparse Medical Data for Early Prediction of Sepsis
  • 2020
  • In: Proceedings of the 13th International Joint Conference on Biomedical Engineering Systems and Technologies, Volume 5: HEALTHINF. - Setúbal : SciTePress. - 9789897583988 ; , pp. 45-55
  • Conference paper (peer-reviewed). Abstract:
    • Sepsis is a life-threatening complication to infections, and early treatment is key for survival. Symptoms of sepsis are difficult to recognize, but prediction models using data from electronic health records (EHRs) can facilitate early detection and intervention. Recently, deep learning architectures have been proposed for the early prediction of sepsis. However, most efforts rely on high-resolution data from intensive care units (ICUs). Prediction of sepsis in the non-ICU setting, where hospitalization periods vary greatly in length and data is more sparse, is not as well studied. It is also not clear how to learn effectively from longitudinal EHR data, which can be represented as a sequence of time windows. In this article, we evaluate the use of an LSTM network for early prediction of sepsis according to Sepsis-3 criteria in a general hospital population. An empirical investigation using six different time window sizes is conducted. The best model uses a two-hour window and assumes data is missing not at random, clearly outperforming scoring systems commonly used in healthcare today. It is concluded that the size of the time window has a considerable impact on predictive performance when learning from heterogeneous sequences of sparse medical data for early prediction of sepsis.
3.
  • Alam, Mahbub Ul, et al. (author)
  • Terminology Expansion with Prototype Embeddings : Extracting Symptoms of Urinary Tract Infection from Clinical Text
  • 2021
  • In: Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2021) - Volume 5: HEALTHINF. - Setúbal : SciTePress. - 9789897584909 ; , pp. 47-57
  • Conference paper (peer-reviewed). Abstract:
    • Many natural language processing applications rely on the availability of domain-specific terminologies containing synonyms. To that end, semi-automatic methods for extracting additional synonyms of a given concept from corpora are useful, especially in low-resource domains and noisy genres such as clinical text, where nonstandard language use and misspellings are prevalent. In this study, prototype embeddings based on seed words were used to create representations for (i) specific urinary tract infection (UTI) symptoms and (ii) UTI symptoms in general. Four word embedding methods and two phrase detection methods were evaluated using clinical data from Karolinska University Hospital. It is shown that prototype embeddings can effectively capture semantic information related to UTI symptoms. Using prototype embeddings for specific UTI symptoms led to the extraction of more symptom terms compared to using prototype embeddings for UTI symptoms in general. Overall, 142 additional UTI symptom terms were identified, yielding a more than 100% increment compared to the initial seed set. The mean average precision across all UTI symptoms was 0.51, and as high as 0.86 for one specific UTI symptom. This study provides an effective and cost-effective solution to terminology expansion with small amounts of labeled data.
4.
  • Alfalahi, Alyaa, et al. (author)
  • Pseudonymisation of Personal Names and other PHIs in an Annotated Clinical Swedish Corpus
  • 2012
  • In: LREC 2012, Eighth International Conference on Language Resources and Evaluation. - 9782951740877
  • Conference paper (peer-reviewed). Abstract:
    • Today a large number of patient records are produced and these records contain valuable information, often in free text, about the medical treatment of patients. Since these records contain information that can reveal the identity of patients, known as protected health information (PHI), the records cannot easily be made available for the research community. In this research we have used a PHI-annotated clinical corpus, written in Swedish, that we have pseudonymised. Pseudonymisation means replacing the sensitive information with fictive information; for example, real personal names are replaced with fictive personal names based on the gender of the real names and family relations. We have evaluated our results, and our five respondents, of whom three were clinicians, found that the clinical text looks real and is readable. We have also added pseudonymisation for telephone numbers, locations, health care units, dates and ages. In this paper we also present the entire de-identification and pseudonymisation process of a sample clinical text.
5.
  • Allvin, Helen, et al. (author)
  • Characteristics of Finnish and Swedish intensive care nursing narratives : a comparative analysis to support the development of clinical language technologies
  • 2011
  • In: Journal of Biomedical Semantics. - 2041-1480. ; 2:S1, pp. 1-11
  • Journal article (peer-reviewed). Abstract:
    • Background: Free text is helpful for entering information into electronic health records, but reusing it is a challenge. The need for language technology for processing Finnish and Swedish healthcare text is therefore evident; however, Finnish and Swedish are linguistically very dissimilar. In this paper we present a comparison of characteristics in Finnish and Swedish free-text nursing narratives from intensive care. This creates a framework for characterising and comparing clinical text and lays the groundwork for developing clinical language technologies. Methods: Our material included daily nursing narratives from one intensive care unit in Finland and one in Sweden. Inclusion criteria for patients were an inpatient period of at least five days and an age of at least 16 years. We performed a comparative analysis as part of a collaborative effort between Finnish- and Swedish-speaking healthcare and language technology professionals that included both qualitative and quantitative aspects. The qualitative analysis addressed the content and structure of three average-sized health records from each country. In the quantitative analysis 514 Finnish and 379 Swedish health records were studied using various language technology tools. Results: Although the two languages are not closely related, nursing narratives in Finland and Sweden had many properties in common. Both made use of specialised jargon and their content was very similar. However, many of these characteristics were challenging regarding development of language technology to support producing and using clinical documentation. Conclusions: The way Finnish and Swedish intensive care nursing was documented was not country or language dependent, but shared a common context, principles and structural features and even similar vocabulary elements. Technology solutions are therefore likely to be applicable to a wider range of natural languages, but they need linguistic tailoring.
Availability: The Finnish and Swedish data can be found at: http://www.dsv.su.se/ hexanord/data/
6.
  • Andrenucci, Andrea, et al. (author)
  • Knowledge patterns for online health portal development
  • 2019
  • In: Health Informatics Journal. - : SAGE Publications. - 1460-4582 .- 1741-2811. ; 25:4, pp. 1779-1799
  • Journal article (peer-reviewed). Abstract:
    • This article describes the development and evaluation of a set of knowledge patterns that provide guidelines and implications of design for developers of mental health portals. The knowledge patterns were based on three foundations: 1) Knowledge integration of language technology approaches; 2) Experiments with language technology applications and 3) User studies of portal interaction. A mixed-methods approach was employed for the evaluation of the knowledge patterns: formative workshops with knowledge pattern experts and summative surveys with experts in specific domains. The formative evaluation improved the cohesion of the patterns. The results of the summative evaluation showed that the problems discussed in the patterns were relevant for the domain and that the knowledge embedded was useful to solve them. Ten patterns out of thirteen achieved an average score above 4.0, which is a positive result that leads us to conclude that they can be used as guidelines for developing health portals.
7.
  • Andrenucci, Andrea, 1971- (author)
  • Using Language Technology to Mediate Medical Information on Health Portals : User Studies and Experiments
  • 2018
  • Doctoral thesis (other academic/artistic). Abstract:
    • The World Wide Web has revolutionized our lifestyle, our economies and services within health care. Health care services are no longer provided only at specialist centers and at scheduled hours, but also through online tools that give health care consumers access to medical information, health records, medical counselling and peer support. Such tools and applications are generally available on larger web sites or gateways called health portals. A large majority of online medical information consumers are laypeople (i.e. non-experts) who appreciate the possibility to submit their information needs in their own native language. The information retrieval process where information requests from users and retrieved documents/answers are in different languages is called cross-language information retrieval (CLIR). Mental health is one of the medical areas where some online applications have been successfully deployed in order to help people by providing in-depth medical information, counseling and advice. Despite the fact that online health portals are considered priority e-health tools for improving mental health, there are no formal knowledge instruments such as knowledge patterns that explicitly support the development of online health portals in the field of psychology/psychotherapy. The goal of this research is to produce and evaluate a set of knowledge patterns for the development and implementation of cross-lingual online health portals aimed at information seekers without medical expertise in the domain of psychology and psychotherapy. The knowledge patterns synthesize results of three research foundations: 1) User studies of portal interaction, based on interviews and observations about how users experience health information online and personalized search; 2) Knowledge integration of existing language technology approaches; and 3) Experiments with language technology applications in the field of cross-lingual information retrieval/question-answering.
The target groups of this research are developers, researchers and health care providers, i.e. people who are responsible for mediating medical information on online health portals for users without medical expertise. The chosen research framework is design science, i.e. the science that focuses on the study, development and evaluation of artefacts (objects that help people solve a practical problem). Typical examples of artefacts in IT are algorithms, software solutions and databases, but also objects such as processes or knowledge patterns. The developed and evaluated artefact in this research is a set of knowledge patterns for online health portal development. The developed artefact contains fourteen knowledge patterns covering the three research foundations. Formative (structured workshops) and summative (online survey) evaluation of the artefact indicate that the knowledge patterns are useful, relevant and adoptable to a large extent; they also provide further directions for development of online mental health portals. Developing portals with multilingual support and tailored interfaces has the potential of helping larger groups of citizens to access relevant medical information.
8.
  • Bampa, Maria, et al. (author)
  • Detecting Adverse Drug Events from Swedish Electronic Health Records using Text Mining
  • 2020
  • In: Proceedings of the LREC 2020 Workshop on Multilingual Biomedical Text Processing (MultilingualBIO 2020). - : European Language Resources Association. - 9791095546658 ; , pp. 1-8
  • Conference paper (peer-reviewed). Abstract:
    • Electronic Health Records are a valuable source of patient information which can be leveraged to detect Adverse Drug Events (ADEs) and aid post-market drug surveillance. The overall aim of this study is to scrutinize text written by clinicians in the EHRs and build a model for ADE detection that produces medically relevant predictions. Natural Language Processing techniques will be exploited to create important predictors and incorporate them into the learning process. The study focuses on the 5 most frequent ADE cases found in a Swedish electronic patient record corpus. The results indicate that considering textual features, rather than structured ones, can improve the classification performance by 15% in some ADE cases. Additionally, variable patient history lengths are incorporated in the models, demonstrating the importance of the above decision rather than using an arbitrary number for a history length. The experimental findings suggest that the clinical text in EHRs includes information that can capture data beyond the ones that are found in a structured format.
9.
  • Berg, Hanna, et al. (author)
  • A Semi-supervised Approach for De-identification of Swedish Clinical Text
  • 2020
  • In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020). - : European Language Resources Association. - 9791095546344 ; , pp. 4444-4450
  • Conference paper (peer-reviewed). Abstract:
    • An abundance of electronic health records (EHR) is produced every day within healthcare. The records possess valuable information for research and future improvement of healthcare. Multiple efforts have been made to protect the integrity of patients while making electronic health records usable for research by removing personally identifiable information in patient records. Supervised machine learning approaches for de-identification of EHRs need annotated data for training, annotations that are costly in time and human resources. Annotation of clinical text is even more costly, as the process must be carried out in a protected environment with a limited number of annotators who must have signed confidentiality agreements. In this paper, a semi-supervised method is therefore proposed for automatically creating high-quality training data. The study shows that the method can be used to improve recall from 84.75% to 89.20% without sacrificing precision to the same extent, dropping from 95.73% to 94.20%. The model’s recall is arguably more important for de-identification than precision.
10.
  • Berg, Hanna, et al. (author)
  • Augmenting a De-identification System for Swedish Clinical Text Using Open Resources and Deep Learning
  • 2019
  • In: Proceedings of the Workshop on NLP and Pseudonymisation. - Linköping : Linköping University Electronic Press. - 9789179299965 ; , pp. 8-15
  • Conference paper (peer-reviewed). Abstract:
    • Electronic patient records are produced in abundance every day and there is a demand to use them for research or management purposes. The records, however, contain information in the free text that can identify the patient and therefore tools are needed to identify this sensitive information. The aim is to compare two machine learning algorithms, Long Short-Term Memory (LSTM) and Conditional Random Fields (CRF) applied to a Swedish clinical data set annotated for de-identification. The results show that CRF performs better than deep learning with LSTM, with CRF giving the best results with an F1 score of 0.91 when adding more data from within the same domain. Adding general open data did, on the other hand, not improve the results.
11.
  • Berg, Hanna, et al. (author)
  • Building a De-identification System for Real Swedish Clinical Text Using Pseudonymised Clinical Text
  • 2019
  • In: Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019). - : Association for Computational Linguistics. - 9781950737772 ; , pp. 118-125
  • Conference paper (peer-reviewed). Abstract:
    • This article presents experiments with pseudonymised Swedish clinical text used as training data to de-identify real clinical text with the future aim to transfer non-sensitive training data to other hospitals. Conditional Random Fields (CRF) and Long Short-Term Memory (LSTM) machine learning algorithms were used to train de-identification models. The two models were trained on pseudonymised data and evaluated on real data. For benchmarking, models were also trained on real data, and evaluated on real data as well as trained on pseudonymised data and evaluated on pseudonymised data. CRF showed better performance for some PHI information like Date Part, First Name and Last Name; consistent with some reports in the literature. In contrast, poor performances on Location and Health Care Unit information were noted, partially due to the constrained vocabulary in the pseudonymised training data. It is concluded that it is possible to train transferable models based on pseudonymised Swedish clinical data, but even small narrative and distributional variation could negatively impact performance.
12.
  • Berg, Hanna, et al. (author)
  • De-identification of Clinical Text for Secondary Use : Research Issues
  • 2021
  • In: Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies - (Volume 5). - : SciTePress. - 9789897584909 ; , pp. 592-599
  • Conference paper (peer-reviewed). Abstract:
    • Privacy is challenged by both advances in AI-related technologies and recently introduced legal regulations. The problem of privacy has been extensively studied within the privacy community, but has largely focused on methods for protecting and assessing the privacy of structured data. Research aiming to protect the integrity of patients based on clinical text has primarily referred to US law and relied on automatically recognising predetermined, both direct and indirect, identifiers. This article discusses the various challenges concerning the re-use of unstructured clinical data, in particular in the form of clinical text, and focuses on ambiguous and vague terminology, how different legislation affects the requirements for de-identification, differences between methods for unstructured and structured data, the impact of approaches based on named entity recognition and replacing sensitive data with surrogates, as well as the lack of measures for usability and re-identification risk.
13.
  • Berg, Hanna, et al. (author)
  • The Impact of De-identification on Downstream Named Entity Recognition in Clinical Text
  • 2020
  • In: The 11th International Workshop on Health Text Mining and Information Analysis LOUHI 2020. - USA : Association for Computational Linguistics. - 9781952148811 ; , pp. 1-11
  • Conference paper (peer-reviewed). Abstract:
    • The impact of de-identification on data quality and, in particular, utility for developing models for downstream tasks has been more thoroughly studied for structured data than for unstructured text. While previous studies indicate that text de-identification has a limited impact on models for downstream tasks, it remains unclear what the impact is with various levels and forms of de-identification, in particular concerning the trade-off between precision and recall. In this paper, the impact of de-identification is studied on downstream named entity recognition in Swedish clinical text. The results indicate that de-identification models with moderate to high precision lead to similar downstream performance, while low precision has a substantial negative impact. Furthermore, different strategies for concealing sensitive information affect performance to different degrees, ranging from pseudonymisation having a low impact to the removal of entire sentences with sensitive information having a high impact. This study indicates that it is possible to increase the recall of models for identifying sensitive information without negatively affecting the use of de-identified text data for training models for clinical named entity recognition; however, there is ultimately a trade-off between the level of de-identification and the subsequent utility of the data.
14.
  • Berg, Nils, et al. (author)
  • Using BART to Automatically Generate Discharge Summaries from Swedish Clinical Text
  • 2024
  • In: Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024. - : Association for Computational Linguistics. ; , pp. 246-252
  • Conference paper (peer-reviewed). Abstract:
    • Documentation is a regular part of contemporary healthcare practices and one such documentation task is the creation of a discharge summary, which summarizes a care episode. However, manually writing discharge summaries is a time-consuming task, and research has shown that discharge summaries are often lacking quality in various respects. To alleviate this problem, text summarization methods could be applied to text from electronic health records, such as patient notes, to automatically create a discharge summary. Previous research has been conducted on this topic on text in various languages and with various methods, but no such research has been conducted on Swedish text. In this paper, four data sets extracted from a Swedish clinical corpus were used to fine-tune four BART language models to perform the task of summarizing Swedish patient notes into a discharge summary. Out of these models, the best performing model was manually evaluated by a senior, now retired, nurse and clinical coder. The evaluation results show that the best performing model produces discharge summaries of overall low quality. This is possibly due to issues in the data extracted from the Health Bank research infrastructure, which warrants further work on this topic.
15.
  • Blanco, Alberto, et al. (author)
  • Implementation of specialised attention mechanisms : ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish
  • 2022
  • In: Journal of Biomedical Informatics. - : Elsevier BV. - 1532-0464 .- 1532-0480. ; 130
  • Journal article (peer-reviewed). Abstract:
    • Multi-label classification according to the International Classification of Diseases (ICD) is an Extreme Multi-label Classification task aiming to categorise health records according to a set of relevant ICD codes. We implemented PlaBERT, a new multi-label text classification head with per-label attention, on top of a BERT model. The model assessment is conducted on Electronic Health Records, conveying Discharge Summaries in three languages – English, Spanish, and Swedish. The study focuses on 157 diagnostic codes from the ICD. We additionally measure the labelling noise to estimate the consistency of the gold standard. Our specialised attention mechanism computes attention weights for each input token and label pair, obtaining the specific relevance of every word concerning each ICD code. The PlaBERT model outputs the computed attention importance for each token and label, allowing for visualisation. Our best results are 40.65, 38.36, and 41.13 F1-Score points on the English, Spanish and Swedish datasets, respectively, for the 157 gastrointestinal codes. Besides, Precision is the metric that most significantly improves owing to the attention mechanism of PlaBERT, with an increase of 44.63, 40.93, and 12.92 points, respectively, for the Spanish, Swedish and English datasets.
16.
  • Blanco, Alberto, et al. (author)
  • On the Contribution of Per-ICD Attention Mechanisms to Classify Health Records in Languages With Fewer Resources than English
  • 2021
  • In: INTERNATIONAL CONFERENCE RECENT ADVANCES IN NATURAL LANGUAGE PROCESSING 2021. - Shoumen : INCOMA Ltd. - 9789544520724 ; , pp. 165-172
  • Conference paper (peer-reviewed). Abstract:
    • We introduce a multi-label text classifier with per-label attention for the classification of Electronic Health Records according to the International Classification of Diseases. We apply the model to two Electronic Health Records datasets with Discharge Summaries in two languages with fewer resources than English: Spanish and Swedish. Our model leverages the BERT Multilingual model (specifically the Wikipedia, as the model has been trained with 104 languages, including Spanish and Swedish, with the largest Wikipedia dumps) to share the language modelling capabilities across the languages. With the per-label attention, the model can compute the relevance of each word from the EHR towards the prediction of each label. For the experimental framework, we apply 157 labels from Chapter XI – Diseases of the Digestive System of the ICD, which makes the attention especially important as the model has to discriminate between similar diseases.
17.
  • Boström, Henrik, et al. (author)
  • De-identifying health records by means of active learning
  • 2012
  • In:
  • Conference paper (peer-reviewed). Abstract:
    • An experiment on classifying words in Swedish health records as belonging to one of eight protected health information (PHI) classes, or to the non-PHI class, by means of active learning has been conducted, in which three selection strategies were evaluated in conjunction with random forests; the commonly employed approach of choosing the most uncertain examples, choosing randomly, and choosing the most certain examples. Surprisingly, random selection outperformed choosing the most uncertain examples with respect to ten considered performance metrics. Moreover, choosing the most certain examples outperformed random selection with respect to nine out of ten metrics.
18.
  • Budrionis, Andrius, et al. (author)
  • Negation detection in Norwegian medical text : Porting a Swedish NegEx to Norwegian. Work in progress
  • 2018
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents an initial effort in developing a negation detection algorithm for Norwegian clinical text. An evaluated version of NegEx for Swedish was extended to support Norwegian clinical text, by translating the negation triggers and adding more negation rules as well as using a pre-processed Norwegian ICD-10 diagnosis code list to detect symptoms and diagnoses. Due to limited access to Norwegian clinical text, the Norwegian NegEx was tested on Norwegian medical scientific text. NegEx found 70 negated symptoms/diagnoses in the combined text of 170 publications in the medical domain. The results are not completely evaluated due to the lack of a gold standard. Some challenging erroneous tokenizations of Norwegian words were found in addition to the need for improved preprocessing and matching techniques for the Norwegian ICD-10 code list. This work pointed out the weaknesses of the current implementation and provided insights for future work.
19.
  • Caccamisi, Andrea, et al. (author)
  • Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records
  • 2020
  • In: Upsala Journal of Medical Sciences. - : Uppsala Medical Society. - 0300-9734 .- 2000-1967. ; 125:4, pp. 316-324
  • Journal article (peer-reviewed). Abstract:
    • Background: The electronic medical record (EMR) offers unique possibilities for clinical research, but some important patient attributes are not readily available due to its unstructured properties. We applied text mining using machine learning to enable automatic classification of unstructured information on smoking status from Swedish EMR data. Methods: Data on patients' smoking status from EMRs were used to develop 32 different predictive models that were trained using Weka, changing sentence frequency, classifier type, tokenization, and attribute selection in a database of 85,000 classified sentences. The models were evaluated using F-score and accuracy based on out-of-sample test data including 8500 sentences. The error weight matrix was used to select the best model, assigning a weight to each type of misclassification and applying it to the model confusion matrices. The best performing model was then compared to a rule-based method. Results: The best performing model was based on the Support Vector Machine (SVM) Sequential Minimal Optimization (SMO) classifier using a combination of unigrams and bigrams as tokens. Sentence frequency and attributes selection did not improve model performance. SMO achieved 98.14% accuracy and 0.981 F-score versus 79.32% and 0.756 for the rule-based model. Conclusion: A model using machine-learning algorithms to automatically classify patients' smoking status was successfully developed. Such algorithms may enable automatic assessment of smoking status and other unstructured data directly from EMRs without manual classification of complete case notes.
20.
  • Caccamisi, Andrea, et al. (author)
  • PRM92 - Automatic Extraction and Classification of Patients’ Smoking Status from Free Text Using Natural Language Processing
  • 2016
  • In: Value in Health. - : Elsevier BV. - 1098-3015 .- 1524-4733. ; 19:7
  • Journal article (peer-reviewed). Abstract:
    • Objectives: To develop a machine learning algorithm for automatic classification of smoking status (smoker, ex-smoker, non-smoker and unknown status) in EMRs, and validate the predictive accuracy compared to a rule-based method. Smoking is a leading cause of death worldwide and may introduce confounding in research based on real world data (RWD). Information on smoking is often documented in free text fields in Electronic Medical Records (EMRs), but structured RWD on smoking is sparse. Methods: 32 predictive models were trained with the Weka machine learning suite, tweaking sentence frequency, classifier type, tokenization and attribute selection using a database of 85,000 classified sentences. The models were evaluated using F-Score and Accuracy based on out-of-sample test data including 8,500 sentences. The error weight matrix was used to select the best model, assigning a weight to each type of misclassification and applying it to the models' confusion matrices. Results: The best performing model was based on the Support Vector Machine (SVM) Sequential Minimal Optimization (SMO) classifier using a polynomial kernel with parameter C equal to 6 and a combination of unigrams and bigrams as tokens. Sentence frequency and attributes selection did not improve model performance. SMO achieved 98.25% accuracy and 0.982 F-Score versus 79.32% and 0.756, respectively, for the rule-based model. Conclusions: A model using machine learning algorithms to automatically classify patients' smoking status was successfully developed. This algorithm would enable automatic assessment of smoking status directly from EMRs, obviating the need to extract complete case notes and manual classification.
  •  
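The model-selection step described in the abstract above (a weight for each type of misclassification, applied to each model's confusion matrix) can be illustrated with a minimal Python sketch. The class order, the weight values and the two confusion matrices are invented for illustration; the abstract does not report them, and the function names are hypothetical.

```python
# Hypothetical illustration: pick the model with the lowest weighted error.
CLASSES = ["smoker", "ex-smoker", "non-smoker", "unknown"]

# ERROR_WEIGHTS[i][j] = assumed cost of predicting class j when the true class is i.
# Correct predictions (the diagonal) cost nothing.
ERROR_WEIGHTS = [
    [0, 1, 3, 2],
    [1, 0, 2, 2],
    [3, 2, 0, 1],
    [2, 2, 1, 0],
]


def weighted_error(confusion, weights=ERROR_WEIGHTS):
    """Sum the confusion-matrix counts multiplied element-wise by the weights."""
    return sum(
        count * weight
        for conf_row, weight_row in zip(confusion, weights)
        for count, weight in zip(conf_row, weight_row)
    )


def best_model(confusions):
    """Return the name of the model whose confusion matrix has the lowest weighted error."""
    return min(confusions, key=lambda name: weighted_error(confusions[name]))


# Made-up confusion matrices (rows = true class, columns = predicted class).
example = {
    "model_a": [[90, 5, 5, 0], [4, 92, 2, 2], [1, 2, 95, 2], [0, 3, 2, 95]],
    "model_b": [[95, 3, 1, 1], [2, 94, 2, 2], [0, 1, 97, 2], [1, 1, 1, 97]],
}

print(best_model(example))
```

With weights like these, two models with similar accuracy can still be ranked differently if one of them makes more of the costly confusions (here, smoker versus non-smoker).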
21.
  • Carlsson, Elin, et al. (author)
  • Influence of Module Order on Rule-Based De-identification of Personal Names in Electronic Patient Records Written in Swedish
  • 2010
  • In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, May 19-21, 2010. - : European Language Resources Association (ELRA). ; , s. 3442-3446
  • Conference paper (other academic/artistic)
    • Electronic patient records (EPRs) are a valuable resource for research but for confidentiality reasons they cannot be used freely. In order to make EPRs available to a wider group of researchers, sensitive information such as personal names has to be removed. De-identification is a process that makes this possible. Rule-based as well as statistical and machine learning-based methods exist to perform de-identification, but the latter require annotated training material, which exists only very sparsely for patient names. It is therefore necessary to use rule-based methods for de-identification of EPRs. Not much is known, however, about the order in which the various rules should be applied and how the different rules influence precision and recall. This paper aims to answer this research question by implementing and evaluating four common rules for de-identification of personal names in EPRs written in Swedish: (1) dictionary name matching, (2) title matching, (3) common words filtering and (4) learning from previous modules. The results show that to obtain the highest recall and precision, the rules should be applied in the following order: title matching, common words filtering and dictionary name matching.
  •  
22.
  • Cerratto-Pargman, Teresa, et al. (author)
  • User Centered Development of Automatic E-mail Answering for the Public Sector
  • 2012
  • In: Human-Computer Interaction, Tourism and Cultural Heritage. - Berlin, Heidelberg : Springer Berlin/Heidelberg. - 9783642339431 - 9783642339448 ; , s. 154-156
  • Conference paper (peer-reviewed)
    • In Sweden, the use of e-mail by the public sector has become a key communication service between citizens and governmental authorities. Although the integration of e-mail in the public sector has certainly brought citizens and handling officers closer, it has also introduced a particular view of governmental authorities, for instance the idea that public service and information should be available to citizens any time, anywhere. Such a belief among citizens certainly puts high demands on the quality and efficiency of the e-service governmental authorities are capable of providing. In fact, the growing number of citizens’ electronic requests must be accurately answered in a limited time. In the research project IMAIL (Intelligent e-mail answering service for eGovernment) [1], we have focused on the work carried out at the Swedish Social Insurance Agency (SSIA), which exemplifies a governmental authority dealing with 500,000 e-mails per year on top of face-to-face meetings, phone calls and chat communication. With the objective of creating an e-mail client capable of easing and ensuring the quality of the public service provided by the SSIA’s handling officers, we have developed a prototype that: (1) automatically answers a large part of the simple questions in the incoming e-mail flow, (2) improves the quality of the semi-automatic answers (i.e. answer templates), and finally, (3) reduces the workload for the handling officers. The development of the prototype is grounded in an empirical study conducted at the SSIA. The study comprises the analysis and clustering of 10,000 citizens’ e-mails and the working activity of 15 handling officers, collected through questionnaires, interviews and workshops [2].
  •  
23.
  • Chomutare, Taridzo, et al. (author)
  • Combining deep learning and fuzzy logic to predict rare ICD-10 codes from clinical notes
  • 2022
  • In: Proceedings - 2022 IEEE International Conference on Digital Health (ICDH 2022). - Piscataway : IEEE. - 9781665481496 ; , s. 163-168
  • Conference paper (peer-reviewed)
    • Computer assisted coding (CAC) of clinical text into standardized classifications such as ICD-10 is an important challenge. For frequently used ICD-10 codes, deep learning approaches have been quite successful. For rare codes, however, the problem is still outstanding. To improve performance for rare codes, a pipeline is proposed that takes advantage of the ICD-10 code hierarchy to combine semantic capabilities of deep learning and the flexibility of fuzzy logic. The data used are discharge summaries in Swedish in the medical speciality of gastrointestinal diseases. Using our pipeline, fuzzy matching computation time is reduced and accuracy of the top 10 hits of the rare codes is also improved. While the method is promising, further work is required before the pipeline can be part of a usable prototype. Code repository: https://github.com/icd-coding/zeroshot.
  •  
24.
  • Chomutare, Taridzo, et al. (author)
  • De-Identifying Swedish EHR Text Using Public Resources in the General Domain
  • 2020
  • In: Digital Personalized Health and Medicine. - Amsterdam : IOS Press. - 9781643680828 - 9781643680835 ; , s. 148-152
  • Conference paper (peer-reviewed)
    • Sensitive data is normally required to develop rule-based or train machine learning-based models for de-identifying electronic health record (EHR) clinical notes; and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to EHR training data; (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model using recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed improved precision and recall from 55.62% and 80.02% with the base EHR embedding layer, to 85.01% and 87.15% when Wikipedia word vectors are added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text; and this could be useful in cases where the data is both sensitive and in low-resource languages.
  •  
25.
  • Chomutare, Taridzo, et al. (author)
  • Improving Quality of ICD-10 (International Statistical Classification of Diseases, Tenth Revision) Coding Using AI : Protocol for a Crossover Randomized Controlled Trial
  • 2024
  • In: JMIR Research Protocols. - 1929-0748. ; 13
  • Journal article (peer-reviewed)
    • Background: Computer-assisted clinical coding (CAC) tools are designed to help clinical coders assign standardized codes, such as the ICD-10 (International Statistical Classification of Diseases, Tenth Revision), to clinical texts, such as discharge summaries. Maintaining the integrity of these standardized codes is important both for the functioning of health systems and for ensuring data used for secondary purposes are of high quality. Clinical coding is an error-prone cumbersome task, and the complexity of modern classification systems such as the ICD-11 (International Classification of Diseases, Eleventh Revision) presents significant barriers to implementation. To date, there have only been a few user studies; therefore, our understanding is still limited regarding the role CAC systems can play in reducing the burden of coding and improving the overall quality of coding. Objective: The objective of the user study is to generate both qualitative and quantitative data for measuring the usefulness of a CAC system, Easy-ICD, that was developed for recommending ICD-10 codes. Specifically, our goal is to assess whether our tool can reduce the burden on clinical coders and also improve coding quality. Methods: The user study is based on a crossover randomized controlled trial study design, where we measure the performance of clinical coders when they use our CAC tool versus when they do not. Performance is measured by the time it takes them to assign codes to both simple and complex clinical texts as well as the coding quality, that is, the accuracy of code assignment. Results: We expect the study to provide us with a measurement of the effectiveness of the CAC system compared to manual coding processes, both in terms of time use and coding quality. Positive outcomes from this study will imply that CAC tools hold the potential to reduce the burden on health care staff and will have major implications for the adoption of artificial intelligence-based CAC innovations to improve coding practice. Expected results to be published summer 2024. Conclusions: The planned user study promises a greater understanding of the impact CAC systems might have on clinical coding in real-life settings, especially with regard to coding time and quality. Further, the study may add new insights on how to meaningfully exploit current clinical text mining capabilities, with a view to reducing the burden on clinical coders, thus lowering the barriers and paving a more sustainable path to the adoption of modern coding systems, such as the new ICD-11.
  •  
26.
  • Dahl, Anders, et al. (author)
  • Pathology text mining - on Norwegian prostate cancer reports
  • 2016
  • In: 2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW). - : IEEE Computer Society. - 9781509021093 ; , s. 84-87
  • Conference paper (peer-reviewed)
    • Pathology reports are written by pathologists, skilled physicians who know how to interpret disorders in various tissue samples from the human body. Statistics are collected to provide valuable information on the outcome of disorders, such as cancer, and on the effect of treatment. Therefore, cancer pathology reports are interpreted and coded into databases at cancer registries. In Norway this task is carried out by the Cancer Registry of Norway (Kreftregisteret) by 25 different human coders. There is a need to automate this process. The authors of this article received 25 prostate cancer pathology reports written in Norwegian from the Cancer Registry of Norway, each documenting various stages of prostate cancer and the corresponding correct manual coding. A rule-based algorithm was produced that processed the reports in order to prototype automation. The output of the algorithm was compared to the output of the manual coding. The evaluation showed an average F-score of 0.94 on four of these data points, namely Total Malign, Primary Gleason, Secondary Gleason and Total Gleason, and a lower average F-score of 0.76 on all ten data points. The results are in line with previous research.
  •  
27.
  • Dalianis, Hercules (author)
  • Clinical Text Mining : Secondary Use of Electronic Patient Records
  • 2018
  • Book (other academic/artistic)
    • Patient records are written by the physician during the treatment of the patient for mnemonic reasons and internal use within the clinical unit, but the patient record is also written for legal reasons. Today a very large number of patient records are produced in the healthcare system. The patient records are mostly in electronic form and are written by health personnel. They describe initial symptoms, diagnosis, treatment and outcomes of the treatment, but they may also contain nursing narratives or daily notes. In addition, patient records contain valuable structured information such as laboratory results, blood tests and drugs. These records are seldom reused, most likely because of ignorance, but also due to a lack of tools to process them adequately, and last but not least, there are ethical policies that make the records difficult to use for research and for developing tools for physicians and researchers. There is a plethora of reasons to unlock and reuse the content of electronic patient records, since they contain valuable information about a vast number of patients who have been treated by highly skilled physicians and taken care of by well-trained and experienced nurses. Over time a massive amount of patient record data is accumulated where old knowledge can be confirmed and new knowledge can be obtained. This book was written since there was a lack of a textbook describing the area of clinical text mining. The healthcare domain area is complex and can be difficult to apprehend. There are plenty of specialised disciplines in healthcare. Applying text mining and natural language processing to health records needs special care and understanding of the domain. This book will help the reader to quickly and easily understand the health care domain. Some issues that are treated in this book are: What are the problems in clinical text mining and what are their solutions? Which are the coding and classification systems in the health care domain? What do they actually contain and how are they used? How do physicians reason to make a diagnosis? What is their typical jargon when writing in the patient record? Does jargon differ between different medical specialities? This book will give the reader the background knowledge on the research front on clinical text mining and health informatics, and specifically in healthcare analytics. It is valuable for a researcher or a student who needs to learn the clinical research area in a fast and efficient way. The book is also a valuable resource for targeting a new natural language in the domain. Each additional language will add a piece to the whole equation. The experiences described in this book originate mainly from research that utilised over two million Swedish hospital records from the Karolinska University Hospital during the years 2007–2014. The general aim was to build basic tools for clinical text mining for Swedish patient records and to address specific issues. These tools were used to automatically detect and predict healthcare associated infections, find adverse (drug) events, and detect early symptoms of cancer. To accomplish this, the text in the patient records was manually annotated by physicians and then different machine learning tools were trained on these annotated texts to simulate the physicians’ skills, knowledge and intelligence. The book is also based on the extensive source of scientific literature from the large research community in clinical text mining that has been compiled and explained in this book. This book will also describe how to get access to patient records, the ethical problems involved and how to de-identify the patient records automatically before using the records, and finally, methods to build tools that will improve healthcare. The research questions of this 10-year research project are manifold, and started with the general research question: Using artificial intelligence to analyse patient records, is it possible and will it improve healthcare? This can be distilled to several research questions, of which one is of special interest: Can one process clinical text written in Swedish with natural language processing tools developed for standard Swedish, such as newspaper and web texts, to extract named entities such as symptoms, diagnosis, drugs and body parts from clinical text? This major issue can then be subdivided into the following questions:
      • Can one decide the factuality of a diagnosis found in a clinical text? What does "Pneumonia?", "Angina pectoris cannot be excluded" or just "No signs of pneumonia" really mean?
      • Can one determine the temporal order of clinical events? Have the symptoms occurred a week ago or two years ago?
      • Can new adverse drug effects be found by extracting relations between drug intake and adverse drug effect?
      • How much clinical text must be annotated manually to obtain correct and useful results?
      • How can patient privacy be maintained while carrying out research in clinical text mining?
  •  
28.
  • Dalianis, Hercules (author)
  • Clinical Text Retrieval - An Overview of Basic Building Blocks and Applications
  • 2014
  • In: Professional search in the modern world. - Cham : Springer. - 9783319125107 - 9783319125114 ; , s. 147-165
  • Book chapter (peer-reviewed)
    • This article describes information retrieval, natural language processing and text mining of electronic patient record text, also called clinical text. Clinical text is written by physicians and nurses to document the health care process of the patient. First we describe some characteristics of clinical text, followed by the automatic preprocessing of the text that is necessary for making it usable for some applications. We also describe some applications for clinicians including spelling and grammar checking, ICD-10 diagnosis code assignment, as well as other applications for hospital management such as ICD-10 diagnosis code validation and detection of adverse events such as hospital acquired infections. Part of the preprocessing makes the clinical text useful for faceted search, although clinical text already has some keys for performing faceted search such as gender, age, ICD-10 diagnosis codes, ATC drug codes, etc. Preprocessing makes use of ICD-10 codes and the SNOMED-CT textual descriptions. ICD-10 codes and SNOMED-CT are available in several languages and can be considered the modern Greek or Latin of medical language. The basic research presented here has its roots in the challenges described by the health care sector. These challenges have been partially solved in academia, and we believe the solutions will be adapted to the health care sector in real world applications.
  •  
29.
  • Dalianis, Hercules, et al. (author)
  • Clustering e-mails for the Swedish social insurance agency - What part of the e-mail thread gives the best quality?
  • 2010
  • In: Advances in Natural Language Processing. - Berlin, Heidelberg : Springer Berlin/Heidelberg. - 9783642147692 - 9783642147708 ; , s. 115-120
  • Conference paper (peer-reviewed)
    • We need to analyse a large number of e-mails sent by the citizens to the customer services department of a governmental organisation based in Sweden. To carry out this analysis we clustered a large number of e-mails with the aim of automatic e-mail answering. One issue that came up was whether we should use the whole e-mail including the thread or just the original query for the clustering. In this paper we describe this investigation. Our results show that only the query and the answering part should be used, but not necessarily the whole e-mail thread. The results clearly show that the original question contains more useful information than only the answer, although a combination is even better. Using the full e-mail thread does not downgrade the result.
  •  
30.
  • Dalianis, Hercules, et al. (author)
  • Comparing manual text patterns and machine learning for classification of e-mails for automatic answering by a government agency
  • 2011
  • In: 12th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2011. - Berlin, Heidelberg : Springer Berlin Heidelberg. - 9783642194368 ; , s. 234-243
  • Conference paper (peer-reviewed)
    • E-mails to government institutions as well as to large companies may contain a large proportion of queries that can be answered in a uniform way. We analysed and manually annotated 4,404 e-mails from citizens to the Swedish Social Insurance Agency, and compared two methods for detecting answerable e-mails: manually-created text patterns (rule-based) and machine learning-based methods. We found that the text pattern-based method gave much higher precision at 89 percent than the machine learning-based method that gave only 63 percent precision. The recall was slightly higher (66 percent) for the machine learning-based methods than for the text patterns (47 percent). We also found that 23 percent of the total e-mail flow was processed by the automatic e-mail answering system.
  •  
31.
  • Dalianis, Hercules, et al. (author)
  • Creating a reusable English-Chinese parallel corpus for bilingual dictionary construction
  • 2010
  • In: Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010. - : European Language Resources Association (ELRA). - 2951740867 - 9782951740860 ; , s. 1700-1705
  • Conference paper (peer-reviewed)
    • This paper describes an experiment in which we first construct an English-Chinese parallel corpus, then apply the Uplug word alignment tool to the corpus, and finally produce and evaluate an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually translated from Chinese to English. The parallel corpus contains 104 563 Chinese characters, equivalent to 59 918 Chinese words, and the corresponding English corpus contains 75 766 English words. However, Chinese writing does not use any delimiters to mark word boundaries, so we had to carry out word segmentation as a preprocessing step on the Chinese corpus. Moreover, since the parallel corpus was downloaded from the Internet, it is noisy with regard to the alignment between corresponding translated sentences. Therefore we spent 60 hours of manual work aligning the sentences in the English and Chinese parallel corpus before performing automatic word alignment using Uplug. The word alignment with Uplug was carried out from English to Chinese. Nine respondents evaluated the entries of the resulting English-Chinese word list with a frequency equal to or above three, and we obtained an accuracy of 73.1 percent.
  •  
32.
  • Dalianis, Hercules, et al. (author)
  • Creating and Evaluating a Consensus for Negated and Speculative Words in a Swedish Clinical Corpus
  • 2010
  • In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010). - Antwerp : University of Antwerp. - 9789057282669 ; , s. 5-13
  • Conference paper (peer-reviewed)
    • In this paper we describe the creation of a consensus corpus that was obtained through combining three individual annotations of the same clinical corpus in Swedish. We used a few basic rules that were executed automatically to create the consensus. The corpus contains negation words, speculative words, uncertain expressions and certain expressions. We evaluated the consensus by using it for negation and speculation cue detection. We used Stanford NER, which is based on the machine learning algorithm Conditional Random Fields, for the training and detection. For comparison we also trained Stanford NER on the clinical part of the BioScope Corpus. For our clinical consensus corpus in Swedish we obtained a precision of 87.9 percent and a recall of 91.7 percent for negation cues, and for English with the BioScope Corpus we obtained a precision of 97.6 percent and a recall of 96.7 percent for negation cues.
  •  
33.
  •  
34.
  • Dalianis, Hercules, et al. (author)
  • De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields
  • 2010
  • In: Journal of Biomedical Semantics. - : BioMed Central. - 2041-1480. ; 1:6
  • Journal article (peer-reviewed)
    • Background: In order to perform research on the information contained in Electronic Patient Records (EPRs), access to the data itself is needed. This is often very difficult due to confidentiality regulations. The data sets need to be fully de-identified before they can be distributed to researchers. De-identification is a difficult task where the definitions of annotation classes are not self-evident. Results: We present work on the creation of two refined variants of a manually annotated Gold standard for de-identification, one created automatically, and one created through discussions among the annotators. The data is a subset from the Stockholm EPR Corpus, a data set available within our research group. These are used for the training and evaluation of an automatic system based on the Conditional Random Fields algorithm. Evaluating with four-fold cross-validation on sets of around 4-6 000 annotation instances, we obtained very promising results for both Gold Standards: F-score around 0.80 for a number of experiments, with higher results for certain annotation classes. Moreover, 49 false positives that were verified true positives were found by the system but missed by the annotators. Conclusions: Our intention is to make this Gold standard, The Stockholm EPR PHI Corpus, available to other research groups in the future. Despite being slightly more time-consuming to create, we believe the manual consensus gold standard is the most valuable for further research. We also propose a set of annotation classes to be used for similar de-identification tasks.
  •  
35.
  • Dalianis, Hercules, et al. (author)
  • Development of a Swedish Corpus for Evaluating Summarizers and other IR-tools
  • 2001
  • Report (other academic/artistic)
    • We present the construction of a Swedish corpus aimed at research on Information Retrieval, Information Extraction, Named Entity Recognition and Multi Text Summarization. We also present the results of evaluating our Swedish text summarizer SweSum with this corpus. The corpus has been constructed by using Internet agents downloading Swedish newspaper text from various sources. A small part of this corpus has then been manually annotated. To evaluate our text summarizer SweSum we let ten students execute our text summarizer with increasing compression rates on the 100 manually annotated texts to find answers to predefined questions. The results showed that at 40 percent summarization/compression rate the correct answer rate was 84 percent.
  •  
36.
  • Dalianis, Hercules, et al. (author)
  • Didactic Panel : clinical Natural Language Processing in Languages Other Than English
  • 2014
  • In: AMIA Annual Symposium 2014. - : American Medical Informatics Association. ; , s. S 84-
  • Conference paper (peer-reviewed)
    • Natural Language Processing (NLP) of clinical free-text has received a lot of attention from the scientific community. Clinical documents are routinely created across health care providing institutions and are generally written in the official language(s) of the country these institutions are located in. As a result, free-text clinical information is written in a large variety of languages. While most of the efforts for clinical NLP have focused on English, there is a strong need to extend this work to other languages, for instance in order to gain medical information about patient cohorts in geographical areas where English is not an official language. Furthermore, adapting current NLP methods developed for English to other languages may provide useful insight on the generalizability of algorithms and lead to increased robustness. This panel aims to provide an overview of clinical NLP for languages other than English, as for example French, Swedish and Bulgarian and discuss future methodological advances of clinical NLP in a context that encompasses English as well as other languages.
  •  
37.
  • Dalianis, Hercules, et al. (author)
  • HEALTH BANK - A Workbench for Data Science Applications in Healthcare
  • 2015
  • In: Industry Track Workshop. - : CEUR Workshop Proceedings. ; , s. 1-18
  • Conference paper (peer-reviewed)
    • The enormous amounts of data that are generated in the healthcare process and stored in electronic health record (EHR) systems are an underutilized resource that, with the use of data science applications, can be exploited to improve healthcare. To foster the development and use of data science applications in healthcare, there is a fundamental need for access to EHR data, which is typically not readily available to researchers and developers. A relatively rare exception is the large EHR database, the Stockholm EPR Corpus, comprising data from more than two million patients, that has been made available to a limited group of researchers at Stockholm University. Here, we describe a number of data science applications that have been developed using this database, demonstrating the potential reuse of EHR data to support healthcare and public health activities, as well as facilitate medical research. However, in order to realize the full potential of this resource, it needs to be made available to a larger community of researchers, as well as to industry actors. To that end, we envision the provision of an infrastructure around this database called HEALTH BANK – the Swedish Health Record Research Bank. It will function both as a workbench for the development of data science applications and as a data exploration tool, allowing epidemiologists, pharmacologists and other medical researchers to generate and evaluate hypotheses. Aggregated data will be fed into a pipeline for open e-access, while non-aggregated data will be provided to researchers within an ethical permission framework. We believe that HEALTH BANK has the potential to promote a growing industry around the development of data science applications that will ultimately increase the efficiency and effectiveness of healthcare.
  •  
38.
  • Dalianis, Hercules, et al. (author)
  • How Certain are Clinical Assessments? : Annotating Swedish Clinical Text for (Un)certainties, Speculations and Negations
  • 2010
  • In: Proceedings of the Seventh International Conference on Language Resources and Evaluation, LREC 2010. ; , s. 3071-3075
  • Conference paper (other academic/artistic)
    • Clinical texts contain a large amount of information. Some of this information is embedded in contexts where e.g. a patient status is reasoned about, which may lead to a considerable amount of statements that indicate uncertainty and speculation. We believe that distinguishing such instances from factual statements will be very beneficial for automatic information extraction. We have annotated a subset of the Stockholm Electronic Patient Record Corpus for certain and uncertain expressions as well as speculative and negation keywords, with the purpose of creating a resource for the development of automatic detection of speculative language in Swedish clinical text. We have analyzed the results from the initial annotation trial by means of pairwise Inter-Annotator Agreement (IAA) measured with F-score. Our main findings are that IAA results for certain expressions and negations are very high, but for uncertain expressions and speculative keywords results are less encouraging. These instances need to be defined in more detail. With this annotation trial, we have created an important resource that can be used to further analyze the properties of speculative language in Swedish clinical text. Our intention is to release this subset to other research groups in the future after removing identifiable information.
  •  
39.
  •  
40.
  •  
41.
  • Dalianis, Hercules (author)
  • Pseudonymisation of Swedish Electronic Patient Records Using a Rule-based Approach
  • 2019
  • In: Proceedings of the Workshop on NLP and Pseudonymisation. - Linköping : Linköping University Electronic Press. - 9789179299965 ; , s. 16-23
  • Conference paper (peer-reviewed)
    • This study describes a rule-based pseudonymisation system for Swedish clinical text and its evaluation. The pseudonymisation system replaces already tagged Protected Health Information (PHI) with realistic surrogates. There are eight types of manually annotated PHIs in the electronic patient records; personal first and last names, phone numbers, locations, dates, ages and healthcare units. Two evaluators, both computer scientists, one junior and one senior, evaluated whether each of a set of 98 electronic patient records was pseudonymised or not. Only 3.5 percent of the records were correctly judged as pseudonymised and 1.5 percent of the real ones were wrongly judged as pseudonymised, meaning that on average 91 percent of the pseudonymised records were judged as real.
  •  
42.
  • Dalianis, Hercules, et al. (author)
  • Releasing a Swedish Clinical Corpus after Removing all Words – De-identification Experiments with Conditional Random Fields and Random Forests
  • 2012
  • In: Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012). ; , pp. 45-48
  • Conference paper (peer-reviewed)
    • Patient records contain valuable information in the form of both structured data and free text; however this information is sensitive since it can reveal the identity of patients. In order to allow new methods and techniques to be developed and evaluated on real world clinical data without revealing such sensitive information, researchers could be given access to de-identified records without protected health information (PHI), such as names, telephone numbers, and so on. One approach to minimizing the risk of revealing PHI when releasing text corpora from such records is to include only features of the words instead of the words themselves. Such features may include parts of speech, word length, and so on from which the sensitive information cannot be derived. In order to investigate what performance losses can be expected when replacing specific words with features, an experiment with two state-of-the-art machine learning methods, conditional random fields and random forests, is presented, comparing their ability to support de-identification, using the Stockholm EPR PHI corpus as a benchmark test. The results indicate severe performance losses when the actual words are removed, leading to the conclusion that the chosen features are not sufficient for the suggested approach to be viable.
  •  
43.
  • Dalianis, Hercules (author)
  • Slutrapport KVALPA : Vilka KVaLitetsindikatorer i PAtientjournalens fria text behövs för att kunna mäta kvalitén på vården? Skapandet av en automatisk metod genom maskininlärning
  • 2019
  • Report (other academic/artistic)
    • This is a pilot study on automatically finding quality indicators in the free text of electronic patient records from Karolinska University Hospital. The quality indicators studied indicate urinary tract infections, sepsis, fall injuries, pressure ulcers, nutrition and adverse drug reactions. An interview study was carried out to understand the problem area, and a rule-based system was implemented in the programming language Python. The system, called KVALPA, uses trigger words and was applied to 100 patient records from five different clinical units. 102 quality indicators were found, of which 26 were negated, and additional ones were found through manual analysis. The negated indicators show that indicators of poor quality are absent, except in the case of nutrition. Future developments are to extend the trigger list with automatically generated synonyms, and also to annotate a gold standard that can be used to evaluate the precision and recall of the system.
  •  
44.
  • Dalianis, Hercules, et al. (author)
  • Stockholm EPR Corpus : A Clinical Database Used to Improve Health Care
  • 2012
  • In: Proceedings of SLCT 2012. ; , pp. 17-18
  • Conference paper (other academic/artistic)
    • The care of patients is well documented in health records. Despite being a valuable source of information that could be mined by computers and used to improve health care, health records are not readily available for research. Moreover, the narrative parts of the records are noisy and need to be interpreted by domain experts. In this abstract we describe our experiences of gaining access to a database of electronic health records for research. We also highlight some important issues in this domain and describe a number of possible applications, including comorbidity networks, detection of hospital-acquired infections and adverse drug reactions, as well as diagnosis coding support.
  •  
45.
  • Dalianis, Hercules, 1959- (author)
  • Sök och sammanfatta i Norden
  • 2006
  • In: Sprogteknologi i dansk perspektiv. - København : Reitsel. - 8778764599
  • Book chapter (other academic/artistic)
  •  
46.
  •  
47.
  • Dalianis, Hercules (author)
  • To search and summarize in Scandinavia
  • 2004
  • In: Proceedings of the First Baltic Conference, Human Language Technologies - the Baltic Perspective, Riga, Latvia, April 21-22, 2004.
  • Conference paper (peer-reviewed)
    • Automatic text summarization is the method by which a computer summarizes a text. A text is given to the computer, which returns a shorter, non-redundant text. Text summarization can be used to summarize news in the Business Intelligence domain, to automatically edit news in the newspaper setting, and to summarize news down to a length suitable for SMS and WAP, but also to summarize news before it is read aloud by synthetic speech. In 1999 we created the first text summarizer for Swedish newspaper text, SweSum. SweSum has since been ported to the following seven languages: Danish, Norwegian, English, Spanish, French, German and Farsi. SweSum is freely available as a demo on the Internet and has about 2 200 users per month. A spin-off from SweSum is SiteSeeker, a commercial search engine for websites and intranets. SiteSeeker has built-in spelling support, stemming for Swedish, Danish and English, as well as presentation of document extracts in the hit list. SiteSeeker is used at over 50 public websites in Sweden.
  •  
48.
  • Dalianis, Hercules (author)
  • To Search and Summarize on Internet with Human Language Technology
  • 2005
  • Other publication (popular science, debate, etc.)
    • More and more text is available on the Internet and we need tools to tame this flow. Automatic text summarization is one solution: a text is given to the computer, which returns a shorter, non-redundant text. Automatic text summarization can also be used in search engines to reduce the time needed to find documents. To further improve search engines, one can use human language technology in the form of word analysis such as stemming and spell checking. Other methods that can be used are multilingual or cross-language information retrieval for searching and finding documents written in languages other than those one knows. To understand foreign languages, one can use machine translation techniques, which have today become good enough for practical use. Machine translation (MT) is the technique whereby the computer translates automatically between natural languages. MT techniques have been under development since the early 1950s.
  •  
49.
  • Dalianis, Hercules, et al. (author)
  • Using human language technology to support the handling officers at the Swedish Social Insurance Agency
  • 2009
  • In: Design and Evaluation of e-Government Applications and Services. ; , pp. 30-32
  • Conference paper (peer-reviewed)
    • The Swedish Social Insurance Agency (Försäkringskassan) receives 40 000 e-mails per month, as well as phone calls, from citizens; these are handled by almost 500 handling officers. To initiate the process of making their work more efficient, we carried out two user-centered design workshops with the handling officers at Försäkringskassan, with the objective of finding out in what ways human language technology might facilitate their work. One of the outcomes of the workshops was that the handling officers required a support tool for handling and answering e-mails from their customers. Three main requirements were identified, namely finding the correct template to use in e-mail answers, support for automatically creating templates, and finally an automatic e-mail answering function. Over the next two years we will focus on these design challenges within the IMAIL project.
  •  
50.
  •  