Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study
- Löwenmark, Karl (author)
  Luleå tekniska universitet, EISLAB, Lulea University of Technology, Sweden
- Taal, Cees (author)
  SKF Research & Technology Development, Meidoornkade 14, 3992 AE Houten, P.O. Box 2350, 3430 DT Nieuwegein, The Netherlands, SKF Research & Technology Development, Sweden
- Nivre, Joakim, 1962- (author)
  RISE, Datavetenskap (Computer Science), RISE Research Institutes of Sweden, Isafjordsgatan 22, 164 40 Kista, Sweden, P.O. Box 857, 501 15 Borås, Sweden
- Liwicki, Marcus (author)
  Luleå tekniska universitet, EISLAB, Lulea University of Technology, Sweden
- Sandin, Fredrik, 1977- (author)
  Luleå tekniska universitet, EISLAB, Lulea University of Technology, Sweden
- 2022-06-29
- 2022
- English.
In: Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022. PHM Society. ISBN 9781936263363, pp. 306-314
- Related links:
  https://doi.org/10.3...
  https://ltu.diva-por... (primary) (Raw object)
  https://urn.kb.se/re...
  https://doi.org/10.3...
  https://urn.kb.se/re...
  https://urn.kb.se/re...
Abstract
- Annotations in condition monitoring systems contain information regarding asset history and fault characteristics in the form of unstructured text that could, if unlocked, be used for intelligent fault diagnosis. However, processing these annotations with pre-trained natural language models such as BERT is problematic due to out-of-vocabulary (OOV) technical terms, which result in inaccurate language embeddings. Here we investigate the effect of OOV technical terms on BERT and SentenceBERT embeddings by substituting technical terms with natural language descriptions. The embeddings were computed for each annotation in a pre-processed corpus, with and without substitution. The K-Means clustering score was calculated on sentence embeddings, and a Long Short-Term Memory (LSTM) network was trained on word embeddings with the objective of recreating the output from a keyword-based annotation classifier. With technical language substitution, the K-Means score for SentenceBERT annotation embeddings improved by 40% at seven clusters, and the labelling capacity of the BERT-LSTM model improved from 88.3% to 94.2%. These results indicate that the substitution of OOV technical terms can improve the representation accuracy of the embeddings of the pre-trained BERT and SentenceBERT models, and that pre-trained language models can be used to process technical language.
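The substitution step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the glossary entries, the function name `substitute_terms`, and the example annotation are hypothetical and chosen only for illustration. The sketch assumes that technical terms appear as whole, case-sensitive tokens and replaces each one with a plain-language description before the text is passed to an embedding model.

```python
import re

# Hypothetical glossary: the terms and descriptions below are illustrative
# assumptions, not the substitution list used in the paper.
GLOSSARY = {
    "BPFO": "ball pass frequency of the outer ring",
    "BPFI": "ball pass frequency of the inner ring",
    "DE": "drive end",
    "NDE": "non-drive end",
}


def substitute_terms(annotation: str, glossary: dict[str, str]) -> str:
    """Replace whole-word technical terms with plain-language descriptions."""
    if not glossary:
        return annotation
    # Try longer terms first so that "NDE" is preferred over "DE".
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(1)], annotation)


example = "High BPFO levels on NDE bearing"
print(substitute_terms(example, GLOSSARY))
```

In a pipeline like the one the abstract outlines, the substituted annotation, rather than the raw one, would be passed to the pre-trained model to compute embeddings. Matching on word boundaries keeps short abbreviations such as "DE" from being replaced inside longer words.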
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
Keywords
- Technical Language Processing
- Natural Language Processing
- Condition Monitoring
- Intelligent Fault Diagnosis
- Machine Learning
- Cyber-Physical Systems
- Computational Linguistics
Publication and content type
- ref (subject category)
- kon (subject category)