SwePub

onr:"swepub:oai:DiVA.org:ltu-95407"
 


Processing of Condition Monitoring Annotations with BERT and Technical Language Substitution: A Case Study

Löwenmark, Karl (author)
Luleå tekniska universitet, EISLAB, Lulea University of Technology, Sweden
Taal, Cees (author)
SKF Research & Technology Development, Meidoornkade 14, 3992 AE Houten, P.O. Box 2350, 3430 DT Nieuwegein, The Netherlands, SKF Research & Technology Development, Sweden
Nivre, Joakim, 1962- (author)
RISE, Datavetenskap, RISE Research Institutes of Sweden, Isafjordsgatan 22, 164 40 Kista, Sweden, P.O. Box 857, 501 15 Borås, Sweden
Liwicki, Marcus (author)
Luleå tekniska universitet, EISLAB, Lulea University of Technology, Sweden
Sandin, Fredrik, 1977- (author)
Luleå tekniska universitet, EISLAB, Lulea University of Technology, Sweden
2022-06-29
2022
English.
In: Proceedings of the 7th European Conference of the Prognostics and Health Management Society 2022. PHM Society. ISBN 9781936263363, pp. 306-314
  • Conference paper (peer-reviewed)
Abstract
  • Annotations in condition monitoring systems contain information regarding asset history and fault characteristics in the form of unstructured text that could, if unlocked, be used for intelligent fault diagnosis. However, processing these annotations with pre-trained natural language models such as BERT is problematic due to out-of-vocabulary (OOV) technical terms, resulting in inaccurate language embeddings. Here we investigate the effect of OOV technical terms on BERT and SentenceBERT embeddings by substituting technical terms with natural language descriptions. The embeddings were computed for each annotation in a pre-processed corpus, with and without substitution. The K-Means clustering score was calculated on sentence embeddings, and a Long Short-Term Memory (LSTM) network was trained on word embeddings with the objective of recreating the output from a keyword-based annotation classifier. With technical language substitution, the K-Means score for SentenceBERT annotation embeddings improved by 40% at seven clusters, and the labelling capacity of the BERT-LSTM model improved from 88.3% to 94.2%. These results indicate that the substitution of OOV technical terms can improve the representation accuracy of the embeddings of the pre-trained BERT and SentenceBERT models, and that pre-trained language models can be used to process technical language.
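
The substitution step described in the abstract can be illustrated with a minimal sketch. The glossary entries, the function names `build_pattern` and `substitute_terms`, and the sample annotation are hypothetical and chosen only for illustration; the record does not list the terms or the procedure the authors actually used.

```python
import re

# Hypothetical glossary: these terms and descriptions are illustrative
# assumptions, not the substitution list used in the paper.
GLOSSARY = {
    "BPFO": "ball pass frequency of the outer ring",
    "BPFI": "ball pass frequency of the inner ring",
    "DE": "drive end",
    "NDE": "non-drive end",
}


def build_pattern(glossary):
    """Compile one regex that matches any glossary term as a whole word."""
    # Longest terms first, so a longer term is preferred over a shorter
    # one whenever both could match at the same position.
    terms = sorted(glossary, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + alternation + r")\b")


def substitute_terms(text, glossary=GLOSSARY):
    """Replace each technical term in text with its plain-language description."""
    pattern = build_pattern(glossary)
    return pattern.sub(lambda match: glossary[match.group(1)], text)


if __name__ == "__main__":
    annotation = "High BPFO on NDE bearing"
    print(substitute_terms(annotation))
```

Matching is case-sensitive and restricted to whole words, so that an abbreviation such as "DE" is not replaced inside ordinary words. The substituted text would then be embedded by a pre-trained model in place of the raw annotation.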

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Language Technology (hsv//eng)

Keywords

Technical Language Processing
Natural Language Processing
Condition Monitoring
Intelligent Fault Diagnosis
Machine Learning
Cyber-Physical Systems
Computational Linguistics

Publication and content type

ref (subject category)
kon (subject category)
