SwePub
Results for search "hsv:(NATURVETENSKAP) ;lar1:(sprakochfolkminnen)"

Search: hsv:(NATURVETENSKAP) > Institutet för språk och folkminnen

  • Results 1-10 of 14
1.
  • Ahltorp, Magnus, et al. (author)
  • Textual Contexts for "Democracy" : Using Topic- and Word-Models for Exploring Swedish Government Official Reports
  • 2021
  • Conference paper (peer-reviewed). Abstract:
    • We here demonstrate how two types of NLP models - a topic model and a word2vec model - can be combined for exploring the content of a collection of Swedish Government Reports. We investigate if there are topics that frequently occur in paragraphs mentioning the word "democracy". Using the word2vec model, 530 clusters of semantically similar words were created, which were then applied in the pre-processing step when creating a topic model. This model detected 15 reoccurring topics among the paragraphs containing "democracy". Among these topics, 13 had closely associated paragraphs with a coherent content relating to some aspect of democracy.
2.
3.
  • Edqvist, Bengt (author)
  • Föreställningar om fladdermöss
  • 2014
  • In: Naturen för mig. - Göteborg : Institutet för språk och folkminnen. - 9789186959142 ; pp. 57-59
  • Book chapter (popular science, debate, etc.)
4.
5.
  • Eriksson, Gunnar, 1954-, et al. (author)
  • Features for modelling characteristics of conversations : Notebook for PAN at CLEF 2012
  • 2012
  • In: CLEF 2012 Evaluation Labs and Workshop Online Working Notes.
  • Conference paper (peer-reviewed). Abstract:
    • In this experiment, we find that features which model interaction and conversational behaviour contribute well to identifying sexual grooming behaviour in chat and forum text. Together with the obviously useful lexical features (which we find are more valuable if separated by who generates them), we achieve very successful results in identifying behavioural patterns which may characterise sexual grooming. We conjecture that the general framework can be used for other purposes than this specific case if the lexical features are exchanged for other topical models, since the conversational features characterise interaction and behaviour rather than topical choice.
6.
7.
  • Skeppstedt, Maria, 1977-, et al. (author)
  • A Pipeline for Manual Annotations of Risk Factor Mentions in the COVID-19 Open Research Dataset
  • 2021
  • In: Selected Papers from the CLARIN Annual Conference 2020.
  • Conference paper (peer-reviewed). Abstract:
    • We here demonstrate how a set of tools that are being maintained and further developed within the Språkbanken Sam and SWE-CLARIN infrastructures can be employed for creating manually labelled training data in a low-resource setting. As example text, we used the “COVID-19 Open Research Dataset”, and created manually annotated training data for its associated Kaggle task, “What do we know about COVID-19 risk factors?”. We first used our topic modelling tool to i) select a text set for manual annotation, ii) classify the texts into preliminary classification categories, and iii) analyse the texts in search for potential refinements of the annotation categories. We then annotated the text set on a more granular level by labelling the token sequences that indicated the existence of the refined categories in the text. Finally, we used the granularly annotated text set as a seed set, and applied our active learning tool for actively selecting additional texts for annotation. For the token-sequence annotations, we used our text annotation tool, which includes support for incorporating automatic pre-annotations.
8.
  • Skeppstedt, Maria, Dr. 1977-, et al. (author)
  • A Snapshot of Climate Change Arguments: Searching for Recurring Themes in Tweets on Climate Change
  • 2022
  • In: CLARIN Annual Conference Proceedings, pp. 65-68
  • Conference paper (peer-reviewed). Abstract:
    • We applied the topic modelling tool Topics2Themes to a collection of German tweets on the subject of climate change, the GerCCT corpus. Topics2Themes is currently being further developed and evaluated within Språkbanken Sam, which is a part of SWE-CLARIN. The tool automatically extracted 15 topics from the tweet collection. We used the graphical user interface of Topics2Themes to manually search for recurring themes among the eight tweets most closely associated with the topics extracted. Although the content of the tweets associated with a topic was often diverse, we were still able to identify recurring themes. More specifically, 14 themes that occurred at least three times were identified in the texts analysed.
9.
  • Skeppstedt, Maria, 1977-, et al. (author)
  • Application of a topic model visualisation tool to a second language
  • 2019
  • In: CLARIN 2019 Book of abstracts. - CLARIN, Common Language Resources and Technology Infrastructure.
  • Conference paper (peer-reviewed). Abstract:
    • We explored adaptions required for applying a topic modelling tool to a language that is very different from the one for which the tool was originally developed. The tool, which enables text analysis on the output of topic modelling, was developed for English, and we here applied it on Japanese texts. As white space is not used for indicating word boundaries in Japanese, the texts had to be pre-tokenised and white space inserted to indicate a token segmentation, before the texts could be imported into the tool. The tool was also extended by the addition of word translations and phonetic readings to support users who are second-language speakers of Japanese.
10.
  • Skeppstedt, Maria, et al. (author)
  • From word clouds to Word Rain: Revisiting the classic word cloud to visualize climate change texts
  • 2024
  • In: Information Visualization. - Sage Publications. - 1473-8716, 1473-8724.
  • Journal article (peer-reviewed). Abstract:
    • Word Rain is a development of the classic word cloud. It addresses some of the limitations of word clouds, in particular the lack of a semantically motivated positioning of the words, and the use of font size as a sole indicator of word prominence. Word Rain uses the semantic information encoded in a distributional semantics-based language model – reduced into one dimension – to position the words along the x-axis. Thereby, the horizontal positioning of the words reflects semantic similarity. Font size is still used to signal word prominence, but this signal is supplemented with a bar chart, as well as with the position of the words on the y-axis. We exemplify the use of Word Rain by three concrete visualization tasks, applied on different real-world texts and document collections on climate change. In these case studies, word2vec models, reduced to one dimension with t-SNE, are used to encode semantic similarity, and TF-IDF is used for measuring word prominence. We evaluate the technique further by carrying out domain expert reviews.
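
To make the word-prominence measure mentioned in this abstract concrete, here is a minimal sketch of TF-IDF scoring in plain Python. The abstract does not state which TF-IDF variant Word Rain uses, so the formula below (relative term frequency multiplied by log(N/df)) is an assumption, and the `tf_idf` function and the toy corpus are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumed variant (not taken from the article):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(documents)
    tokenised = [doc.lower().split() for doc in documents]
    # Document frequency: the number of documents each word occurs in.
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, not data from the article.
corpus = [
    "climate change raises sea level",
    "climate policy sets emission targets",
    "climate models project sea level rise",
]
scores = tf_idf(corpus)
```

In this toy corpus, a word that occurs in every document ("climate") receives a score of zero, while a word confined to a single document ("policy") scores higher than one shared by two documents ("sea"). That behaviour is what makes a measure of this kind a plausible choice for selecting prominent words to display.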