SwePub
Search the SwePub database


Results for search: WFRF:(Tahmasebi Nina 1982)

  • Result 1-50 of 59
1.
  • Abualhaija, Sallam, et al. (author)
  • Parameter Transfer across Domains for Word Sense Disambiguation
  • 2017
  • In: Proceedings of Recent Advances in Natural Language Processing Meet Deep Learning, Varna, Bulgaria 2–8 September 2017 / Edited by Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova. - : Incoma Ltd. Shoumen, Bulgaria. - 1313-8502 .- 2603-2813. - 9789544520489
  • Conference paper (peer-reviewed)abstract
    • Word sense disambiguation is defined as finding the corresponding sense for a target word in a given context, which comprises a major step in text applications. Recently, it has been addressed as an optimization problem. The idea behind it is to find a sequence of senses that corresponds to the words in a given context with a maximum semantic similarity. Metaheuristics like simulated annealing and D-Bees provide approximate good-enough solutions, but are usually influenced by the starting parameters. In this paper, we study the parameter tuning for both algorithms within the word sense disambiguation problem. The experiments are conducted on different datasets to cover different disambiguation scenarios. We show that D-Bees is robust and less sensitive towards the initial parameters compared to simulated annealing; hence, it is sufficient to tune the parameters once and reuse them for different datasets, domains or languages.
  •  
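The optimization view sketched in this abstract — pick one sense per word so that the total pairwise semantic similarity of the sense sequence is maximal, searched with a metaheuristic — can be illustrated with a minimal simulated-annealing sketch. The sense inventory and relatedness scores below are invented toy data, not from the paper, and D-Bees itself is not reproduced:

```python
import math
import random

# Toy sense inventory: each word maps to candidate senses.
# All words, senses, and scores are invented for illustration.
SENSES = {
    "bank": ["bank/finance", "bank/river"],
    "deposit": ["deposit/money", "deposit/geology"],
    "interest": ["interest/rate", "interest/curiosity"],
}

# Hypothetical semantic relatedness between sense pairs (symmetric lookup).
RELATED = {
    frozenset(["bank/finance", "deposit/money"]): 0.9,
    frozenset(["bank/finance", "interest/rate"]): 0.8,
    frozenset(["deposit/money", "interest/rate"]): 0.7,
    frozenset(["bank/river", "deposit/geology"]): 0.6,
}

def coherence(assignment):
    """Total pairwise relatedness of the chosen sense sequence."""
    senses = list(assignment.values())
    return sum(
        RELATED.get(frozenset([a, b]), 0.0)
        for i, a in enumerate(senses)
        for b in senses[i + 1:]
    )

def anneal(words, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Search sense assignments by simulated annealing, maximizing coherence."""
    rng = random.Random(seed)
    current = {w: rng.choice(SENSES[w]) for w in words}
    best = dict(current)
    for _ in range(steps):
        w = rng.choice(words)
        proposal = dict(current)
        proposal[w] = rng.choice(SENSES[w])
        delta = coherence(proposal) - coherence(current)
        # Always accept non-worsening moves; accept worse ones with
        # a probability that shrinks as the temperature cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = proposal
        if coherence(current) > coherence(best):
            best = dict(current)
        temp *= cooling
    return best

result = anneal(["bank", "deposit", "interest"])
```

The starting temperature and cooling rate here are exactly the kind of initial parameters whose tuning and sensitivity the paper studies.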
2.
  •  
3.
  • Adesam, Yvonne, 1975, et al. (author)
  • Exploring the Quality of the Digital Historical Newspaper Archive KubHist
  • 2019
  • In: Proceedings of the 4th Conference of The Association Digital Humanities in the Nordic Countries (DHN), Copenhagen, Denmark, March 5-8, 2019 / edited by Costanza Navarretta, Manex Agirrezabal, Bente Maegaard. - Aachen : CEUR Workshop Proceedings. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • The KubHist Corpus is a massive corpus of Swedish historical newspapers, digitized by the Royal Swedish Library, and available through the Språkbanken corpus infrastructure Korp. This paper contains a first overview of the KubHist corpus, exploring some of the difficulties with the data, such as OCR errors and spelling variation, and discussing possible paths for improving the quality and the searchability.
  •  
4.
  • Ahlberg, Malin, 1986, et al. (author)
  • A case study on supervised classification of Swedish pseudo-coordination
  • 2015
  • In: Proceedings of the 20th Nordic Conference of Computational Linguistics, NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania. - Linköpings universitet : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789175190983
  • Conference paper (peer-reviewed)abstract
    • We present a case study on supervised classification of Swedish pseudo-coordination (SPC). The classification is attempted on the type-level with data collected from two data sets: a blog corpus and a fiction corpus. Two small experiments were designed to evaluate the feasibility of this task. The first experiment explored a classifier’s ability to discriminate pseudo-coordinations from ordinary verb coordinations, given a small labeled data set created during the experiment. The second experiment evaluated how well the classifier performed at detecting and ranking SPCs in a set of unlabeled verb coordinations, to investigate if it could be used as a semi-automatic discovery procedure to find new SPCs.
  •  
5.
  • Berdicevskis, Aleksandrs, 1983, et al. (author)
  • Superlim: A Swedish Language Understanding Evaluation Benchmark
  • 2023
  • In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore / Houda Bouamor, Juan Pino, Kalika Bali (Editors). - Stroudsburg, PA : Association for Computational Linguistics. - 9798891760608
  • Conference paper (peer-reviewed)
  •  
6.
  • Borin, Lars, 1957, et al. (author)
  • Swe-Clarin: Language resources and technology for Digital Humanities
  • 2017
  • In: Digital Humanities 2016. Extended Papers of the International Symposium on Digital Humanities (DH 2016) Växjö, Sweden, November 7-8, 2016. Edited by Koraljka Golub, Marcelo Milrad. Vol-2021. - Aachen : M. Jeusfeld c/o Redaktion Sun SITE, Informatik V, RWTH Aachen. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an e-research tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council. In this paper, we describe the composition and activities of Swe-Clarin, aiming at meeting the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and spreading the awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, together formulating a research question, the addressing of which requires working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread to these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes. While the pilot projects clearly make substantial contributions to DH, they also reveal some needs for more development, and in particular a need for document-level access to the text materials. 
As a consequence of this, work has now been initiated in Swe-Clarin to meet this need, so that Swe-Clarin together with HSS scholars investigating intricate research questions can take on the methodological challenges of big-data language-based digital humanities.
  •  
7.
  • Borin, Lars, 1957, et al. (author)
  • Swe-Clarin: Language resources and technology for digital humanities
  • 2016
  • In: CEUR Workshop Proceedings. - 1613-0073. ; 2021, pp. 29-51
  • Conference paper (peer-reviewed)abstract
    • CLARIN is a European Research Infrastructure Consortium (ERIC), which aims at (a) making extensive language-based materials available as primary research data to the humanities and social sciences (HSS); and (b) offering state-of-the-art language technology (LT) as an e-research tool for this purpose, positioning CLARIN centrally in what is often referred to as the digital humanities (DH). The Swedish CLARIN node Swe-Clarin was established in 2015 with funding from the Swedish Research Council. In this paper, we describe the composition and activities of Swe-Clarin, aiming at meeting the requirements of all HSS and other researchers whose research involves using text and speech as primary research data, and spreading the awareness of what Swe-Clarin can offer these research communities. We focus on one of the central means for doing this: pilot projects conducted in collaboration between HSS researchers and Swe-Clarin, together formulating a research question, the addressing of which requires working with large language-based materials. Four such pilot projects are described in more detail, illustrating research on rhetorical history, second-language acquisition, literature, and political science. A common thread to these projects is an aspiration to meet the challenge of conducting research on the basis of very large amounts of textual data in a consistent way without losing sight of the individual cases making up the mass of data, i.e., to be able to move between Moretti’s “distant” and “close reading” modes.
  •  
8.
  • Cassotti, Pierluigi, et al. (author)
  • Computational modeling of semantic change
  • 2024
  • In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Tutorial Abstracts. - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)abstract
    • Languages change constantly over time, influenced by social, technological, cultural and political factors that affect how people express themselves. In particular, words can undergo the process of semantic change, which can be subtle and significantly impact the interpretation of texts. For example, the word terrific used to mean ‘causing terror’ and was as such synonymous to terrifying. Nowadays, speakers use the word in the sense of ‘excessive’ and even ‘amazing’. In Historical Linguistics, tools and methods have been developed to analyse this phenomenon, including systematic categorisations of the types of change, the causes and the mechanisms underlying the different types of change. However, traditional linguistic methods, while informative, are often based on small, carefully curated samples. Thanks to the availability of large diachronic corpora, the computational means to model word meaning unsupervised, and evaluation benchmarks, we are seeing an increasing interest in the computational modelling of semantic change. This is evidenced by the increasing number of publications in this new domain as well as the organisation of initiatives and events related to this topic, such as four editions of the International Workshop on Computational Approaches to Historical Language Change (LChange), and several evaluation campaigns (Schlechtweg et al., 2020a; Basile et al., 2020b; Kutuzov et al.; Zamora-Reina et al., 2022).
  •  
9.
  • Cassotti, Pierluigi, et al. (author)
  • Using Synchronic Definitions and Semantic Relations to Classify Semantic Change Types
  • 2024
  • In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)abstract
    • There is abundant evidence of the fact that the way words change their meaning can be classified in different types of change, highlighting the relationship between the old and new meanings (among which generalization, specialization and co-hyponymy transfer). In this paper, we present a way of detecting these types of change by constructing a model that leverages information both from synchronic lexical relations and definitions of word meanings. Specifically, we use synset definitions and hierarchy information from WordNet and test it on a digitized version of Blank's (1997) dataset of semantic change types. Finally, we show how the sense relationships can improve models for both approximation of human judgments of semantic relatedness as well as binary Lexical Semantic Change Detection.
  •  
10.
  • Computational approaches to semantic change
  • 2021
  • Editorial collection (other academic/artistic)abstract
    • Semantic change — how the meanings of words change over time — has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th century, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least understood. Ever since, the study of semantic change has progressed steadily, accumulating a vast store of knowledge for over a century, encompassing many languages and language families. Historical linguists also early on realized the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea-change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans. A major challenge presently is to integrate the hard-earned knowledge and expertise of traditional historical linguistics with cutting-edge methodology explored primarily in computational linguistics. The idea for the present volume came out of a concrete response to this challenge. The 1st International Workshop on Computational Approaches to Historical Language Change (LChange'19), at ACL 2019, brought together scholars from both fields. 
This volume offers a survey of this exciting new direction in the study of semantic change, a discussion of the many remaining challenges that we face in pursuing it, and considerably updated and extended versions of a selection of the contributions to the LChange'19 workshop, addressing both more theoretical problems — e.g., discovery of "laws of semantic change" — and practical applications, such as information retrieval in longitudinal text archives.
  •  
11.
  • Demidova, Elena, et al. (author)
  • Analysing Entities, Topics and Events in Community Memories.
  • 2013
  • In: Proc. of the first International Workshop on Archiving Community Memories.
  • Conference paper (peer-reviewed)abstract
    • This paper briefly describes the components of the ARCOMEM architecture concerned with the extraction, enrichment, consolidation and dynamics analysis of entities, topics and events, deploying text mining, NLP, and semantic data integration technologies. In particular, we focus on four main areas relevant to support the ARCOMEM requirements and use cases: (a) entity and event extraction from text; (b) entity and event enrichment and consolidation; (c) topic detection and dynamics; and (d) temporal aspects and dynamics detection in Web language and online social networks.
  •  
12.
  •  
13.
  • Dubossarsky, Haim, et al. (author)
  • Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change
  • 2019
  • In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, July 28 - August 2, 2019 / Anna Korhonen, David Traum, Lluís Màrquez (Editors). - Stroudsburg, PA : Association for Computational Linguistics. - 9781950737482
  • Conference paper (peer-reviewed)abstract
    • State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We have empirically tested the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as a manual testset. We introduce a principled way to simulate lexical semantic change and systematically control for possible biases.
  •  
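Temporal Referencing, as described above, avoids vector-space alignment by training a single model on the whole diachronic corpus, with occurrences of target words replaced by period-tagged variants while context words stay shared across periods. A minimal preprocessing sketch; the `word_period` token format and the toy corpora are assumptions for illustration:

```python
def temporal_reference(tokens, targets, period):
    """Replace target-word tokens with period-tagged variants.

    Context words keep a single time-shared form, so one skip-gram
    model can be trained on the concatenated corpus with no need
    to align separately trained vector spaces.
    """
    return [f"{t}_{period}" if t in targets else t for t in tokens]

# Invented toy corpora; the tagging scheme is an assumption.
corpus_1850 = ["the", "awful", "storm", "was", "awful"]
corpus_1990 = ["an", "awful", "movie"]

tagged = (
    temporal_reference(corpus_1850, {"awful"}, "1850")
    + temporal_reference(corpus_1990, {"awful"}, "1990")
)
```

After skip-gram training on the tagged corpus, the target word has one vector per period, which can be compared directly within the same space.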
14.
  • Hengchen, Simon, 1988, et al. (author)
  • A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data
  • 2021
  • In: Journal of Open Humanities Data. - : Ubiquity Press, Ltd. - 2059-481X. ; 7:2, pp. 1-7
  • Journal article (peer-reviewed)abstract
    • This paper describes the creation of several word embedding models based on a large collection of diachronic Swedish newspaper material available through Språkbanken Text, the Swedish language bank. This data was produced in the context of Språkbanken Text’s continued mission to collaborate with humanities and natural language processing (NLP) researchers and to provide freely available language resources, for the development of state-of-the-art NLP methods and tools.
  •  
15.
  • Hengchen, Simon, 1988, et al. (author)
  • Challenges for computational lexical semantic change
  • 2021
  • In: Computational approaches to semantic change / Tahmasebi, Nina, Borin, Lars, Jatowt, Adam, Xu, Yang, Hengchen, Simon (eds.). - Berlin : Language Science Press. - 2366-7818. - 9783985540082 ; , pp. 341-372
  • Book chapter (peer-reviewed)abstract
    • The computational study of lexical semantic change (LSC) has taken off in the past few years and we are seeing increasing interest in the field, from both computational sciences and linguistics. Most of the research so far has focused on methods for modelling and detecting semantic change using large diachronic textual data, with the majority of the approaches employing neural embeddings. While methods that offer easy modelling of diachronic text are one of the main reasons for the spiking interest in LSC, neural models leave many aspects of the problem unsolved. The field has several open and complex challenges. In this chapter, we aim to describe the most important of these challenges and outline future directions.
  •  
16.
  • Hengchen, Simon, 1988, et al. (author)
  • SuperSim: a test set for word similarity and relatedness in Swedish
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2 2021, Reykjavik, Iceland (online). - Linköping : Linköping Electronic Conference Proceedings. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed)abstract
    • Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgments. The test set is composed of 1,360 word-pairs independently judged for both relatedness and similarity by five annotators. We evaluate three different models (Word2Vec, fastText, and GloVe) trained on two separate Swedish datasets, namely the Swedish Gigaword corpus and a Swedish Wikipedia dump, to provide a baseline for future comparison. We release the fully annotated test set, code, baseline models, and data.
  •  
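Baselines like the ones above are conventionally scored by rank-correlating model cosine similarities with the human judgments, typically via Spearman's rho. A stdlib-only sketch over made-up ratings (not SuperSim data):

```python
def rank(values):
    """Assign 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical human relatedness ratings vs. model cosine similarities
# for four word pairs (illustrative numbers only).
human = [9.1, 7.5, 3.2, 1.0]
model = [0.83, 0.70, 0.35, 0.12]
rho = spearman(human, model)
```

A model whose similarity scores order the pairs exactly as the annotators did gets rho = 1; disagreements in ordering lower the score.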
17.
  • Holzmann, Helge, et al. (author)
  • BlogNEER: Applying Named Entity Evolution Recognition on the Blogosphere
  • 2013
  • In: CEUR Workshop Proceedings. - 1613-0073. ; 1091, pp. 28-39
  • Conference paper (peer-reviewed)abstract
    • The introduction of Social Media allowed more people to publish texts by removing barriers that are technical but also social, such as the editorial controls that exist in traditional media. The resulting language tends to be more like spoken language because people adapt their use to the medium. Since spoken language is more dynamic, more new and short-lived terms are introduced also in written format on the Web. In earlier work (Tahmasebi et al., 2012) we presented an unsupervised method for Named Entity Evolution Recognition (NEER) to find name changes in newspaper collections. In this paper we present BlogNEER, an extension to apply NEER on blog data. The language used in blogs is often closer to spoken language than to language used in traditional media. BlogNEER introduces a novel semantic filtering method that makes use of Semantic Web resources (i.e., DBpedia) to gain more information about terms. We present the approach of BlogNEER and initial results that show the potential of the approach.
  •  
18.
  • Holzmann, Helge, et al. (author)
  • Named entity evolution recognition on the Blogosphere
  • 2015
  • In: International Journal on Digital Libraries. - : Springer Science and Business Media LLC. - 1432-5012 .- 1432-1300. ; 15:2-4, pp. 209-235
  • Journal article (peer-reviewed)abstract
    • Advancements in technology and culture lead to changes in our language. These changes create a gap between the language known by users and the language stored in digital archives. This affects users’ ability to find content and, subsequently, to interpret that content. In previous work, we introduced our approach for named entity evolution recognition (NEER) in newspaper collections. Lately, increasing efforts in Web preservation have led to increased availability of Web archives covering longer time spans. However, language on the Web is more dynamic than in traditional media and many of the basic assumptions from the newspaper domain do not hold for Web data. In this paper we discuss the limitations of existing methodology for NEER. We approach these by adapting an existing NEER method to work on noisy data like the Web and the Blogosphere in particular. We develop novel filters that reduce the noise and make use of Semantic Web resources to obtain more information about terms. Our evaluation shows the potential of the proposed approach.
  •  
19.
  • Jatowt, Adam, et al. (author)
  • Computational approaches to lexical semantic change: Visualization systems and novel applications
  • 2021
  • In: Computational approaches to semantic change. - Berlin : Language Science Press. - 9783961103126 ; , pp. 311-339
  • Book chapter (peer-reviewed)abstract
    • The purpose of this chapter is to survey visualization and user interface solutions for understanding lexical semantic change as well as to survey a number of applications of techniques developed in computational analysis of lexical semantic change. We first overview approaches aiming to develop systems that support understanding semantic change in an interactive and visual way. It is generally accepted that computational techniques developed for analyzing and uncovering semantic change are beneficial to linguists, historians, sociologists, and practitioners in numerous related fields, especially within the humanities. However, quite a few non-professional users are equally interested in the histories of words. Developing interactive, visual, engaging, and easy-to-understand systems can help them to acquire relevant knowledge. Second, we believe that other fields could benefit from the research outcomes of computational approaches to lexical semantic change. In general, properly representing the meaning of terms used in the past should be important for a range of natural language processing, information retrieval and other tasks that operate on old texts. In the latter part of the chapter, we then focus on current and potential applications related to computer and information science with the underlying question: “How can modeling semantic change benefit wider downstream applications in these disciplines?”
  •  
20.
  •  
21.
  • Kågebäck, Mikael, 1981, et al. (author)
  • Extractive Summarization using Continuous Vector Space Models
  • 2014
  • In: Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC) EACL, April 26-30, 2014 Gothenburg, Sweden. - 9781937284947 ; , pp. 31-39
  • Conference paper (peer-reviewed)abstract
    • Automatic summarization can help users extract the most important pieces of information from the vast amount of text digitized into electronic form every day. Central to automatic summarization is the notion of similarity between sentences in text. In this paper we propose the use of continuous vector representations for semantically aware representations of sentences as a basis for measuring similarity. We evaluate different compositions for sentence representation on a standard dataset using the ROUGE evaluation measures. Our experiments show that the evaluated methods improve the performance of a state-of-the-art summarization framework and strongly indicate the benefits of continuous word vector representations for automatic summarization.
  •  
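A common sentence composition of the kind evaluated above is simply averaging a sentence's word vectors and comparing sentences by cosine similarity. The 2-d vectors below are invented for illustration and are not the paper's embeddings:

```python
# Toy word vectors (hypothetical, 2-d for illustration only).
VECS = {
    "stock": (0.9, 0.1), "market": (0.8, 0.2),
    "fell": (0.7, 0.3), "shares": (0.85, 0.15),
    "dog": (0.1, 0.9), "barked": (0.05, 0.95),
}

def sentence_vector(words):
    """Compose a sentence representation by averaging its word vectors."""
    vs = [VECS[w] for w in words if w in VECS]
    return tuple(sum(dim) / len(vs) for dim in zip(*vs))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

s1 = sentence_vector(["stock", "market", "fell"])
s2 = sentence_vector(["shares", "fell"])
s3 = sentence_vector(["dog", "barked"])
# Topically related sentences score higher than unrelated ones,
# which is the similarity signal extractive summarizers build on.
```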
22.
  • Nielbo, Kristoffer L., et al. (author)
  • Quantitative text analysis
  • 2024
  • In: Nature Reviews Methods Primers. - 2662-8449. ; 4:1
  • Journal article (peer-reviewed)abstract
    • Text analysis has undergone substantial evolution since its inception, moving from manual qualitative assessments to sophisticated quantitative and computational methods. Beginning in the late twentieth century, a surge in the utilization of computational techniques reshaped the landscape of text analysis, catalysed by advances in computational power and database technologies. Researchers in various fields, from history to medicine, are now using quantitative methodologies, particularly machine learning, to extract insights from massive textual data sets. This transformation can be described in three discernible methodological stages: feature-based models, representation learning models and generative models. Although sequential, these stages are complementary, each addressing analytical challenges in text analysis. The progression from feature-based models that require manual feature engineering to contemporary generative models, such as GPT-4 and Llama2, signifies a change in the workflow, scale and computational infrastructure of quantitative text analysis. This Primer presents a detailed introduction to some of these developments, offering insights into the methods, principles and applications pertinent to researchers embarking on quantitative text analysis, especially within the field of machine learning.
  •  
23.
  • Noble, Bill, et al. (author)
  • Improving Word Usage Graphs with Edge Induction
  • 2024
  • In: Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change, August 15, 2024, Bangkok, Thailand. - Stroudsburg, PA : Association for Computational Linguistics. - 9798891761384
  • Conference paper (peer-reviewed)abstract
    • This paper investigates edge induction as a method for augmenting Word Usage Graphs, in which word usages (nodes) are connected through scores (edges) representing semantic relatedness. Clustering (densely) annotated WUGs can be used as a way to find senses of a word without relying on traditional word sense annotation. However, annotating all or a majority of pairs of usages is typically infeasible, resulting in sparse graphs and, likely, lower quality senses. In this paper, we ask if filling out WUGs with edges predicted from the human annotated edges improves the eventual clusters. We experiment with edge induction models that use structural features of the existing sparse graph, as well as those that exploit textual (distributional) features of the usages. We find that in both cases, inducing edges prior to clustering improves correlation with human sense-usage annotation across three different clustering algorithms and languages.
  •  
24.
  • Nusko, Bianka, et al. (author)
  • Building a Sentiment Lexicon for Swedish
  • 2016
  • In: Linköping Electronic Conference Proceedings. - 1650-3686 .- 1650-3740. - 9789176857335 ; 126:006
  • Conference paper (peer-reviewed)abstract
    • In this paper we present our ongoing project to build and evaluate a sentiment lexicon for Swedish. Our main resource is SALDO, a lexical resource of modern Swedish developed at Språkbanken, University of Gothenburg. Using a semi-supervised approach, we expand a manually chosen set of six core words using parent-child relations based on the semantic network structure of SALDO. At its current stage the lexicon consists of 175 seeds, 633 children, and 1319 grandchildren.
  •  
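The semi-supervised expansion described above can be sketched as breadth-first propagation of seed polarity through parent-child links. The mini-network, the words, and the attenuation factor below are all hypothetical; they are not SALDO data or the paper's exact scoring:

```python
from collections import deque

# Hypothetical parent-child links mimicking a semantic network
# (invented words and structure, not actual SALDO data).
CHILDREN = {
    "glad": ["lycklig", "nöjd"],         # happy -> happy/content
    "lycklig": ["salig"],                # -> blissful
    "ledsen": ["olycklig", "bedrövad"],  # sad -> unhappy/distressed
}

def expand(seeds):
    """Propagate seed polarity to descendants breadth-first,
    attenuating the score by half at each generation (an assumption)."""
    scores = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for child in CHILDREN.get(word, []):
            if child not in scores:  # keep the earliest (strongest) label
                scores[child] = scores[word] * 0.5
                queue.append(child)
    return scores

lexicon = expand({"glad": 1.0, "ledsen": -1.0})
```

Children inherit a damped version of their parent's polarity, so grandchildren end up weaker than children, mirroring the seed/child/grandchild tiers the abstract reports.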
25.
  •  
26.
  • Periti, Francesco, et al. (author)
  • A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change
  • 2024
  • In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), June 16-21, 2024, Mexico City, Mexico. - : Association for Computational Linguistics. - 9798891761148
  • Conference paper (peer-reviewed)abstract
    • Contextualized embeddings are the preferred tool for modeling Lexical Semantic Change (LSC). Current evaluations typically focus on a specific task known as Graded Change Detection (GCD). However, performance comparisons across works are often misleading due to their reliance on diverse settings. In this paper, we evaluate state-of-the-art models and approaches for GCD under equal conditions. We further break the LSC problem into Word-in-Context (WiC) and Word Sense Induction (WSI) tasks, and compare models across these different levels. Our evaluation is performed across different languages on eight available benchmarks for LSC, and shows that (i) APD outperforms other approaches for GCD; (ii) XL-LEXEME outperforms other contextualized models for WiC, WSI, and GCD, while being comparable to GPT-4; (iii) there is a clear need for improving the modeling of word meanings, and for focusing on how, when, and why these meanings change, rather than solely on the extent of semantic change.
  •  
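APD, reported above as the strongest approach for Graded Change Detection, is commonly computed as the mean pairwise cosine distance between a word's usage embeddings drawn from the two time periods. A stdlib-only sketch with toy 2-d vectors (real systems use high-dimensional contextualized embeddings):

```python
def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (nu * nv)

def apd(embeddings_t1, embeddings_t2):
    """Average pairwise cosine distance across the two periods' usages."""
    pairs = [(u, v) for u in embeddings_t1 for v in embeddings_t2]
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)

# Toy usage embeddings for one target word (illustrative numbers only).
stable = apd([(1.0, 0.0), (0.9, 0.1)], [(1.0, 0.1), (0.95, 0.0)])
changed = apd([(1.0, 0.0), (0.9, 0.1)], [(0.0, 1.0), (0.1, 0.9)])
# A larger APD indicates a greater degree of graded semantic change.
```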
27.
  • Periti, Francesco, et al. (author)
  • Analyzing Semantic Change through Lexical Replacements
  • 2024
  • In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)abstract
    • Modern language models are capable of contextualizing words based on their surrounding context. However, this capability is often compromised due to semantic change that leads to words being used in new, unexpected contexts not encountered during pre-training. In this paper, we model semantic change by studying the effect of unexpected contexts introduced by lexical replacements. We propose a replacement schema where a target word is substituted with lexical replacements of varying relatedness, thus simulating different kinds of semantic change. Furthermore, we leverage the replacement schema as a basis for a novel interpretable model for semantic change. We are also the first to evaluate the use of LLaMa for semantic change detection.
  •  
28.
  • Periti, Francesco, et al. (author)
  • (Chat)GPT v BERT: Dawn of Justice for Semantic Change Detection
  • 2024
  • In: Findings of the Association for Computational Linguistics: EACL 2024. - : Association for Computational Linguistics.
  • Conference paper (peer-reviewed)abstract
    • In the universe of Natural Language Processing, Transformer-based language models like BERT and (Chat)GPT have emerged as lexical superheroes with great power to solve open research problems. In this paper, we specifically focus on the temporal problem of semantic change, and evaluate their ability to solve two diachronic extensions of the Word-in-Context (WiC) task: TempoWiC and HistoWiC. In particular, we investigate the potential of a novel, off-the-shelf technology like ChatGPT (and GPT) 3.5 compared to BERT, which represents a family of models that currently stand as the state-of-the-art for modeling semantic change. Our experiments represent the first attempt to assess the use of (Chat)GPT for studying semantic change. Our results indicate that ChatGPT performs significantly worse than the foundational GPT version. Furthermore, our results demonstrate that (Chat)GPT achieves slightly lower performance than BERT in detecting long-term changes but performs significantly worse in detecting short-term changes.
  •  
29.
  • Periti, Francesco, et al. (author)
  • Studying word meaning evolution through incremental semantic shift detection
  • 2024
  • In: Language Resources and Evaluation. - 1574-020X .- 1574-0218.
  • Journal article (peer-reviewed)abstract
    • The study of semantic shift, that is, of how words change meaning as a consequence of social practices, events and political circumstances, is relevant in Natural Language Processing, Linguistics, and Social Sciences. The increasing availability of large diachronic corpora and advances in computational semantics have accelerated the development of computational approaches to detecting such shift. In this paper, we introduce a novel approach to tracing the evolution of word meaning over time. Our analysis focuses on gradual changes in word semantics and relies on an incremental approach to semantic shift detection (SSD) called What is Done is Done (WiDiD). WiDiD leverages scalable and evolutionary clustering of contextualised word embeddings to detect semantic shift and capture temporal transitions in word meanings. Existing approaches to SSD: (a) significantly simplify the semantic shift problem to cover change between two (or a few) time points, and (b) consider the existing corpora as static. We instead treat SSD as an organic process in which word meanings evolve across tens or even hundreds of time periods as the corpus is progressively made available. This results in an extremely demanding task that entails a multitude of intricate decisions. We demonstrate the applicability of this incremental approach on a diachronic corpus of Italian parliamentary speeches spanning eighteen distinct time periods. We also evaluate its performance on seven popular labelled benchmarks for SSD across multiple languages. Empirical results show that our approach is comparable to state-of-the-art approaches, while outperforming the state-of-the-art for certain languages.
  •  
30.
  • Periti, Francesco, et al. (author)
  • Towards a Complete Solution to Lexical Semantic Change: an Extension to Multiple Time Periods and Diachronic Word Sense Induction
  • 2024
  • In: Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change, Aug 15, 2024, Bangkok, Thailand. - Stroudsburg, PA : Association for Computational Linguistics. - 9798891761384
  • Conference paper (peer-reviewed)abstract
    • Thus far, the research community has focused on a simplified computational modeling of semantic change between two time periods. This simplified view has served as a foundational block but is not a complete solution to the complex modeling of semantic change. Acknowledging the power of recent language models, we believe that now is the right time to extend the current modeling to multiple time periods and diachronic word sense induction. In this position paper, we outline several extensions of the current modeling and discuss issues related to the extensions.
  •  
31.
  •  
32.
  •  
33.
  •  
34.
  • Proceedings of the 4th Workshop on Computational Approaches to Historical Language Change, LChange'23, December 6th, 2023, Singapore
  • 2023
  • Editorial proceedings (other academic/artistic)abstract
    • Welcome to the 4th International Workshop on Computational Approaches to Historical Language Change (LChange’23) co-located with EMNLP 2023. LChange is held on December 6th, 2023, as a hybrid event with participation possible both virtually and on-site in Singapore. Characterizing the time-varying nature of language will have broad implications and applications in multiple fields including linguistics, artificial intelligence, digital humanities, and computational cognitive and social sciences. In this workshop, we bring together the world’s pioneers and experts in computational approaches to historical language change with a focus on digital text corpora. In doing so, this workshop pursues three goals: disseminating state-of-the-art research on diachronic modeling of language change, fostering cross-disciplinary collaborations, and exploring the fundamental theoretical and methodological challenges in this growing niche of computational linguistic research.
  •  
35.
  •  
36.
  • Rouces, Jacobo, 1985, et al. (author)
  • Creating an Annotated Corpus for Aspect-Based Sentiment Analysis in Swedish
  • 2020
  • In: Proceedings of the 5th conference in Digital Humanities in the Nordic Countries, Riga, Latvia, October 21-23, 2020.. - : CEUR Workshop Proceedings. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • Aspect-Based Sentiment Analysis constitutes a more fine-grained alternative to traditional sentiment analysis at sentence level. In addition to a sentiment value denoting how positive or negative a particular opinion or sentiment expression is, it identifies additional aspects or 'slots' that characterize the opinion. Some typical aspects are target and source, i.e. who holds the opinion and which entity or aspect the opinion is about. We present a large Swedish corpus annotated for Aspect-Based Sentiment Analysis. Each sentiment expression is annotated as a tuple that contains the following fields: one among 5 possible sentiment values, the target, the source, and whether the sentiment expressed is ironic. In addition, the linguistic element that conveys the sentiment is also identified. Sentiment for a particular topic is also annotated at title, paragraph and document level. The documents are articles obtained from two Swedish media (Svenska Dagbladet and Aftonbladet) and one online forum (Flashback), totalling around 4000 documents. The corpus is freely available and we plan to use it for training and testing an Aspect-Based Sentiment Analysis system.
  •  
37.
  • Rouces, Jacobo, 1985, et al. (author)
  • Defining a gold standard for a Swedish sentiment lexicon: Towards higher-yield text mining in the digital humanities
  • 2018
  • In: CEUR Workshop Proceedings vol. 2084. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference Helsinki, Finland, March 7-9, 2018. Edited by Eetu Mäkelä Mikko Tolonen Jouni Tuominen. - Helsinki : University of Helsinki, Faculty of Arts. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • There is an increasing demand for multilingual sentiment analysis, and most work on sentiment lexicons is still carried out based on English lexicons like WordNet. In addition, many of the non-English sentiment lexicons that do exist have been compiled by (machine) translation from English resources, thereby arguably obscuring possible language-specific characteristics of sentiment-loaded vocabulary. In this paper we describe the creation from scratch of a gold standard for the sentiment annotation of Swedish terms as a first step towards the creation of a full-fledged sentiment lexicon for Swedish.
  •  
38.
  • Rouces, Jacobo, 1985, et al. (author)
  • Generating a Gold Standard for a Swedish Sentiment Lexicon
  • 2018
  • In: LREC 2018, Eleventh International Conference on Language Resources and Evaluation, May 7-12, 2018, Miyazaki (Japan). - Miyazaki : ELRA. - 9791095546009
  • Conference paper (peer-reviewed)abstract
    • We create a gold standard for sentiment annotation of Swedish terms, using the freely available SALDO lexicon and the Gigaword corpus. For this purpose, we employ a multi-stage approach combining corpus-based frequency sampling, direct score annotation and Best-Worst Scaling. In addition to obtaining a gold standard, we analyze the data from our process and we draw conclusions about the optimal sentiment model.
  •  
39.
  • Rouces, Jacobo, 1985, et al. (author)
  • Political Stance Analysis Using Swedish Parliamentary Data
  • 2019
  • In: CEUR Workshop Proceedings (Vol. 2364). Digital Humanities in the Nordic Countries 4th Conference, Copenhagen, Denmark, March 5-8, 2019.. - Aachen : CEUR. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • We process and visualize Swedish parliamentary data using methods from statistics and machine learning, which allows us to obtain insight into the political processes behind the data. We produce plots that let us infer the relative stance of political parties and their members on different topics. In addition, we can infer the degree of homogeneity of individual votes within different parties, as well as the degree of multi-dimensionality of Swedish politics.
  •  
40.
  • Rouces, Jacobo, 1985, et al. (author)
  • SenSALDO: Creating a Sentiment Lexicon for Swedish
  • 2018
  • In: LREC 2018, Eleventh International Conference on Language Resources and Evaluation, 7-12 May 2018, Miyazaki (Japan). - Miyazaki : ELRA. - 9791095546009
  • Conference paper (peer-reviewed)abstract
    • The natural language processing subfield known as sentiment analysis or opinion mining has seen an explosive expansion over the last decade or so, and sentiment analysis has become a standard item in the NLP toolbox. Still, many theoretical and methodological questions remain unanswered and resource gaps unfilled. Most work on automated sentiment analysis has been done on English and a few other languages; for most written languages of the world, this tool is not available. This paper describes the development of an extensive sentiment lexicon for written (standard) Swedish. We investigate different methods for developing a sentiment lexicon for Swedish. We use an existing gold standard dataset for training and testing. For each word sense from the SALDO Swedish lexicon, we assign a real-valued sentiment score in the range [-1,1] and produce a sentiment label. We implement and evaluate three methods: a graph-based method that iterates over the SALDO structure, a method based on random paths over the SALDO structure and a corpus-driven method based on word embeddings. The resulting sense-disambiguated sentiment lexicon (SenSALDO) is an open source resource and freely available from Språkbanken, The Swedish Language Bank at the University of Gothenburg.
  •  
41.
  • Rouces, Jacobo, 1985, et al. (author)
  • Tracking Attitudes Towards Immigration in Swedish Media
  • 2019
  • In: CEUR Workshop Proceedings (Vol. 2364). Digital Humanities in the Nordic Countries 4th Conference, Copenhagen, Denmark, March 5-8, 2019.. - Aachen : CEUR Workshop Proceedings. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • We use a gold standard under construction for sentiment analysis in Swedish to explore how attitudes towards immigration change across time and media. We track the evolution of attitude starting from the year 2000 for three different Swedish media: the national newspapers Aftonbladet and Svenska Dagbladet, representing different halves of the left–right political spectrum, and the online forum Flashback.
  •  
42.
  • Rødven-Eide, Stian, et al. (author)
  • The Swedish Culturomics Gigaword Corpus: A One Billion Word Swedish Reference Dataset for NLP
  • 2016
  • In: Linköping Electronic Conference Proceedings. Digital Humanities 2016. From Digitization to Knowledge 2016: Resources and Methods for Semantic Processing of Digital Works/Texts, July 11, 2016, Krakow, Poland. - Linköping : Linköping University Electronic Press. - 1650-3686 .- 1650-3740. - 9789176857335
  • Conference paper (peer-reviewed)abstract
    • In this paper we present a dataset of contemporary Swedish containing one billion words. The dataset consists of a wide range of sources, all annotated using a state-of-the-art corpus annotation pipeline, and is intended to be a static and clearly versioned dataset. This will facilitate reproducibility of experiments across institutions and make it easier to compare NLP algorithms on contemporary Swedish. The dataset contains sentences from 1950 to 2015 and has been carefully designed to feature a good mix of genres balanced over each included decade. The sources include literary, journalistic, academic and legal texts, as well as blogs and web forum entries.
  •  
43.
  • Schlechtweg, Dominik, et al. (author)
  • Post-Evaluation Data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
  • 2020
  • In: Zenodo. - : Zenodo.
  • Other publication (other academic/artistic)abstract
    • This data collection contains the post-evaluation data for SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection: (1) the starting kit to download data, and examples for competing in the CodaLab challenge including baselines; (2) the true binary change scores of the targets for Subtask 1, and their true graded change scores for Subtask 2 (test_data_truth/); (3) the scoring program used to score submissions against the true test data in the evaluation and post-evaluation phase (scoring_program/); and (4) the results of the evaluation phase, including, for example, analysis plots (plots/) displaying the results.
  •  
44.
  • Schlechtweg, Dominik, et al. (author)
  • SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection
  • 2020
  • In: Proceedings of the Fourteenth Workshop on Semantic Evaluation (SemEval2020), Barcelona, Spain (Online), December 12, 2020.. - : ACL.
  • Conference paper (peer-reviewed)abstract
    • Lexical Semantic Change detection, i.e., the task of identifying words that change meaning over time, is a very active research area, with applications in NLP, lexicography, and linguistics. Evaluation is currently the most pressing problem in Lexical Semantic Change detection, as no gold standards are available to the community, which hinders progress. We present the results of the first shared task that addresses this gap by providing researchers with an evaluation framework and manually annotated, high-quality datasets for English, German, Latin, and Swedish. 33 teams submitted 186 systems, which were evaluated on two subtasks.
  •  
45.
  • Schlechtweg, Dominik, et al. (author)
  • The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change
  • 2024
  • In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations. St. Julians, Malta. Association for Computational Linguistics, pages 137–149.
  • Conference paper (peer-reviewed)abstract
    • We present the DURel tool implementing the annotation of semantic proximity between word uses in an online, open source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This makes it possible to measure word senses with simple and intuitive micro-task judgments between use pairs, requiring minimal preparation effort. The tool offers additional functionalities to compare the agreement between annotators to guarantee the inter-subjectivity of the obtained judgments and to calculate summary statistics over the annotated data, giving insights into sense frequency distributions, semantic variation or changes of senses over time.
  •  
46.
  • Spiliotopoulos, D., et al. (author)
  • SMS 2013 PC co-chairs message
  • 2013
  • Conference paper (other academic/artistic)abstract
    • The SMS workshop 2013 on Social Media Semantics was held this year in the context of the OTM ("OnTheMove") federated conferences, covering different aspects of distributed information systems in September 2013 in Graz. The topic of the workshop is semantics in social media. The Social Web has become the first and main medium to get and spread information. Everyday news is reported instantly, and social media has become a major source for broadcasters, news reporters and political analysts as well as a place of interaction for everyday people. For a full utilization of this medium, information must be gathered, analyzed and semantically understood. In this workshop we ask the question: how can Semantic Web technologies be used to provide the means for interested people to draw conclusions, assess situations and preserve their findings for future use? © 2013 Springer-Verlag.
  •  
47.
  • Tahmasebi, Nina, 1982, et al. (author)
  • A Convergence of Methodologies: Notes on Data-Intensive Humanities Research
  • 2019
  • In: CEUR workshop proceedings ; 2364. Proceedings of the 4th Conference on Digital Humanities in the Nordic Countries, Copenhagen, Denmark, March 5-8, 2019 / edited by Costanza Navarretta, Manex Agirrezabal, Bente Maegaard. - Aachen : CEUR workshop proceedings. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • In this paper, we discuss a data-intensive research methodology for the digital humanities. We highlight the differences and commonalities between quantitative and qualitative research methodologies in relation to a data-intensive research process. We argue that issues of representativeness and reduction must be in focus for all phases of the process: from the status of the texts as such, through their digitization, to pre-processing and methodological exploration.
  •  
48.
  • Tahmasebi, Nina, 1982 (author)
  • A Study on Word2Vec on a Historical Swedish Newspaper Corpus
  • 2018
  • In: CEUR Workshop Proceedings. Vol. 2084. Proceedings of the Digital Humanities in the Nordic Countries 3rd Conference, Helsinki Finland, March 7-9, 2018. Edited by Eetu Mäkelä, Mikko Tolonen, Jouni Tuominen. - Helsinki : University of Helsinki, Faculty of Arts. - 1613-0073.
  • Conference paper (peer-reviewed)abstract
    • Detecting word sense changes can be of great interest in the field of digital humanities. Thus far, most investigations and automatic methods have been developed and carried out on English text, and most recent methods make use of word embeddings. This paper presents a study on using Word2Vec, a neural word embedding method, on a Swedish historical newspaper collection. Our study includes a set of 11 words, and our focus is the quality and stability of the word vectors over time. We investigate whether a word embedding method like Word2Vec can be effectively used on texts where the volume and quality are limited.
  •  
49.
  • Tahmasebi, Nina, 1982, et al. (author)
  • Computational modeling of semantic change
  • 2023
  • In: Routledge Handbook of Historical Linguistics, 2nd edition. - : Routledge.
  • Book chapter (peer-reviewed)abstract
    • In this chapter we provide an overview of computational modeling for semantic change using large and semi-large textual corpora. We aim to provide a key for the interpretation of relevant methods and evaluation techniques, and also provide insights into important aspects of the computational study of semantic change. We discuss the pros and cons of different classes of models with respect to the properties of the data from which one wishes to model semantic change, and which avenues are available to evaluate the results. This chapter is forthcoming as the book has not yet been published.
  •  
50.
  • Tahmasebi, Nina, 1982, et al. (author)
  • Finding Individual Word Sense Changes and their Delay in Appearance
  • 2017
  • In: Proceedings of Recent Advances in Natural Language Processing 2017. Varna, Bulgaria 2–8 September, 2017 / edited by Galia Angelova, Kalina Bontcheva, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova. - : Incoma Ltd. Shoumen, Bulgaria. - 1313-8502 .- 2603-2813. - 9789544520489
  • Conference paper (peer-reviewed)abstract
    • We present a method for detecting word sense changes by utilizing automatically induced word senses. Our method works on the level of individual senses and allows a word to have, e.g., one stable sense and then add a novel sense that later experiences change. Senses are grouped based on polysemy to find linguistic concepts, and we can find broadening and narrowing as well as novel (polysemous and homonymic) senses. We evaluate on a test set, and present recall and estimates of the time between expected and found change.
  •  
Type of publication
conference paper (40)
journal article (6)
editorial proceedings (5)
book chapter (4)
other publication (2)
editorial collection (1)
doctoral thesis (1)
Type of content
peer-reviewed (45)
other academic/artistic (14)
Author/Editor
Tahmasebi, Nina, 198 ... (59)
Borin, Lars, 1957 (17)
Hengchen, Simon, 198 ... (13)
Adesam, Yvonne, 1975 (3)
Forsberg, Markus, 19 ... (3)
Dannélls, Dana, 1976 (3)
Jordan, Caspar (3)
McGillivray, Barbara (3)
Dubhashi, Devdatt, 1 ... (2)
Alfter, David, 1986 (2)
Megyesi, Beata (2)
Volodina, Elena, 197 ... (2)
Viklund, Jon (2)
Ekman, Stefan, 1972 (2)
Wirén, Mats (2)
Grigonyté, Gintaré (2)
Virk, Shafqat, 1979 (2)
Näsman, Jesper (2)
Zhou, Wei (1)
Exner, Peter (1)
Abualhajia, Sallam (1)
Forin, Diane (1)
Zimmermann, Karl-Hei ... (1)
Ferrara, Alfio (1)
Montanelli, Stefano (1)
Andersson, Peter, 19 ... (1)
Bouma, Gerlof, 1979 (1)
Ahlberg, Malin, 1986 (1)
Johansson, Richard, ... (1)
Berdicevskis, Aleksa ... (1)
Morger, Felix (1)
Nugues, Pierre (1)
Malmsten, Martin (1)
Sahlgren, Magnus (1)
Brodén, Daniel, 1975 (1)
Malm, Mats, 1964 (1)
Ekman, Stefan (1)
Noble, Bill (1)
Kurtz, Robin (1)
Öhman, Joey (1)
Isbister, Tim (1)
Lindahl, Anna, 1988 (1)
Rekathati, Faton (1)
Börjeson, Love (1)
Sköldberg, Emma, 196 ... (1)
Hagen, Niclas (1)
Ohlsson, Claes (1)
Volodina, Elena (1)
Björkenstam, Kristin ... (1)
Gustafson Capková, S ... (1)
University
University of Gothenburg (52)
Chalmers University of Technology (12)
Uppsala University (2)
Lund University (1)
Language
English (59)
Research subject (UKÄ/SCB)
Natural sciences (54)
Humanities (24)
Social Sciences (3)


 