SwePub

A comparative evaluation and analysis of three generations of Distributional Semantic Models

Lenci, Alessandro (author)
Università di Pisa, Italy
Sahlgren, Magnus (author)
AI Sweden, Sweden
Jeuniaux, Patrick (author)
Institut National de Criminalistique et de Criminologie, Belgium
Cuba Gyllensten, Amaru (author)
RISE, Computer Science
Miliani, Martina (author)
Università per Stranieri di Siena, Italy; Università di Pisa, Italy
2022-03-02
2022
English.
In: Language Resources and Evaluation. Springer Science and Business Media B.V. ISSN 1574-020X, 1574-0218. Vol. 56, pp. 1219-
  • Journal article (peer-reviewed)
Abstract
Distributional semantics has changed deeply over the last decades. First, predict models stole the thunder from traditional count ones, and more recently both of them were replaced in many NLP applications by contextualized vectors produced by neural language models. Although an extensive body of research has been devoted to Distributional Semantic Model (DSM) evaluation, we still lack a thorough comparison with respect to tested models, semantic tasks, and benchmark datasets. Moreover, previous work has mostly focused on task-driven evaluation, instead of exploring the differences in the way models represent the lexical semantic space. In this paper, we perform a large-scale evaluation of type distributional vectors, either produced by static DSMs or obtained by averaging the contextualized vectors generated by BERT. First of all, we investigate the performance of embeddings in several semantic tasks, carrying out an in-depth statistical analysis to identify the major factors influencing the behavior of DSMs. The results show that (i) the alleged superiority of predict-based models is more apparent than real, and surely not ubiquitous, and (ii) static DSMs surpass BERT representations in most out-of-context semantic tasks and datasets. Furthermore, we borrow from cognitive neuroscience the methodology of Representational Similarity Analysis (RSA) to inspect the semantic spaces generated by distributional models. RSA reveals important differences related to the frequency and part-of-speech of lexical items. © 2022, The Author(s).
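As a rough illustration of the two techniques the abstract names, the Python sketch below builds a type vector by averaging a word's contextualized token vectors, then computes an RSA score between two semantic spaces as the Spearman correlation of their pairwise cosine-similarity matrices. This is not the authors' code: the function names (`type_vector`, `cosine_similarity_matrix`, `rsa_score`) are hypothetical, and the use of cosine similarity and Spearman correlation over the upper triangle is an assumption, since this record does not state the paper's exact settings.

```python
import numpy as np


def type_vector(token_vectors):
    """Average contextualized token vectors (one row per occurrence) into one type vector."""
    return np.mean(np.asarray(token_vectors, dtype=float), axis=0)


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between the rows of `vectors`.

    Assumes no row is the zero vector.
    """
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return unit @ unit.T


def _ranks(values):
    """0-based ranks of `values`; ties are broken by position, not averaged (a simplification)."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def rsa_score(space_a, space_b):
    """Hypothetical RSA score: Spearman correlation between two similarity structures.

    Both spaces must contain vectors for the same words in the same row order;
    their dimensionalities may differ.
    """
    sim_a = cosine_similarity_matrix(space_a)
    sim_b = cosine_similarity_matrix(space_b)
    # Use each word pair once (upper triangle, diagonal excluded).
    iu = np.triu_indices_from(sim_a, k=1)
    # Spearman correlation = Pearson correlation of the ranks.
    return float(np.corrcoef(_ranks(sim_a[iu]), _ranks(sim_b[iu]))[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    static_space = rng.normal(size=(5, 300))  # e.g. 5 words from a static DSM
    bert_space = np.stack(
        [type_vector(rng.normal(size=(10, 768))) for _ in range(5)]
    )  # 5 words, each averaged over 10 contextual occurrences
    print(rsa_score(static_space, bert_space))
```

Because the score depends only on how each space ranks word-pair similarities, it compares the geometry of two spaces even when their dimensionalities differ; rescaling every vector, for example, leaves it unchanged.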

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Language Technology (hsv//eng)

Keyword

Contextual embeddings
Distributional semantics
Evaluation
Representational Similarity Analysis

Publication and Content Type

ref (peer-reviewed)
art (journal article)