SwePub

Search results for "WFRF:(Hengchen Simon 1988)"


  • Results 1-10 of 20
1.
  • Berdicevskis, Aleksandrs, 1983, et al. (author)
  • Superlim: A Swedish Language Understanding Evaluation Benchmark
  • 2023
  • In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore / Houda Bouamor, Juan Pino, Kalika Bali (Editors). - Stroudsburg, PA : Association for Computational Linguistics. - 9798891760608
  • Conference paper (peer-reviewed)
2.
  • Computational approaches to semantic change
  • 2021
  • Edited collection (editorship) (other academic/artistic). Abstract:
    • Semantic change — how the meanings of words change over time — has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th century, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least understood. Ever since, the study of semantic change has progressed steadily, accumulating a vast store of knowledge for over a century, encompassing many languages and language families. Historical linguists also early on realized the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea-change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans. A major challenge presently is to integrate the hard-earned knowledge and expertise of traditional historical linguistics with cutting-edge methodology explored primarily in computational linguistics. The idea for the present volume came out of a concrete response to this challenge. The 1st International Workshop on Computational Approaches to Historical Language Change (LChange'19), at ACL 2019, brought together scholars from both fields. 
    This volume offers a survey of this exciting new direction in the study of semantic change, a discussion of the many remaining challenges that we face in pursuing it, and considerably updated and extended versions of a selection of the contributions to the LChange'19 workshop, addressing both more theoretical problems (e.g., discovery of "laws of semantic change") and practical applications, such as information retrieval in longitudinal text archives.
3.
4.
  • Dubossarsky, Haim, et al. (author)
  • Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change
  • 2019
  • In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, July 28 - August 2, 2019 / Anna Korhonen, David Traum, Lluís Màrquez (Editors). - Stroudsburg, PA : Association for Computational Linguistics. - 9781950737482
  • Conference paper (peer-reviewed). Abstract:
    • State-of-the-art models of lexical semantic change detection suffer from noise stemming from vector space alignment. We have empirically tested the Temporal Referencing method for lexical semantic change and show that, by avoiding alignment, it is less affected by this noise. We show that, trained on a diachronic corpus, the skip-gram with negative sampling architecture with temporal referencing outperforms alignment models on a synthetic task as well as a manual test set. We introduce a principled way to simulate lexical semantic change and systematically control for possible biases.
5.
  • Duong, Quan, et al. (author)
  • An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021, Reykjavik, Iceland (online). - Linköping : Linköping Electronic Conference Proceedings. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed). Abstract:
    • Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the performance of NLP systems. Correcting these errors manually is a time-consuming process and a great part of the automatic approaches have been relying on rules or supervised machine learning. We build on previous work on fully automatic unsupervised extraction of parallel data to train a character-based sequence-to-sequence NMT (neural machine translation) model to conduct OCR error correction designed for English, and adapt it to Finnish by proposing solutions that take the rich morphology of the language into account. Our new method shows increased performance while remaining fully unsupervised, with the added benefit of spelling normalisation. The source code and models are available on GitHub and Zenodo.
6.
  • Hengchen, Simon, 1988, et al. (author)
  • A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data
  • 2021
  • In: Journal of Open Humanities Data. - : Ubiquity Press, Ltd. - 2059-481X. ; 7:2, pp. 1-7
  • Journal article (peer-reviewed). Abstract:
    • This paper describes the creation of several word embedding models based on a large collection of diachronic Swedish newspaper material available through Språkbanken Text, the Swedish language bank. This data was produced in the context of Språkbanken Text’s continued mission to collaborate with humanities and natural language processing (NLP) researchers and to provide freely available language resources, for the development of state-of-the-art NLP methods and tools.
7.
  • Hengchen, Simon, 1988, et al. (author)
  • A data-driven approach to studying changing vocabularies in historical newspaper collections
  • 2021
  • In: Digital Scholarship in the Humanities. - : Oxford University Press (OUP). - 2055-7671 .- 2055-768X. ; 36:Supplement 2, pp. 109-126
  • Journal article (peer-reviewed). Abstract:
    • Nation and nationhood are among the most frequently studied concepts in the field of intellectual history. At the same time, the word ‘nation’ and its historical usage are very vague. The aim in this article was to develop a data-driven method using dependency parsing and neural word embeddings to clarify some of the vagueness in the evolution of this concept. To this end, we propose the following two-step method. First, using linguistic processing, we create a large set of words pertaining to the topic of nation. Second, we train diachronic word embeddings and use them to quantify the strength of the semantic similarity between these words and thereby create meaningful clusters, which are then aligned diachronically. To illustrate the robustness of the study across languages, time spans, as well as large datasets, we apply it to the entirety of five historical newspaper archives in Dutch, Swedish, Finnish, and English. To our knowledge, thus far there have been no large-scale comparative studies of this kind that purport to grasp long-term developments in as many as four different languages in a data-driven way. A particular strength of the method we describe in this article is that, by design, it is not limited to the study of nationhood, but rather expands beyond it to other research questions and is reusable in different contexts.
8.
  • Hengchen, Simon, 1988, et al. (author)
  • Challenges for computational lexical semantic change
  • 2021
  • In: Computational approaches to semantic change / Tahmasebi, Nina; Borin, Lars; Jatowt, Adam; Xu, Yang; Hengchen, Simon (eds.). - Berlin : Language Science Press. - 2366-7818. - 9783985540082 ; pp. 341-372
  • Book chapter (peer-reviewed). Abstract:
    • The computational study of lexical semantic change (LSC) has taken off in the past few years and we are seeing increasing interest in the field, from both computational sciences and linguistics. Most of the research so far has focused on methods for modelling and detecting semantic change using large diachronic textual data, with the majority of the approaches employing neural embeddings. While methods that offer easy modelling of diachronic text are one of the main reasons for the spiking interest in LSC, neural models leave many aspects of the problem unsolved. The field has several open and complex challenges. In this chapter, we aim to describe the most important of these challenges and outline future directions.
9.
  • Hengchen, Simon, 1988, et al. (author)
  • SBX-HY at RuShiftEval 2021: Доверяй, но проверяй : SBX-HY at RuShiftEval 2021: Doveryay, no proveryay
  • 2021
  • In: Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2021,” Moscow, June 16–19, 2021. - Moscow : Rossiiskii Gosudarstvennyi Gumanitarnyi Universitet. - 2221-7932 .- 2075-7182.
  • Conference paper (peer-reviewed). Abstract:
    • Research in computational lexical semantic change, due to the inherent nature of language change, has been notoriously difficult to evaluate. This led to the creation of many new exciting models that cannot be easily compared. In this system paper, we describe our submissions at RuShiftEval 2021 – one of the few recent shared tasks that enable researchers, through a standard evaluation set and control conditions, to systematically compare models and gain insights from previous work. We show that despite top results in similar tasks on other languages, Temporal Referencing does not seem to perform as well on Russian.
10.
  • Hengchen, Simon, 1988, et al. (author)
  • SuperSim: a test set for word similarity and relatedness in Swedish
  • 2021
  • In: Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), May 31-June 2, 2021, Reykjavik, Iceland (online). - Linköping : Linköping Electronic Conference Proceedings. - 1650-3686 .- 1650-3740. - 9789179296148
  • Conference paper (peer-reviewed). Abstract:
    • Language models are notoriously difficult to evaluate. We release SuperSim, a large-scale similarity and relatedness test set for Swedish built with expert human judgments. The test set is composed of 1,360 word-pairs independently judged for both relatedness and similarity by five annotators. We evaluate three different models (Word2Vec, fastText, and GloVe) trained on two separate Swedish datasets, namely the Swedish Gigaword corpus and a Swedish Wikipedia dump, to provide a baseline for future comparison. We release the fully annotated test set, code, baseline models, and data.
Publication type
conference paper (8)
other publication (4)
journal article (3)
proceedings (editorship) (2)
book chapter (2)
edited collection (editorship) (1)
Content type
peer-reviewed (12)
other academic/artistic (8)
Author/editor
Hengchen, Simon, 198 ... (20)
Tahmasebi, Nina, 198 ... (13)
Dubossarsky, Haim (8)
Schlechtweg, Dominik (6)
McGillivray, Barbara (5)
Borin, Lars, 1957 (3)
Tolonen, Mikko (2)
Marjanen, Jani (2)
Adesam, Yvonne, 1975 (1)
Bouma, Gerlof, 1979 (1)
Forsberg, Markus, 19 ... (1)
Dannélls, Dana, 1976 (1)
Berdicevskis, Aleksa ... (1)
Morger, Felix (1)
Alfter, David, 1986 (1)
Malmsten, Martin (1)
Volodina, Elena, 197 ... (1)
Sahlgren, Magnus (1)
Kurtz, Robin (1)
Öhman, Joey (1)
Isbister, Tim (1)
Lindahl, Anna, 1988 (1)
Rekathati, Faton (1)
Börjeson, Love (1)
Cassotti, Pierluigi (1)
Periti, Francesco (1)
De Roure, David (1)
Xu, Yang (1)
Jatowt, Adam (1)
Cummings, James (1)
Duong, Quan (1)
Hämäläinen, Mika (1)
Ros, Ruben (1)
Viloria, Kate (1)
Indukaev, Andrey (1)
Palma, Marco (1)
Zosa, Elaine (1)
Pivovarova, Lidia (1)
Terras, Melissa (1)
Alex, Beatrice (1)
Ames, Sarah (1)
Armstrong, Guyda (1)
Beavan, David (1)
Ciula, Arianna (1)
Colavizza, Giovanni (1)
Farquhar, Adam (1)
Lang, Anouk (1)
Loxley, James (1)
Goudarouli, Eirini (1)
Nanni, Federico (1)
University
Göteborgs universitet (20)
Uppsala universitet (1)
Language
English (20)
Research subject (UKÄ/SCB)
Natural sciences (19)
Humanities (15)
