SwePub
Search the SwePub database


Result list for the search "WFRF:(Augenstein Isabelle)"

Search: WFRF:(Augenstein Isabelle)

  • Results 1-9 of 9
1.
  • Barreiro, Anabela, et al. (author)
  • Multi3Generation : Multitask, Multilingual, Multimodal Language Generation
  • 2022
  • In: Proceedings of the 23rd Annual Conference of the European Association for Machine Translation. - European Association for Machine Translation. - pp. 345-346
  • Conference paper (peer-reviewed). Abstract:
    • This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation. This "meta-paper" will serve as a reference for citation of the Action in future publications. It presents the objectives, challenges, and links to the achieved outcomes.
2.
  • Bjerva, Johannes, et al. (author)
  • What Do Language Representations Really Represent?
  • 2019
  • In: Computational Linguistics - Association for Computational Linguistics (Print). - MIT Press Journals. - ISSN 0891-2017, e-ISSN 1530-9312. - 45:2, pp. 381-389
  • Journal article (other academic/artistic). Abstract:
    • A neural language model trained on a text corpus can be used to induce distributed representations of words, such that similar words end up with similar representations. If the corpus is multilingual, the same model can be used to learn distributed representations of languages, such that similar languages end up with similar representations. We show that this holds even when the multilingual corpus has been translated into English, by picking up the faint signal left by the source languages. However, just as it is a thorny problem to separate semantic from syntactic similarity in word representations, it is not obvious what type of similarity is captured by language representations. We investigate correlations and causal relationships between language representations learned from translations on one hand, and genetic, geographical, and several levels of structural similarity between languages on the other. Of these, structural similarity is found to correlate most strongly with language representation similarity, whereas genetic relationships (a convenient benchmark used for evaluation in previous work) appear to be a confounding factor. Apart from implications about translation effects, we see this more generally as a case where NLP and linguistic typology can interact and benefit one another. A minimal sketch of the correlation analysis follows below.
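The analysis above correlates two similarity structures over the same set of languages. A minimal Python sketch of that idea, where the "learned" language vectors and the structural-similarity matrix are random stand-ins (illustrative assumptions, not the paper's data or exact method):

    import numpy as np
    from scipy.stats import spearmanr

    # Stand-ins for language vectors induced by a multilingual language
    # model; in the paper these carry a faint signal of the source language.
    langs = ["da", "sv", "no", "fi", "et"]
    rng = np.random.default_rng(0)
    lang_vecs = rng.normal(size=(len(langs), 16))

    # Stand-in for an external similarity matrix between the same languages
    # (e.g. structural similarity from typological feature overlap).
    struct_sim = rng.uniform(size=(len(langs), len(langs)))
    struct_sim = (struct_sim + struct_sim.T) / 2  # symmetrise

    def cosine_sim_matrix(X):
        """Pairwise cosine similarities between row vectors."""
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        return X @ X.T

    repr_sim = cosine_sim_matrix(lang_vecs)

    # Correlate the two similarity structures over off-diagonal pairs only.
    iu = np.triu_indices(len(langs), k=1)
    rho, p = spearmanr(repr_sim[iu], struct_sim[iu])
    print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")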
3.
  • Blomqvist, Eva, 1977-, et al. (author)
  • Statistical Knowledge Patterns for Characterising Linked Data
  • 2013
  • In: Proceedings of the 4th Workshop on Ontology and Semantic Web Patterns (WOP 2013) co-located with the 12th International Semantic Web Conference (ISWC 2013). - CEUR-WS.
  • Conference paper (peer-reviewed). Abstract:
    • Knowledge Patterns (KPs), and even more specifically Ontology Design Patterns (ODPs), are no longer only generated in a top-down fashion; rather, patterns are being extracted in a bottom-up fashion from online ontologies and data sources, such as Linked Data. These KPs can assist in tasks such as making sense of datasets and formulating queries over data, including performing query expansion to manage the diversity of properties used in datasets. This paper presents an extraction method for generating what we call Statistical Knowledge Patterns (SKPs) from Linked Data. SKPs describe and characterise classes from any reference ontology by presenting their most frequent properties and property characteristics, all based on analysis of the underlying data. SKPs are stored as small OWL ontologies but can be continuously updated in a completely automated fashion. In the paper we exemplify this method by applying it to the classes of the DBpedia ontology, and in particular we evaluate our method for extracting range axioms from data. Results show that by setting appropriate thresholds, SKPs can be generated that cover (i.e. allow us to query, using the properties of the SKP) over 94% of the triples about individuals of that class, while only needing to consider 27% of the total number of distinct properties that are used in the data. A minimal sketch of the frequency-based extraction idea follows below.
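The SKP extraction described above boils down to counting how often each property is used with individuals of a class and keeping the frequent ones. A minimal sketch over a toy in-memory triple set (the triples and the coverage threshold are illustrative; the paper works over Linked Data sources such as DBpedia):

    from collections import Counter

    # Toy (subject, property, object) triples about individuals of one class.
    triples = [
        ("Berlin", "country", "Germany"),
        ("Berlin", "populationTotal", "3600000"),
        ("Paris", "country", "France"),
        ("Paris", "populationTotal", "2100000"),
        ("Paris", "mayor", "Anne Hidalgo"),
        ("Oslo", "country", "Norway"),
    ]

    def skp_properties(triples, coverage=0.8):
        """Keep the most frequent properties until the retained ones
        cover at least `coverage` of all triples about the class."""
        counts = Counter(p for _, p, _ in triples)
        total = sum(counts.values())
        kept, covered = [], 0
        for prop, n in counts.most_common():
            kept.append(prop)
            covered += n
            if covered / total >= coverage:
                break
        return kept

    print(skp_properties(triples))  # ['country', 'populationTotal']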
4.
  • de Lhoneux, Miryam, 1990-, et al. (author)
  • Parameter sharing between dependency parsers for related languages
  • 2018
  • In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. - Brussels: Association for Computational Linguistics. - ISBN 9781948087841. - pp. 4992-4997
  • Conference paper (peer-reviewed). Abstract:
    • Previous work has suggested that parameter sharing between transition-based neural dependency parsers for related languages can lead to better performance, but there is no consensus on what parameters to share. We present an evaluation of 27 different parameter sharing strategies across 10 languages, representing five pairs of related languages, each pair from a different language family. We find that sharing transition classifier parameters always helps, whereas the usefulness of sharing word and/or character LSTM parameters varies. Based on this result, we propose an architecture where the transition classifier is shared, and the sharing of word and character parameters is controlled by a parameter that can be tuned on validation data. This model is linguistically motivated and obtains significant improvements over a monolingually trained baseline. We also find that sharing transition classifier parameters helps when training a parser on unrelated language pairs, although in that case sharing too many parameters does not help. A schematic sketch of the sharing architecture follows below.
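A schematic PyTorch sketch of the sharing idea in this paper: the transition classifier is always shared between two related languages, while the word-level encoder can be shared or duplicated. All names, dimensions, and the toy forward pass are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn as nn

    class SharedTransitionParser(nn.Module):
        """Two-language parser sketch: the transition classifier is always
        shared; the BiLSTM encoder is shared or per-language depending on
        `share_encoder`."""

        def __init__(self, vocab_sizes, emb_dim=64, hidden_dim=128,
                     n_transitions=3, share_encoder=True):
            super().__init__()
            # Separate embeddings per language (different vocabularies).
            self.embeddings = nn.ModuleList(
                nn.Embedding(v, emb_dim) for v in vocab_sizes)
            if share_encoder:
                shared = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                                 batch_first=True)
                # Registering the same module twice shares its parameters.
                self.encoders = nn.ModuleList([shared] * len(vocab_sizes))
            else:
                self.encoders = nn.ModuleList(
                    nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                            batch_first=True) for _ in vocab_sizes)
            # Always shared: the transition classifier.
            self.classifier = nn.Linear(2 * hidden_dim, n_transitions)

        def forward(self, token_ids, lang):
            emb = self.embeddings[lang](token_ids)
            states, _ = self.encoders[lang](emb)
            return self.classifier(states)  # transition scores per position

    model = SharedTransitionParser(vocab_sizes=[1000, 1200])
    print(model(torch.randint(0, 1000, (2, 7)), lang=0).shape)  # (2, 7, 3)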
5.
  • Kunz, Jenny, 1991- (author)
  • Understanding Large Language Models : Towards Rigorous and Targeted Interpretability Using Probing Classifiers and Self-Rationalisation
  • 2024
  • Doctoral thesis (other academic/artistic). Abstract:
    • Large language models (LLMs) have become the basis of many natural language processing (NLP) systems due to their performance and easy adaptability to various tasks. However, much about their inner workings is still unknown. LLMs have many millions or billions of parameters, and large parts of their training happen in a self-supervised fashion: they simply learn to predict the next word, or missing words, in a sequence. This is effective for picking up a wide range of linguistic, factual and relational information, but it means that it is not trivial what exactly is learned, and how it is represented within the LLM. In this thesis, I present our work on methods contributing to a better understanding of LLMs. The work can be grouped into two approaches. The first lies within the field of interpretability, which is concerned with understanding the internal workings of LLMs. Specifically, we analyse and refine a tool called probing classifiers that inspects the intermediate representations of LLMs, focusing on what roles the various layers of the neural model play. This helps us gain a global understanding of how information is structured in the model. I present our work on assessing and improving the probing methodologies. We developed a framework to clarify the limitations of past methods, showing that all common controls are insufficient. Based on this, we proposed more restrictive probing setups by creating artificial distribution shifts. We developed new metrics for the evaluation of probing classifiers that move the focus from the overall information that the layer contains to differences in information content across the LLM. The second approach is concerned with explainability, specifically with self-rationalising models that generate free-text explanations along with their predictions. This is an instance of local understandability: we obtain justifications for individual predictions. In this setup, however, the generation of the explanations is just as opaque as the generation of the predictions. Therefore, our work in this field focuses on better understanding the properties of the generated explanations. We evaluate the downstream performance of a classifier with explanations generated by different model pipelines and compare it to human ratings of the explanations. Our results indicate that the properties that increase downstream performance differ from those that humans appreciate when evaluating an explanation. Finally, we annotate explanations generated by an LLM for properties that human explanations typically have, and discuss the effects those properties have on different user groups. While a detailed understanding of the inner workings of LLMs is still unfeasible, I argue that the techniques and analyses presented in this work can help to better understand LLMs, the linguistic knowledge they encode, and their decision-making process. Together with knowledge about the models' architecture, training data and training objective, such techniques can help us develop a robust high-level understanding of LLMs that can guide decisions on their deployment and potential improvements. A minimal sketch of a probing classifier follows below.
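The probing classifiers discussed in the thesis are small supervised models trained on frozen intermediate representations; high probe accuracy suggests the probed property is present in that layer. A minimal sketch, using random stand-ins for layer activations and a linearly readable property (both assumptions made purely for illustration):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Stand-ins for frozen hidden states from one layer (n_tokens x dim)
    # and a linguistic label per token (e.g. "is this token a noun?").
    X = rng.normal(size=(2000, 256))
    y = (X[:, :8].sum(axis=1) > 0).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)

    # Deliberately low-capacity probe: a plain linear classifier.
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")

Repeating this per layer (and against the stricter controls the thesis argues for) indicates where in the model the property becomes available.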
6.
  • Søgaard, Anders, et al. (author)
  • Nightmare at test time : How punctuation prevents parsers from generalizing
  • 2018
  • In: Proceedings of the 2018 EMNLP Workshop BlackboxNLP. - Brussels: Association for Computational Linguistics. - pp. 25-29
  • Conference paper (peer-reviewed). Abstract:
    • Punctuation is a strong indicator of syntactic structure, and parsers trained on text with punctuation often rely heavily on this signal. Punctuation is a diversion, however: human language processing does not rely on punctuation to the same extent, and in informal texts we therefore often leave it out. We also use punctuation ungrammatically for emphatic or creative purposes, or simply by mistake. We show that (a) dependency parsers are sensitive both to the absence of punctuation and to alternative uses; (b) neural parsers tend to be more sensitive than vintage parsers; (c) neural parsers trained without punctuation outperform all out-of-the-box parsers across all scenarios where punctuation departs from standard usage. Our main experiments are on synthetically corrupted data, to study the effect of punctuation in isolation and avoid potential confounds, but we also show effects on out-of-domain data. A minimal sketch of the punctuation-removal corruption follows below.
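A minimal sketch of the kind of synthetic corruption used above: dropping punctuation tokens from annotated sentences before evaluation. Real CoNLL-U data would also need head indices reindexed after deletion; that step is omitted here for brevity:

    def strip_punctuation(sentence):
        """Remove punctuation tokens from a (form, upos, head) sentence.
        NOTE: head indices are left untouched in this simplified sketch."""
        return [tok for tok in sentence if tok[1] != "PUNCT"]

    sentence = [("Let", "VERB", 0), ("'s", "PRON", 1), ("eat", "VERB", 1),
                (",", "PUNCT", 3), ("grandma", "NOUN", 3), ("!", "PUNCT", 1)]
    print(strip_punctuation(sentence))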
7.
  • Zhang, Ziqi, et al. (author)
  • An Unsupervised Data-driven Method to Discover Equivalent Relations in Large Linked Datasets
  • 2017
  • In: Semantic Web. - IOS Press. - ISSN 1570-0844, e-ISSN 2210-4968. - 8:2
  • Journal article (peer-reviewed). Abstract:
    • This article addresses a number of limitations of state-of-the-art methods of Ontology Alignment: 1) they primarily address concepts and entities, while relations are less well studied; 2) many build on the assumption of the well-formedness of ontologies, which is not necessarily true in the domain of Linked Open Data; 3) few have looked at schema heterogeneity from a single source, which is also a common issue, particularly in very large Linked Datasets created automatically from heterogeneous resources or integrated from multiple datasets. We propose a domain- and language-independent and completely unsupervised method to align equivalent relations across schemata based on their shared instances. We introduce a novel similarity measure able to cope with unbalanced populations of schema elements, an unsupervised technique to automatically decide the similarity threshold to assert equivalence for a pair of relations, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. Although the method is designed for aligning relations within a single dataset, it can also be adapted for cross-dataset alignment where sameAs links between datasets have been established. Using three gold standards created based on DBpedia, we obtain encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models based on variants of the proposed method. The proposed method yields significant improvements over baseline models in terms of F1 measure (mostly between 7% and 40%); it always scores the highest precision and is also among the top performers in terms of recall. We also make public the datasets used in this work, which we believe form the largest collection of gold standards for evaluating relation alignment in the LOD context. An illustrative sketch of an instance-overlap similarity measure follows below.
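The paper's similarity measure is built on shared (subject, object) instances and is designed for unbalanced populations. An illustrative stand-in (not the published formula) normalises overlap by the smaller relation, so a rarely used relation can still score as equivalent to a popular one:

    def relation_similarity(pairs_a, pairs_b):
        """Instance overlap normalised by the smaller relation
        (illustrative; the published measure is more elaborate)."""
        overlap = len(pairs_a & pairs_b)
        return overlap / min(len(pairs_a), len(pairs_b))

    birth_place = {("Turing", "London"), ("Curie", "Warsaw"),
                   ("Bohr", "Copenhagen")}
    born_in = {("Turing", "London"), ("Curie", "Warsaw")}
    print(relation_similarity(birth_place, born_in))  # 1.0: candidate pair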
8.
  • Zhang, Ziqi, et al. (author)
  • Mining Equivalent Relations from Linked Data
  • 2013
  • In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Volume 2. - Association for Computational Linguistics. - ISBN 9781937284510. - pp. 289-293
  • Conference paper (peer-reviewed). Abstract:
    • Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which has been insufficiently addressed before. We introduce an unsupervised method to measure the equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encouraging results, with an average precision of 0.75-0.87 in predicting relation pair equivalency and 0.78-0.98 in relation clustering. A minimal union-find clustering sketch follows below.
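Once pairwise equivalency scores exist, clustering equivalent relations amounts to taking connected components over pairs that pass a threshold. A minimal union-find sketch (the threshold and scores here are made up):

    def cluster_relations(pair_scores, threshold=0.8):
        """Group relations into clusters: pairs scoring above the
        threshold are linked; clusters are connected components."""
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for (a, b), score in pair_scores.items():
            if score >= threshold:
                parent[find(a)] = find(b)
            else:
                find(a)
                find(b)  # register singletons too

        groups = {}
        for rel in parent:
            groups.setdefault(find(rel), []).append(rel)
        return list(groups.values())

    scores = {("birthPlace", "born_in"): 0.95,
              ("born_in", "placeOfBirth"): 0.90,
              ("birthPlace", "deathPlace"): 0.10}
    print(cluster_relations(scores))
    # [['birthPlace', 'born_in', 'placeOfBirth'], ['deathPlace']]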
9.
  • Zhang, Ziqi, et al. (author)
  • Statistical Knowledge Patterns : Identifying Synonymous Relations in Large Linked Datasets
  • 2013
  • In: The Semantic Web – ISWC 2013. - Berlin, Heidelberg: Springer Berlin/Heidelberg. - ISBN 9783642413346, 9783642413353. - pp. 703-719
  • Conference paper (peer-reviewed). Abstract:
    • The Web of Data is a rich common resource with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is imprecision in the use of vocabularies: annotators can misunderstand the semantics of a class or property, or may not be able to find the right objects to annotate with. This decreases the quality of data and may eventually hamper its usability at large scale. This paper describes Statistical Knowledge Patterns (SKPs) as a means to address this issue. SKPs encapsulate key information about ontology classes, including synonymous properties in (and across) datasets, and are automatically generated based on statistical data analysis. SKPs can be effectively used to automatically normalise data, and hence increase recall in querying. Both pattern extraction and pattern usage are completely automated. The main benefits of SKPs are that: (1) their structure allows for both accurate query expansion and restriction; (2) they are context dependent, hence they describe the usage and meaning of properties in the context of a particular class; and (3) they can be generated offline, hence the equivalence among relations can be used efficiently at run time. A minimal sketch of SKP-based query expansion follows below.
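The query-expansion use of SKPs described above can be pictured as rewriting a triple pattern into a UNION over a property and its recorded synonyms. A small sketch; the SKP fragment, prefixes, and function names are hypothetical:

    # Hypothetical SKP fragment: synonymous properties for one class.
    skp_synonyms = {
        "dbo:birthPlace": ["dbp:birthPlace", "dbp:placeOfBirth"],
    }

    def expand_triple_pattern(subj, prop, obj, skp):
        """Rewrite one triple pattern into a UNION over the property and
        its SKP-listed synonyms, increasing recall at query time."""
        props = [prop] + skp.get(prop, [])
        blocks = [f"{{ {subj} {p} {obj} . }}" for p in props]
        return "\nUNION\n".join(blocks)

    print(expand_triple_pattern("?person", "dbo:birthPlace", "?place",
                                skp_synonyms))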