SwePub
Search the SwePub database


Hit list for search "WFRF:(Ilinykh Nikolai 1994)"

Search: WFRF:(Ilinykh Nikolai 1994)

  • Results 1-10 of 20
Numbering | Reference | Cover image | Find
1.
2.
  • Castro Ferreira, Thiago, et al. (author)
  • The 2020 Bilingual, Bi-Directional WebNLG+ Shared Task Overview and Evaluation Results (WebNLG+ 2020)
  • 2020
  • In: Proceedings of the WebNLG+, 3rd Workshop on Natural Language Generation from the Semantic Web, Dublin 18 December 2020. - Stroudsburg : Association for Computational Linguistics. - 9781952148590
  • Conference paper (peer-reviewed), abstract:
    • WebNLG+ offers two challenges: (i) mapping sets of RDF triples to English or Russian text (generation) and (ii) converting English or Russian text to sets of RDF triples (semantic parsing). Compared to the eponymous WebNLG challenge, WebNLG+ provides an extended dataset that enables the training, evaluation, and comparison of microplanners and semantic parsers. In this paper, we present the results of the generation and semantic parsing task for both English and Russian and provide a brief description of the participating systems.
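
The data-to-text direction of the task can be made concrete with a small linearisation step: a set of RDF triples is flattened into one input string for a sequence-to-sequence generator. The sketch below is a minimal illustration; the <S>/<P>/<O> separator tokens and the example triples are assumptions, not the shared task's official encoding.

    # Minimal sketch: flatten RDF triples into one string that a
    # sequence-to-sequence generator can consume. The <S>/<P>/<O>
    # separators and the triples are illustrative, not the official
    # WebNLG+ encoding.
    def linearise(triples):
        """Join (subject, predicate, object) triples into one input string."""
        return " ".join(f"<S> {s} <P> {p} <O> {o}" for s, p, o in triples)

    triples = [
        ("Alan_Bean", "occupation", "Test_pilot"),
        ("Alan_Bean", "nationality", "United_States"),
    ]
    print(linearise(triples))
    # <S> Alan_Bean <P> occupation <O> Test_pilot <S> Alan_Bean <P> nationality <O> United_States

The semantic-parsing direction of the task would invert this mapping, recovering the triples from free text.
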
3.
  • Dobnik, Simon, 1977, et al. (author)
  • Towards a computational model of reference and re-reference in visual scenes
  • 2022
  • In: Proceedings of the 9th Swedish Language Technology Conference (SLTC), 23–25 November, Stockholm. - Stockholm, Sweden : The division of Speech, Music & Hearing and Språkbanken Tal, KTH Royal Institute of Technology.
  • Conference paper (other academic/artistic), abstract:
    • How do we refer to scene entities in visual scenes? We explore reference and re-reference in two vision and language tasks and link them to a model of attention. We discuss our findings in relation to modelling situated interaction in grounded language models and situated dialogue systems.
4.
  • Dobnik, Simon, 1977, et al. (author)
  • What to refer to and when? Reference and re-reference in two language-and-vision tasks
  • 2022
  • In: Proceedings of DubDial - Semdial 2022: The 26th Workshop on the Semantics and Pragmatics of Dialogue, August 22-24, 2022, Dublin. - Dublin, Ireland : Semdial. - 2308-2275.
  • Conference paper (peer-reviewed), abstract:
    • How do we refer to scene entities in interactive language-and-vision tasks? We explore reference and re-reference in two tasks, link them to a model of attention and discuss our findings in relation to modelling situated interaction.
5.
  • Ek, Adam, 1990, et al. (author)
  • Vector Norms as an Approximation of Syntactic Complexity
  • 2023
  • In: Proceedings of the Second Workshop on Resources and Representations for Under-Resourced Languages and Domains (Resourceful-2023), May 22, 2023, Tórshavn, Faroe Islands / editors: Nikolai Ilinykh, Felix Morger, Dana Dannélls, Simon Dobnik, Beáta Megyesi, Joakim Nivre. - Stroudsburg, PA : Association for Computational Linguistics. - 9781959429739
  • Conference paper (peer-reviewed), abstract:
    • Internal representations in transformer models can encode useful linguistic knowledge about syntax. Such knowledge could help optimise the data annotation process. However, identifying and extracting such representations from big language models is challenging. In this paper we evaluate two multilingual transformers for the presence of knowledge about the syntactic complexity of sentences and examine different vector norms. We provide a fine-grained evaluation of different norms in different layers and for different languages. Our results suggest that no single part in the models would be the primary source for the knowledge of syntactic complexity. But some norms show a higher degree of sensitivity to syntactic complexity, depending on the language and model used.
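
The probing setup described in the abstract can be approximated with standard tooling: collect per-layer hidden states from a multilingual transformer and reduce each token vector to a norm. The sketch below, using the Hugging Face transformers library, assumes bert-base-multilingual-cased and the L2 norm purely for illustration; the paper's exact models and norms may differ.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Model choice is an assumption for the example, not necessarily
    # one of the two multilingual transformers evaluated in the paper.
    name = "bert-base-multilingual-cased"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    inputs = tok("The cat that the dog chased ran away.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # out.hidden_states holds the embedding layer plus one tensor per
    # transformer layer, each of shape (batch, seq_len, hidden_size).
    for layer, h in enumerate(out.hidden_states):
        mean_norm = h.norm(dim=-1).mean().item()  # mean token L2 norm
        print(f"layer {layer:2d}: mean token L2 norm = {mean_norm:.2f}")

Comparing such per-layer profiles across sentences of different parse depths is one way to test which parts of a model are sensitive to syntactic complexity.
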
6.
  • Ilinykh, Nikolai, 1994, et al. (author)
  • Attention as Grounding: Exploring Textual and Cross-Modal Attention on Entities and Relations in Language-and-Vision Transformer
  • 2022
  • In: Findings of the Association for Computational Linguistics: ACL 2022, May 22-27, 2022, Dublin, Ireland / Smaranda Muresan, Preslav Nakov, Aline Villavicencio (Editors). - Dublin, Ireland : Association for Computational Linguistics. - 9781955917254
  • Conference paper (peer-reviewed), abstract:
    • We explore how a multi-modal transformer trained for generation of longer image descriptions learns syntactic and semantic representations about entities and relations grounded in objects at the level of masked self-attention (text generation) and cross-modal attention (information fusion). We observe that cross-attention learns the visual grounding of noun phrases into objects and high-level semantic information about spatial relations, while text-to-text attention captures low-level syntactic knowledge between words. This suggests that language models in a multi-modal task learn different semantic information about objects and relations cross-modally and uni-modally (text-only). Our code is available here: https://github.com/GU-CLASP/attention-as-grounding.
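
The core mechanics of such an attention analysis — exposing per-layer attention maps and reading them as soft alignments between tokens — can be sketched with any transformer that returns its attention weights. Below, a text-only BERT stands in for the paper's language-and-vision transformer, so only the text-to-text (self-attention) side of the analysis is shown; the cross-modal side is not reproduced here.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Text-only stand-in: the paper's model also has cross-modal
    # attention between words and visual object features.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    inputs = tok("A man rides a horse on the beach.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_attentions=True)

    # out.attentions: one tensor per layer, (batch, heads, seq, seq).
    # Head-averaged maps can be read as soft token-to-token alignments.
    att = torch.stack(out.attentions).mean(dim=2)  # (layers, batch, seq, seq)
    tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
    print(tokens)
    print(att[-1, 0].round(decimals=2))  # last layer, head-averaged
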
7.
  • Ilinykh, Nikolai, 1994 (author)
  • Computational Models of Language and Vision: Studies of Neural Models as Learners of Multi-modal Knowledge
  • 2024
  • Doctoral thesis (other academic/artistic), abstract:
    • This thesis develops and evaluates computational models that generate natural language descriptions of visual content. We build and examine models of language and vision to gain a deeper understanding of how they reflect the relationship between the two modalities. This understanding is crucial for performing computational tasks. The first part of the thesis introduces three studies that inspect the role of self-attention in three different self-attention blocks of the object relation transformer model. We examine attention heatmaps to understand how the model connects different words, objects, and relations within the tasks of image captioning and image paragraph generation. We connect our interpretation of what the model learns in self-attention weights with insights from theories about human cognition, visual perception, and spatial language. The three studies in the second part of the thesis investigate how representations of images and texts can be applied and learned in task-specific models for image paragraph generation, embodied question answering, and variation in human object naming. The last two studies in the third part examine properties of human-generated texts that multi-modal models are expected to acquire in image paragraph generation as well as perceptual category description and interpretation tasks. We analyse discourse structure in image paragraphs produced with different decoding methods. We also inspect whether models of perceptual categories can abstract from visual representations and use this knowledge to generate descriptions that exhibit discriminativity levels important for the task. We show how automatic measures for evaluating text generation behave in a comparison of model-generated and human-generated image descriptions. This thesis presents several contributions. We illustrate that, under specific modelling conditions, self-attention can capture information about the relationship between objects and words. Our results emphasise that the specifics of the task determine the manner and context in which different modalities are processed, as well as the degree to which each modality contributes to the task. We demonstrate that while favoured by automatic evaluation metrics in different tasks, machine-generated image descriptions lack the discourse complexity and discriminative power that are often important for generating better, human-like image descriptions.
8.
  • Ilinykh, Nikolai, 1994, et al. (author)
  • Context matters: evaluation of target and context features on variation of object naming
  • 2023
  • In: Proceedings of Linguistic Insights from and for Multimodal Language Processing (LIMO 2023) at KONVENS 2023, September 22, 2023, Ingolstadt, Germany. - Stroudsburg, PA : Association for Computational Linguistics (ACL). - 9798891760318
  • Conference paper (peer-reviewed), abstract:
    • Semantic underspecification in language poses significant difficulties for models in the field of referring expression generation. This challenge becomes particularly pronounced in setups where models need to learn from multiple modalities and their combinations. Given that different contexts require different levels of language adaptability, models face difficulties in capturing the varying degrees of specificity. To address this issue, we focus on the task of object naming and evaluate various context representations to identify the ones that enable a computational model to effectively capture human variation in object naming. Once we identify the set of useful features, we combine them in search of the optimal combination that leads to a higher correlation with humans and brings us closer to developing a standard referring expression generation model that is aware of variation in naming. The results of our study demonstrate that achieving human-like naming variation requires the model to possess extensive knowledge about the target object from multiple modalities, as well as scene-level context representations. We believe that our findings contribute to the development of more sophisticated models of referring expression generation that aim to replicate human-like behaviour and performance. Our code is available at https://github.com/GU-CLASP/object-naming-in-context.
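
The evaluation logic sketched in the abstract — checking whether a context representation lets a model reproduce human variation in object naming — comes down to correlating per-object variation scores. A minimal sketch follows; the scores are invented and the choice of Spearman correlation is an assumption for illustration, not necessarily the paper's measure.

    from scipy.stats import spearmanr

    # Invented per-object naming-variation scores (e.g. entropy of the
    # distribution over names an object receives): human vs. model.
    human_variation = [0.10, 0.85, 0.40, 1.20, 0.05, 0.95]
    model_variation = [0.20, 0.70, 0.55, 1.10, 0.15, 0.80]

    rho, p = spearmanr(human_variation, model_variation)
    print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
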
9.
  • Ilinykh, Nikolai, 1994, et al. (author)
  • Do Decoding Algorithms Capture Discourse Structure in Multi-Modal Tasks? A Case Study of Image Paragraph Generation
  • 2022
  • In: Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), December 7, 2022, Abu Dhabi, United Arab Emirates. - Abu Dhabi, United Arab Emirates : Association for Computational Linguistics. - 9781959429128
  • Conference paper (peer-reviewed), abstract:
    • This paper describes insights into how different inference algorithms structure discourse in image paragraphs. We train a multi-modal transformer and compare 11 variations of decoding algorithms. We propose to evaluate image paragraphs not only with standard automatic metrics, but also with a more extensive, “under the hood” analysis of the discourse formed by sentences. Our results show that while decoding algorithms can be unfaithful to the reference texts, they still generate grounded descriptions, but they also lack understanding of the discourse structure and differ from humans in terms of attentional structure over images.
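
Comparing decoding algorithms in this fashion is straightforward to reproduce on any autoregressive generator. The sketch below uses GPT-2 and the transformers generate API as a text-only stand-in for the paper's multi-modal transformer; the four configurations shown are common defaults, not the paper's eleven variants.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # GPT-2 as a text-only stand-in for the paper's multi-modal model.
    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    prompt = tok("The picture shows", return_tensors="pt")

    # Four common decoding set-ups; the paper compares 11 variants.
    configs = {
        "greedy":  dict(do_sample=False),
        "beam":    dict(num_beams=5, do_sample=False),
        "top-k":   dict(do_sample=True, top_k=50),
        "nucleus": dict(do_sample=True, top_p=0.9),
    }
    for name, cfg in configs.items():
        ids = model.generate(**prompt, max_new_tokens=40,
                             pad_token_id=tok.eos_token_id, **cfg)
        print(name, "->", tok.decode(ids[0], skip_special_tokens=True))
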
10.
  • Ilinykh, Nikolai, 1994, et al. (author)
  • Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces
  • 2022
  • In: Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind (P-VLAM 2022) at LREC 2022, June 2022, Marseille, France / Patrizia Paggio, Albert Gatt, Marc Tanti (Editors). - Marseille, France : European Language Resources Association (ELRA). - 9791095546795
  • Conference paper (peer-reviewed), abstract:
    • We investigate how different augmentation techniques on both textual and visual representations affect the performance of the face description generation model. Specifically, we provide the model with either original images, sketches of faces, facial composites or distorted images. In addition, on the language side, we experiment with different methods to augment the original dataset with paraphrased captions, which are semantically equivalent to the original ones, but differ in terms of their form. We also examine if augmenting the dataset with descriptions from a different domain (e.g., image captions of real-world images) has an effect on the performance of the models. We train models on different combinations of visual and linguistic features and perform both (i) automatic evaluation of generated captions and (ii) examination of how useful different visual features are for the task of facial feature classification. Our results show that although original images encode the best possible representation for the task, the model trained on sketches can still perform relatively well. We also observe that augmenting the dataset with descriptions from a different domain can boost performance of the model. We conclude that face description generation systems are more susceptible to language rather than vision data augmentation. Overall, we demonstrate that face caption generation models display a strong imbalance in the utilisation of language and vision modalities, indicating a lack of proper information fusion. We also describe ethical implications of our study and argue that future work on human face description generation should create better, more representative datasets.
