SwePub
Search the SwePub database

Result list for the search "WFRF:(Skantze Gabriel 1975 ) "

Search: WFRF:(Skantze Gabriel 1975 )

  • Result 1-50 of 62
1.
  • Ahlberg, Sofie, et al. (author)
  • Co-adaptive Human-Robot Cooperation : Summary and Challenges
  • 2022
  • In: Unmanned Systems. - : World Scientific Pub Co Pte Ltd. - 2301-3850 .- 2301-3869. ; 10:02, s. 187-203
  • Journal article (peer-reviewed)abstract
    • The work presented here is a culmination of developments within the Swedish project COIN: Co-adaptive human-robot interactive systems, funded by the Swedish Foundation for Strategic Research (SSF), which addresses a unified framework for co-adaptive methodologies in human-robot co-existence. We investigate co-adaptation in the context of safe planning/control, trust, and multi-modal human-robot interactions, and present novel methods that allow humans and robots to adapt to one another and discuss directions for future work.
  •  
2.
  • Ashkenazi, Shaul, et al. (author)
  • Goes to the Heart: Speaking the User's Native Language
  • 2024
  • In: HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. - : Association for Computing Machinery (ACM). ; , s. 214-218
  • Conference paper (peer-reviewed)abstract
    • We are developing a social robot to work alongside human support workers who help new arrivals in a country to navigate the necessary bureaucratic processes in that country. The ultimate goal is to develop a robot that can support refugees and asylum seekers in the UK. As a first step, we are targeting a less vulnerable population with similar support needs: international students in the University of Glasgow. As the target users are in a new country and may be in a state of stress when they seek support, forcing them to communicate in a foreign language will only fuel their anxiety, so a crucial aspect of the robot design is that it should speak the users' native language if at all possible. We provide a technical description of the robot hardware and software, and describe the user study that will shortly be carried out. At the end, we explain how we are engaging with refugee support organisations to extend the robot into one that can also support refugees and asylum seekers.
  •  
3.
  • Axelsson, Agnes, 1992- (author)
  • Adaptive Robot Presenters : Modelling Grounding in Multimodal Interaction
  • 2023
  • Doctoral thesis (other academic/artistic)abstract
    • This thesis addresses the topic of grounding in human-robot interaction, that is, the process by which the human and robot can ensure mutual understanding. To explore this topic, the scenario of a robot holding a presentation to a human audience is used, where the robot has to process multimodal feedback from the human in order to adapt the presentation to the human's level of understanding. First, the use of behaviour trees to model real-time interactive processes of the presentation is addressed. A system based on the behaviour tree architecture is used in a semi-automated Wizard-of-Oz experiment, showing that audience members prefer an adaptive system to a non-adaptive alternative. Next, the thesis addresses the use of knowledge graphs to represent the content of the presentation given by the robot. By building a small, local knowledge graph containing properties (edges) that represent facts about the presentation, the system can iterate over that graph and consistently find ways to refer to entities by referring to previously grounded content. A system based on this architecture is implemented, and an evaluation using simulated users is presented. The results show that crowdworkers comparing different adaptation strategies are sensitive to the types of adaptation enabled by the knowledge graph approach. In a face-to-face presentation setting, feedback from the audience can potentially be expressed through various modalities, including speech, head movements, gaze, facial gestures and body pose. The thesis explores how such feedback can be automatically classified. A corpus of human-robot interactions is annotated, and models are trained to classify human feedback as positive, negative or neutral. A relatively high accuracy is achieved by training simple classifiers with signals found mainly in the speech and head movements. When knowledge graphs are used as the underlying representation of the system's presentation, some consistent way of generating text, that can be turned into speech, is required. This graph-to-text problem is explored by proposing several methods, both template-based and methods based on zero-shot generation using large language models (LLMs). A novel evaluation method using a combination of factual, counter-factual and fictional graphs is proposed. Finally, the thesis presents and evaluates a fully automated system using all of the components above. The results show that audience members prefer the adaptive system to a non-adaptive system, matching the results from the beginning of the thesis. However, we note that clear learning results are not found, which means that the entertainment aspects of the presentation are perhaps more prominent than the learning aspects.
  •  
4.
  • Axelsson, Agnes, 1992-, et al. (author)
  • Do you follow? : A fully automated system for adaptive robot presenters
  • 2023
  • In: HRI 2023. - New York, NY, USA : Association for Computing Machinery (ACM). ; , s. 102-111
  • Conference paper (peer-reviewed)abstract
    • An interesting application for social robots is to act as a presenter, for example as a museum guide. In this paper, we present a fully automated system architecture for building adaptive presentations for embodied agents. The presentation is generated from a knowledge graph, which is also used to track the grounding state of information, based on multimodal feedback from the user. We introduce a novel way to use large-scale language models (GPT-3 in our case) to lexicalise arbitrary knowledge graph triples, greatly simplifying the design of this aspect of the system. We also present an evaluation where 43 participants interacted with the system. The results show that users prefer the adaptive system and consider it more human-like and flexible than a static version of the same system, but only partial results are seen in their learning of the facts presented by the robot.
  •  
5.
  • Axelsson, Agnes, 1992-, et al. (author)
  • Modeling Feedback in Interaction With Conversational Agents—A Review
  • 2022
  • In: Frontiers in Computer Science. - : Frontiers Media SA. - 2624-9898. ; 4
  • Research review (peer-reviewed)abstract
    • Intelligent agents interacting with humans through conversation (such as a robot, embodied conversational agent, or chatbot) need to receive feedback from the human to make sure that their communicative acts have the intended consequences. At the same time, the human interacting with the agent will also seek feedback, in order to ensure that her communicative acts have the intended consequences. In this review article, we give an overview of past and current research on how intelligent agents should be able to both give meaningful feedback toward humans, as well as understand feedback given by the users. The review covers feedback across different modalities (e.g., speech, head gestures, gaze, and facial expression), different forms of feedback (e.g., backchannels, clarification requests), and models for allowing the agent to assess the user's level of understanding and adapt its behavior accordingly. Finally, we analyse some shortcomings of current approaches to modeling feedback, and identify important directions for future research.
  •  
6.
  • Axelsson, Agnes, 1992-, et al. (author)
  • Multimodal User Feedback During Adaptive Robot-Human Presentations
  • 2022
  • In: Frontiers in Computer Science. - : Frontiers Media SA. - 2624-9898. ; 3
  • Journal article (peer-reviewed)abstract
    • Feedback is an essential part of all communication, and agents communicating with humans must be able to both give and receive feedback in order to ensure mutual understanding. In this paper, we analyse multimodal feedback given by humans towards a robot that is presenting a piece of art in a shared environment, similar to a museum setting. The data analysed contains both video and audio recordings of 28 participants, and the data has been richly annotated both in terms of multimodal cues (speech, gaze, head gestures, facial expressions, and body pose), as well as the polarity of any feedback (negative, positive, or neutral). We train statistical and machine learning models on the dataset, and find that random forest models and multinomial regression models perform well on predicting the polarity of the participants' reactions. An analysis of the different modalities shows that most information is found in the participants' speech and head gestures, while much less information is found in their facial expressions, body pose and gaze. An analysis of the timing of the feedback shows that most feedback is given when the robot makes pauses (and thereby invites feedback), but that the more exact timing of the feedback does not affect its meaning.
  •  
7.
  • Axelsson, Agnes, 1992-, et al. (author)
  • Robots in autonomous buses: Who hosts when no human is there?
  • 2024
  • In: HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. - : Association for Computing Machinery (ACM). ; , s. 1278-1280
  • Conference paper (peer-reviewed)abstract
    • In mid-2023, we performed an experiment in autonomous buses in Stockholm, Sweden, to evaluate the role that social robots might have in such settings, and their effects on passengers' feeling of safety and security, given the absence of human drivers or clerks. To address the situations that may occur in autonomous public transit (APT), we compared an embodied agent to a disembodied agent. In this video publication, we showcase some of the things that worked with the interactions we created, and some problematic issues that we had not anticipated.
  •  
8.
  • Axelsson, Agnes, 1992-, et al. (author)
  • Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs
  • 2023
  • In: Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023). - : Association for Computational Linguistics (ACL). ; , s. 39-54
  • Conference paper (peer-reviewed)abstract
    • In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph data into text that can be understood by humans. Recent work has shown that models that make use of pretraining on large amounts of text data can perform well on the KG-to-text task, even with relatively little training data on the specific graph-to-text task. In this paper, we build on this concept by using large language models to perform zero-shot generation based on nothing but the model’s understanding of the triple structure from what it can read. We show that ChatGPT achieves near state-of-the-art performance on some measures of the WebNLG 2020 challenge, but falls behind on others. Additionally, we compare factual, counter-factual and fictional statements, and show that there is a significant connection between what the LLM already knows about the data it is parsing and the quality of the output text.
  •  
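A minimal sketch of the zero-shot triple-verbalisation idea described in the entry above, assuming the `openai` Python package (v1+) and an API key in the environment; the model name and prompt wording are placeholders, not the paper's actual setup or evaluation protocol.

```python
# Sketch only: verbalise knowledge-graph triples with a chat-style LLM, zero-shot.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def verbalise_triples(triples):
    """Turn (subject, predicate, object) triples into one fluent sentence each."""
    lines = "\n".join(f"({s} | {p} | {o})" for s, p, o in triples)
    prompt = (
        "Express each of the following knowledge-graph triples as a short, "
        "fluent English sentence. Do not add facts that are not in the triple.\n"
        + lines
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",            # placeholder model name, not the paper's
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,                # deterministic output for evaluation
    )
    return response.choices[0].message.content

print(verbalise_triples([("Mona_Lisa", "creator", "Leonardo_da_Vinci")]))
```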
9.
  • Axelsson, Nils, 1992-, et al. (author)
  • Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees
  • 2019
  • In: 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue. - Stroudsburg, PA : Association for Computational Linguistics (ACL). ; , s. 345-352
  • Conference paper (peer-reviewed)abstract
    • In dialogue, speakers continuously adapt their speech to accommodate the listener, based on the feedback they receive. In this paper, we explore the modelling of such behaviours in the context of a robot presenting a painting. A Behaviour Tree is used to organise the behaviour on different levels, and allow the robot to adapt its behaviour in real-time; the tree organises engagement, joint attention, turn-taking, feedback and incremental speech processing. An initial implementation of the model is presented, and the system is evaluated in a user study, where the adaptive robot presenter is compared to a non-adaptive version. The adaptive version is found to be more engaging by the users, although no effects are found on the retention of the presented material.
  •  
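A generic sketch of the behaviour-tree mechanism referenced in the entry above: a sequence node ticks a feedback condition before an action, so the presenter only advances while the condition holds. The node types and the feedback check below are illustrative assumptions, not the paper's actual tree (which also covers engagement, joint attention, turn-taking and incremental speech).

```python
# Minimal behaviour-tree skeleton for an adaptive presenter (illustrative only).
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Sequence:
    """Ticks children in order; stops as soon as a child is not successful."""
    def __init__(self, *children):
        self.children = children
    def tick(self, ctx):
        for child in self.children:
            status = child.tick(ctx)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Condition:
    def __init__(self, predicate):
        self.predicate = predicate
    def tick(self, ctx):
        return Status.SUCCESS if self.predicate(ctx) else Status.FAILURE

class Action:
    def __init__(self, effect):
        self.effect = effect
    def tick(self, ctx):
        self.effect(ctx)
        return Status.SUCCESS

# Only continue the presentation while the listener's feedback is positive.
tree = Sequence(
    Condition(lambda ctx: ctx["feedback"] == "positive"),
    Action(lambda ctx: ctx.setdefault("log", []).append("present next segment")),
)
ctx = {"feedback": "positive"}
print(tree.tick(ctx), ctx["log"])
```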
10.
  • Axelsson, Nils, 1992-, et al. (author)
  • Using knowledge graphs and behaviour trees for feedback-aware presentation agents
  • 2020
  • In: Proceedings of Intelligent Virtual Agents 2020. - New York, NY, USA : Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed)abstract
    • In this paper, we address the problem of how an interactive agent (such as a robot) can present information to an audience and adapt the presentation according to the feedback it receives. We extend a previous behaviour tree-based model to generate the presentation from a knowledge graph (Wikidata), which allows the agent to handle feedback incrementally, and adapt accordingly. Our main contribution is using this knowledge graph not just for generating the system’s dialogue, but also as the structure through which short-term user modelling happens. In an experiment using simulated users and third-party observers, we show that referring expressions generated by the system are rated more highly when they adapt to the type of feedback given by the user, and when they are based on previously grounded information as opposed to new information.
  •  
11.
  • Aylett, Matthew Peter, et al. (author)
  • Why is my Agent so Slow? Deploying Human-Like Conversational Turn-Taking
  • 2023
  • In: HAI 2023 - Proceedings of the 11th Conference on Human-Agent Interaction. - : Association for Computing Machinery (ACM). ; , s. 490-492
  • Conference paper (peer-reviewed)abstract
    • The emphasis on one-to-one speak/wait spoken conversational interaction with intelligent agents leads to long pauses between conversational turns, undermines the flow and naturalness of the interaction, and undermines the user experience. Despite groundbreaking advances in the area of generating and understanding natural language with techniques such as LLMs, conversational interaction has remained relatively overlooked. In this workshop we will discuss and review the challenges, recent work and potential impact of improving conversational interaction with artificial systems. We hope to share experiences of poor human/system interaction, best practices with third party tools, and generate design guidance for the community.
  •  
12.
  • Blomsma, Peter, et al. (author)
  • Backchannel Behavior Influences the Perceived Personality of Human and Artificial Communication Partners
  • 2022
  • In: Frontiers in Artificial Intelligence. - : Frontiers Media SA. - 2624-8212. ; 5
  • Journal article (peer-reviewed)abstract
    • Different applications or contexts may require different settings for a conversational AI system, as it is clear that e.g., a child-oriented system would need a different interaction style than a warning system used in emergency situations. The current article focuses on the extent to which a system's usability may benefit from variation in the personality it displays. To this end, we investigate whether variation in personality is signaled by differences in specific audiovisual feedback behavior, with a specific focus on embodied conversational agents. This article reports about two rating experiments in which participants judged the personalities (i) of human beings and (ii) of embodied conversational agents, where we were specifically interested in the role of variability in audiovisual cues. Our results show that personality perceptions of both humans and artificial communication partners are indeed influenced by the type of feedback behavior used. This knowledge could inform developers of conversational AI on how to also include personality in their feedback behavior generation algorithms, which could enhance the perceived personality and in turn generate a stronger sense of presence for the human interlocutor.
  •  
13.
  • Borg, Alexander, et al. (author)
  • Creating Virtual Patients using Robots and Large Language Models: A Preliminary Study with Medical Students
  • 2024
  • In: HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. - : Association for Computing Machinery (ACM). ; , s. 273-277
  • Conference paper (peer-reviewed)abstract
    • This paper presents a virtual patient (VP) platform for medical education, combining a social robot, Furhat, with large language models (LLMs). Aimed at enhancing clinical reasoning (CR) training, particularly in rheumatology, this approach introduces more interactive and realistic patient simulations. The use of LLMs both for driving the dialogue and for the expression of emotions in the robot's face, as well as for the automatic analysis and generation of feedback to the student, is discussed. The platform's effectiveness was tested in a pilot study with 15 medical students, comparing it against a traditional semi-linear VP platform. The evaluation indicates a preference for the robot platform in terms of authenticity and learning effect. We conclude that this novel integration of a social robot and LLMs in VP simulations shows potential in medical education, offering a more engaging learning experience.
  •  
14.
  • Dogruoz, A. Seza, et al. (author)
  • How "open" are the conversations with open-domain chatbots? : A proposal for Speech Event based evaluation
  • 2021
  • In: SIGDIAL 2021. - : ASSOC COMPUTATIONAL LINGUISTICS. ; , s. 392-402
  • Conference paper (peer-reviewed)abstract
    • Open-domain chatbots are supposed to converse freely with humans without being restricted to a topic, task or domain. However, the boundaries and/or contents of open-domain conversations are not clear. To clarify the boundaries of "openness", we conduct two studies: First, we classify the types of "speech events" encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the "small talk" category and exclude the other speech event categories encountered in real life human-human communication. Second, we conduct a small-scale pilot study to generate online conversations covering a wider range of speech event categories between two humans vs. a human and a state-of-the-art chatbot (i.e., Blender by Facebook). A human evaluation of these generated conversations indicates a preference for human-human conversations, since the human-chatbot conversations lack coherence in most speech event categories. Based on these results, we suggest (a) using the term "small talk" instead of "open-domain" for the current chatbots which are not that "open" in terms of conversational abilities yet, and (b) revising the evaluation methods to test the chatbot conversations against other speech events.
  •  
15.
  • Ekstedt, Erik, et al. (author)
  • Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
  • 2023
  • In: Interspeech 2023. - : International Speech Communication Association. ; , s. 5481-5485
  • Conference paper (peer-reviewed)abstract
    • Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues. Using the recently proposed Voice Activity Projection model, we propose an automatic evaluation approach to measure these aspects for conversational speech synthesis. We investigate the ability of three commercial, and two open-source, Text-To-Speech (TTS) systems to generate turn-taking cues over simulated turns. By varying the stimuli, or controlling the prosody, we analyze the models' performances. We show that while commercial TTS largely provide appropriate cues, they often produce ambiguous signals, and that further improvements are possible. TTS, trained on read or spontaneous speech, produce strong turn-hold but weak turn-yield cues. We argue that this approach, which focuses on functional aspects of interaction, provides a useful addition to other important speech metrics, such as intelligibility and naturalness.
  •  
16.
  • Ekstedt, Erik, et al. (author)
  • How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models
  • 2022
  • In: Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue. - Edinburgh UK : Association for Computational Linguistics. ; , s. 541-551
  • Conference paper (peer-reviewed)abstract
    • Turn-taking is a fundamental aspect of human communication and can be described as the ability to take turns, project upcoming turn shifts, and supply backchannels at appropriate locations throughout a conversation. In this work, we investigate the role of prosody in turn-taking using the recently proposed Voice Activity Projection model, which incrementally models the upcoming speech activity of the interlocutors in a self-supervised manner, without relying on explicit annotation of turn-taking events, or the explicit modeling of prosodic features. Through manipulation of the speech signal, we investigate how these models implicitly utilize prosodic information. We show that these systems learn to utilize various prosodic aspects of speech both on aggregate quantitative metrics of long-form conversations and on single utterances specifically designed to depend on prosody.
  •  
17.
  • Ekstedt, Erik (author)
  • Predictive Modeling of Turn-Taking in Spoken Dialogue : Computational Approaches for the Analysis of Turn-Taking in Humans and Spoken Dialogue Systems
  • 2023
  • Doctoral thesis (other academic/artistic)abstract
    • Turn-taking in spoken dialogue represents a complex cooperative process wherein participants use verbal and non-verbal cues to coordinate who speaks and who listens, to anticipate speaker transitions, and to produce backchannels (e.g., “mhm”, “uh-huh”) at the right places. This thesis frames turn-taking as the modeling of voice activity dynamics of dialogue interlocutors, with a focus on predictive modeling of these dynamics using both text- and audio-based deep learning models. Crucially, the models operate incrementally, estimating the activity dynamics across all potential dialogue states and interlocutors throughout a conversation. The aim of these models is to increase the responsiveness of Spoken Dialogue Systems (SDS) while minimizing interruption. However, a considerable focus is also put on the analytical capabilities of these models to serve as data-driven, model-based tools for analyzing human conversational patterns in general. This thesis focuses on the development and analysis of two distinct models of turn-taking: TurnGPT, operating in the verbal domain, and the Voice Activity Projection (VAP) model in the acoustic domain. Trained with general prediction objectives, these models offer versatility beyond turn-taking, enabling novel analyses of spoken dialogue. Utilizing attention and gradient-based techniques, this thesis sheds light on the crucial role of context in estimating speaker transitions within the verbal domain. The potential of incorporating TurnGPT into SDSs – employing a sampling-based strategy to predict upcoming speaker transitions from incomplete text, namely words yet to be transcribed by the ASR – is investigated to enhance system responsiveness. The VAP model, which predicts the joint voice activity of both dialogue interlocutors, is introduced and adapted to handle stereo channel audio. The model’s prosodic sensitivity is examined both in targeted utterances and in extended spoken dialogues. This analysis reveals that while intonation is crucial for distinguishing syntactically ambiguous events, it plays a less important role in general turn-taking within long-form dialogues. The VAP model’s analytical capabilities are also highlighted, to assess the impact of filled pauses and serve as an evaluation tool for conversational TTS, determining their ability to produce prosodically relevant turn-taking cues.
  •  
18.
  • Ekstedt, Erik, et al. (author)
  • Projection of Turn Completion in Incremental Spoken Dialogue Systems
  • 2021
  • In: SIGDIAL 2021. - : ASSOC COMPUTATIONAL LINGUISTICS. ; , s. 431-437
  • Conference paper (peer-reviewed)abstract
    • The ability to take turns in a fluent way (i.e., without long response delays or frequent interruptions) is a fundamental aspect of any spoken dialog system. However, practical speech recognition services typically induce a long response delay, as it takes time before the processing of the user's utterance is complete. There is a considerable amount of research indicating that humans achieve fast response times by projecting what the interlocutor will say and estimating upcoming turn completions. In this work, we implement this mechanism in an incremental spoken dialog system, by using a language model that generates possible futures to project upcoming completion points. In theory, this could make the system more responsive, while still having access to semantic information not yet processed by the speech recognizer. We conduct a small study which indicates that this is a viable approach for practical dialog systems, and that this is a promising direction for future research.
  •  
19.
  • Ekstedt, Erik, et al. (author)
  • Show & Tell : Voice Activity Projection and Turn-taking
  • 2023
  • In: Interspeech 2023. - : International Speech Communication Association. ; , s. 2020-2021
  • Conference paper (peer-reviewed)abstract
    • We present Voice Activity Projection (VAP), a model trained on spontaneous spoken dialog with the objective to incrementally predict future voice activity. Similar to a language model, it is trained through self-supervised learning and outputs a probability distribution over discrete states that corresponds to the joint future voice activity of the dialog interlocutors. The model is well-defined over overlapping speech regions, resilient towards microphone “bleed-over” and considers the speech of both speakers (e.g., a user and an agent) to provide the most likely next speaker. VAP is a general turn-taking model which can serve as the base for turn-taking decisions in spoken dialog systems, an automatic tool useful for linguistics and conversational analysis, an automatic evaluation metric for conversational text-to-speech models, and possibly many other tasks related to spoken dialog interaction.
  •  
20.
  • Ekstedt, Erik, et al. (author)
  • TurnGPT : a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
  • 2020
  • In: Findings of the Association for Computational Linguistics. - Online : Association for Computational Linguistics (ACL). ; , s. 2981-2990
  • Conference paper (peer-reviewed)abstract
    • Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way. In this paper, we introduce TurnGPT, a transformer-based language model for predicting turn-shifts in spoken dialog. The model has been trained and evaluated on a variety of written and spoken dialog datasets. We show that the model outperforms two baselines used in prior work. We also report on an ablation study, as well as attention and gradient analyses, which show that the model is able to utilize the dialog context and pragmatic completeness for turn-taking prediction. Finally, we explore the model’s potential in not only detecting, but also projecting, turn-completions.
  •  
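The sketch below shows the general mechanism of turn-shift prediction with a causal language model, in the spirit of the entry above: a special turn-shift token is added to the vocabulary and its next-token probability is read off after the words seen so far. It uses plain GPT-2 via Hugging Face `transformers` as a stand-in; because the new token's embedding is untrained here, a real system would fine-tune on dialogue with turn-shift tokens inserted, as TurnGPT does.

```python
# Mechanism sketch only (assumes `transformers` and `torch`); NOT the released TurnGPT model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<ts>"]})
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
model.resize_token_embeddings(len(tokenizer))   # the <ts> embedding is untrained here
ts_id = tokenizer.convert_tokens_to_ids("<ts>")

def turn_shift_probability(dialogue_so_far: str) -> float:
    """Probability mass the LM assigns to the turn-shift token at the next position."""
    ids = tokenizer(dialogue_so_far, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # distribution over the next token
    return torch.softmax(logits, dim=-1)[ts_id].item()

print(turn_shift_probability("do you want coffee"))  # meaningless until fine-tuned
```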
21.
  • Ekstedt, Erik, et al. (author)
  • Voice Activity Projection: Self-supervised Learning of Turn-taking Events
  • 2022
  • In: INTERSPEECH 2022. - : International Speech Communication Association. ; , s. 5190-5194
  • Conference paper (peer-reviewed)abstract
    • The modeling of turn-taking in dialog can be viewed as the modeling of the dynamics of voice activity of the interlocutors. We extend prior work and define the predictive task of Voice Activity Projection, a general, self-supervised objective, as a way to train turn-taking models without the need of labeled data. We highlight a theoretical weakness with prior approaches, arguing for the need of modeling the dependency of voice activity events in the projection window. We propose four zero-shot tasks, related to the prediction of upcoming turn-shifts and backchannels, and show that the proposed model outperforms prior work.
  •  
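As a rough illustration of the projection objective in the entry above, the sketch below maps a future window of binary voice activity for two speakers onto one discrete joint state by binning each speaker's activity; the bin boundaries and activity threshold are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: discretise a future voice-activity window into a joint state (one bit per speaker-bin).
import numpy as np

BINS = [(0, 10), (10, 30), (30, 60), (60, 100)]   # frame ranges within the window (assumed)

def window_to_state(va: np.ndarray, threshold: float = 0.5) -> int:
    """va: (2, 100) binary voice-activity frames for both speakers (the future window).
    Returns an integer in [0, 255]: one bit per (speaker, bin)."""
    bits = []
    for speaker in range(2):
        for start, end in BINS:
            bits.append(int(va[speaker, start:end].mean() >= threshold))
    return int("".join(map(str, bits)), 2)

future = np.zeros((2, 100), dtype=int)
future[1, 20:] = 1              # speaker B takes over after ~20 frames
print(window_to_state(future))  # a "shift to speaker B" style state
```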
22.
  • Elgarf, Maha, et al. (author)
  • CreativeBot : a Creative Storyteller robot to stimulate creativity in children
  • 2022
  • In: ICMI '22: Proceedings of the 2022 International Conference on Multimodal Interaction. - New York, NY, USA : Association for Computing Machinery. ; , s. 540-548
  • Conference paper (peer-reviewed)abstract
    • We present the design and evaluation of a storytelling activity between children and an autonomous robot aiming at nurturing children's creativity. We assessed whether a robot displaying creative behavior will positively impact children's creativity skills in a storytelling context. We developed two models for the robot to engage in the storytelling activity: a creative model, where the robot generates creative story ideas, and a non-creative model, where the robot generates non-creative story ideas. We also investigated whether the type of the storytelling interaction will have an impact on children's creativity skills. We used two types of interaction: 1) Collaborative, where the child and the robot collaborate together by taking turns to tell a story. 2) Non-collaborative, where the robot first tells a story to the child and then asks the child to tell it another story. We conducted a between-subjects study with 103 children in four different conditions: Creative collaborative, Non-creative collaborative, Creative non-collaborative and Non-Creative non-collaborative. The children's stories were evaluated according to the four standard creativity variables: fluency, flexibility, elaboration and originality. Results emphasized that children who interacted with a creative robot showed higher creativity during the interaction than children who interacted with a non-creative robot. Nevertheless, no significant effect of the type of the interaction was found on children's creativity skills. Our findings are significant to the Child-Robot Interaction (cHRI) community since they enrich the scientific understanding of the development of child-robot encounters for educational applications.
  •  
23.
  • Elgarf, Maha, et al. (author)
  • Once Upon a Story : Can a Creative Storyteller Robot Stimulate Creativity in Children?
  • 2021
  • In: Proceedings of the 21st ACM international conference on intelligent virtual agents (IVA). - New York, NY, USA : Association for Computing Machinery (ACM). ; , s. 60-67
  • Conference paper (peer-reviewed)abstract
    • Creativity is a vital inherent human trait. In an attempt to stimulate children's creativity, we present the design and evaluation of an interaction between a child and a social robot in a storytelling context. Using a software interface, children were asked to collaboratively create a story with the robot. We conducted a study with 38 children in two conditions. In one condition, the children interacted with a robot exhibiting creative behavior while in the other condition, they interacted with a robot exhibiting non-creative behavior. The robot's creativity was defined as verbal and performance creativity. The robot's creative and non-creative behaviors were extracted from a previously collected data set and were validated in an online survey with 100 participants. Contrary to our initial hypothesis, children's creativity measures were not higher in the creative condition than in the non-creative condition. Our results suggest that merely the robot's creative behavior is insufficient to stimulate creativity in children in a child-robot interaction. We further discuss other design factors that may facilitate sparking creativity in children in similar settings in the future.
  •  
24.
  • Figueroa, Carol, et al. (author)
  • Annotation of Communicative Functions of Short Feedback Tokens in Switchboard
  • 2022
  • In: 2022 Language Resources and Evaluation Conference, LREC 2022.
  • Conference paper (peer-reviewed)abstract
    • There has been a lot of work on predicting the timing of feedback in conversational systems. However, there has been less focus on predicting the prosody and lexical form of feedback given their communicative function. Therefore, in this paper we present our preliminary annotations of the communicative functions of 1627 short feedback tokens from the Switchboard corpus and an analysis of their lexical realizations and prosodic characteristics. Since there is no standard scheme for annotating the communicative function of feedback we propose our own annotation scheme. Although our work is ongoing, our preliminary analysis revealed that lexical tokens such as ‘yeah’ are ambiguous and therefore lexical forms alone are not indicative of the function. Both the lexical form and prosodic characteristics need to be taken into account in order to predict the communicative function. We also found that feedback functions have distinguishable prosodic characteristics in terms of duration, mean pitch, pitch slope, and pitch range.
  •  
25.
  • Figueroa, Carol, et al. (author)
  • Classification of Feedback Functions in Spoken Dialog Using Large Language Models and Prosodic Features
  • 2023
  • In: 27th Workshop on the Semantics and Pragmatics of Dialogue. - Maribor : University of Maribor. ; , s. 15-24
  • Conference paper (peer-reviewed)abstract
    • Feedback utterances such as ‘yeah’, ‘mhm’, and ‘okay’, convey different communicative functions depending on their prosodic realizations, as well as the conversational context in which they are produced. In this paper, we investigate the performance of different models and features for classifying the communicative function of short feedback tokens in American English dialog. We experiment with a combination of lexical and prosodic features extracted from the feedback utterance, as well as context features from the preceding utterance of the interlocutor. Given the limited amount of training data, we explore the use of a pre-trained large language model (GPT-3) to encode contextual information, as well as SimCSE sentence embeddings. The results show that good performance can be achieved with only SimCSE and lexical features, while the best performance is achieved by solely fine-tuning GPT-3, even if it does not have access to any prosodic features.
  •  
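A toy sketch of the classification setup described in the entry above, combining a lexical token with a few prosodic features in scikit-learn; the miniature dataset, feature set and labels are invented for illustration (the study itself uses Switchboard annotations, SimCSE/GPT-3 context encodings and richer features).

```python
# Toy feedback-function classifier: lexical token (one-hot) plus prosodic features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

data = pd.DataFrame({
    "token":    ["yeah", "yeah", "mhm", "okay", "mhm", "okay"],
    "duration": [0.22, 0.45, 0.30, 0.25, 0.50, 0.60],   # seconds
    "mean_f0":  [110, 160, 120, 115, 180, 170],          # Hz
    "f0_slope": [-5, 25, 0, -2, 30, 20],                 # Hz per 100 ms
    "function": ["continue", "agree", "continue", "continue", "agree", "agree"],
})
features = ColumnTransformer(
    [("lex", OneHotEncoder(handle_unknown="ignore"), ["token"])],
    remainder="passthrough",            # prosodic columns pass through unchanged
)
clf = Pipeline([("features", features), ("model", LogisticRegression(max_iter=1000))])
clf.fit(data[["token", "duration", "mean_f0", "f0_slope"]], data["function"])
query = pd.DataFrame([{"token": "yeah", "duration": 0.5, "mean_f0": 175, "f0_slope": 28}])
print(clf.predict(query))               # predicted communicative function
```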
26.
  •  
27.
  • Förster, Frank, et al. (author)
  • Working with troubles and failures in conversation between humans and robots: workshop report
  • 2023
  • In: Frontiers in Robotics and AI. - : Frontiers Media SA. - 2296-9144. ; 10
  • Journal article (peer-reviewed)abstract
    • This paper summarizes the structure and findings from the first Workshop on Troubles and Failures in Conversations between Humans and Robots. The workshop was organized to bring together a small, interdisciplinary group of researchers working on miscommunication from two complementary perspectives. One group of technology-oriented researchers was made up of roboticists, Human-Robot Interaction (HRI) researchers and dialogue system experts. The second group involved experts from conversation analysis, cognitive science, and linguistics. Uniting both groups of researchers is the belief that communication failures between humans and machines need to be taken seriously and that a systematic analysis of such failures may open fruitful avenues in research beyond current practices to improve such systems, including both speech-centric and multimodal interfaces. This workshop represents a starting point for this endeavour. The aim of the workshop was threefold: Firstly, to establish an interdisciplinary network of researchers that share a common interest in investigating communicative failures with a particular view towards robotic speech interfaces; secondly, to gain a partial overview of the “failure landscape” as experienced by roboticists and HRI researchers; and thirdly, to determine the potential for creating a robotic benchmark scenario for testing future speech interfaces with respect to the identified failures. The present article summarizes both the “failure landscape” surveyed during the workshop as well as the outcomes of the attempt to define a benchmark scenario.
  •  
28.
  • Ibrahim, Omnia, et al. (author)
  • Fundamental frequency accommodation in multi-party human-robot game interactions : The effect of winning or losing
  • 2019
  • In: Proceedings Interspeech 2019. - : International Speech Communication Association. ; , s. 3980-3984
  • Conference paper (peer-reviewed)abstract
    • In human-human interactions, the situational context plays a large role in the degree of speakers’ accommodation. In this paper, we investigate whether the degree of accommodation in a human-robot computer game is affected by (a) the duration of the interaction and (b) the success of the players in the game. 30 teams of two players played two card games with a conversational robot in which they had to find a correct order of five cards. After game 1, the players received the result of the game on a success scale from 1 (lowest success) to 5 (highest). Speakers’ f0 accommodation was measured as the Euclidean distance between the two human speakers, and between each human and the robot. Results revealed that (a) the duration of the game had no influence on the degree of f0 accommodation and (b) the result of Game 1 correlated with the degree of f0 accommodation in Game 2 (higher success equals lower Euclidean distance). We argue that game success is most likely considered as a sign of the success of players’ cooperation during the discussion, which leads to a higher accommodation behavior in speech.
  •  
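A minimal sketch of the accommodation measure mentioned in the entry above: the Euclidean distance between two speakers' f0 summary statistics, with a smaller distance read as stronger accommodation. The particular summary statistics and the toy f0 tracks are assumptions for illustration, not the study's exact feature set.

```python
# Sketch: f0 accommodation as the Euclidean distance between speakers' f0 statistics.
import numpy as np

def f0_features(f0_track: np.ndarray) -> np.ndarray:
    """Summary statistics of a fundamental-frequency track in Hz (voiced frames only)."""
    return np.array([f0_track.mean(), f0_track.std(), np.median(f0_track)])

def accommodation_distance(f0_a: np.ndarray, f0_b: np.ndarray) -> float:
    """Smaller distance = more similar f0 behaviour = stronger accommodation."""
    return float(np.linalg.norm(f0_features(f0_a) - f0_features(f0_b)))

human = np.random.default_rng(0).normal(200, 20, 500)   # toy f0 tracks in Hz
robot = np.random.default_rng(1).normal(180, 15, 500)
print(accommodation_distance(human, robot))
```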
29.
  • Ibrahim, Omnia, et al. (author)
  • Revisiting robot directed speech effects in spontaneous Human-Human-Robot interactions
  • 2021
  • Conference paper (peer-reviewed)abstract
    • In this paper, we investigate the differences between human-directed speech and robot-directed speech during spontaneous human-human-robot interactions. The interactions under study are different from previous studies, in the sense that the robot has a role more similar to that of the human interlocutors, which leads to more spontaneous turn-taking. 20 conversations were extracted from a multi-party human-robot discussion corpus, where two humans are playing a collaborative card game with a social robot. Each utterance in the conversations was manually labeled according to addressee (robot or human). The following acoustic features were extracted: fundamental frequency, intensity, speaking rate, and total utterance duration. There were significant differences between human- and robot-directed speech for speaking rate and the total utterance duration. These results are in line with previous studies on robot-directed speech, and confirm that this difference holds also when the conversations are of a more spontaneous nature.
  •  
30.
  • Inoue, Koji, et al. (author)
  • Multilingual Turn-taking Prediction Using Voice Activity Projection
  • 2024
  • In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. - : European Language Resources Association (ELRA). ; , s. 11873-11883
  • Conference paper (peer-reviewed)abstract
    • This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, on multilingual data, encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages. Further analyses show that the multilingual model has learned to discern the language of the input signal. We also analyze the sensitivity to pitch, a prosodic cue that is thought to be important for turn-taking. Finally, we compare two different audio encoders, contrastive predictive coding (CPC) pre-trained on English, with a recent model based on multilingual wav2vec 2.0 (MMS).
  •  
31.
  • Inoue, Koji, et al. (author)
  • Towards Objective Evaluation of Socially-Situated Conversational Robots : Assessing Human-Likeness through Multimodal User Behaviors
  • 2023
  • In: ICMI 2023 Companion. - : Association for Computing Machinery (ACM). ; , s. 86-90
  • Conference paper (peer-reviewed)abstract
    • This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors. In this study, our main focus is on assessing the human-likeness of the robot as the primary evaluation metric. While previous research often relied on subjective evaluations from users, our approach aims to evaluate the robot's human-likeness based on observable user behaviors indirectly, thus enhancing objectivity and reproducibility. To begin, we created an annotated dataset of human-likeness scores, utilizing user behaviors found in an attentive listening dialogue corpus. We then conducted an analysis to determine the correlation between multimodal user behaviors and human-likeness scores, demonstrating the feasibility of our proposed behavior-based evaluation method.
  •  
32.
  • Irfan, Bahar, et al. (author)
  • Recommendations for designing conversational companion robots with older adults through foundation models
  • 2024
  • In: Frontiers in Robotics and AI. - : Frontiers Media SA. - 2296-9144. ; 11
  • Journal article (peer-reviewed)abstract
    • Companion robots aim to mitigate loneliness and social isolation among older adults by providing social and emotional support in their everyday lives. However, older adults’ expectations of conversational companionship might substantially differ from what current technologies can achieve, as well as from other age groups like young adults. Thus, it is crucial to involve older adults in the development of conversational companion robots to ensure that these devices align with their unique expectations and experiences. The recent advancement in foundation models, such as large language models, has taken a significant stride toward fulfilling those expectations, in contrast to the prior literature that relied on humans controlling robots (i.e., Wizard of Oz) or limited rule-based architectures that are not feasible to apply in the daily lives of older adults. Consequently, we conducted a participatory design (co-design) study with 28 older adults, demonstrating a companion robot using a large language model (LLM), and design scenarios that represent situations from everyday life. The thematic analysis of the discussions around these scenarios shows that older adults expect a conversational companion robot to engage in conversation actively in isolation and passively in social settings, remember previous conversations and personalize, protect privacy and provide control over learned data, give information and daily reminders, foster social skills and connections, and express empathy and emotions. Based on these findings, this article provides actionable recommendations for designing conversational companion robots for older adults with foundation models, such as LLMs and vision-language models, which can also be applied to conversational robots in other domains.
  •  
33.
  •  
34.
  • Jiang, Binger, et al. (author)
  • Response-conditioned Turn-taking Prediction
  • 2023
  • In: Findings of the Association for Computational Linguistics. ; , s. 12241-12248
  • Conference paper (peer-reviewed)abstract
    • Previous approaches to turn-taking and response generation in conversational systems have treated it as a two-stage process: First, the end of a turn is detected (based on conversation history), then the system generates an appropriate response. Humans, however, do not take the turn just because it is likely, but also consider whether what they want to say fits the position. In this paper, we present a model (an extension of TurnGPT) that conditions the end-of-turn prediction on both conversation history and what the next speaker wants to say. We found that our model consistently outperforms the baseline model in a variety of metrics. The improvement is most prominent in two scenarios where turn predictions can be ambiguous solely from the conversation history: 1) when the current utterance contains a statement followed by a question; 2) when the end of the current utterance semantically matches the response. Treating the turn-prediction and response-ranking as a one-stage process, our findings suggest that our model can be used as an incremental response ranker, which can be applied in various settings.
  •  
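The sketch below illustrates the response-ranking reading of the entry above with a much simpler stand-in than the paper's model: a plain causal LM (GPT-2 via `transformers`) scores how well each candidate response fits the conversation history using its average token log-probability. This is not RC-TurnGPT; it only shows, in code, the "does my planned response fit this position?" signal that the abstract describes.

```python
# Sketch: rank candidate responses by their fit to the dialogue history with a causal LM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def response_fit(history: str, response: str) -> float:
    """Average log-probability of the response tokens given the history (higher = better fit).
    Assumes the history tokenisation is unchanged when the response is appended after a space."""
    ctx = tokenizer(history, return_tensors="pt").input_ids
    full = tokenizer(history + " " + response, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = model(full).logits.log_softmax(-1)
        resp_ids = full[0, ctx.shape[1]:]                      # the response tokens
        scores = logprobs[0, ctx.shape[1] - 1:-1].gather(1, resp_ids.unsqueeze(1))
    return scores.mean().item()

history = "A: where do you want to eat tonight?"
for cand in ["B: the new thai place sounds good", "B: it is raining in paris"]:
    print(cand, round(response_fit(history, cand), 3))
```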
35.
  • Jonell, Patrik, et al. (author)
  • Crowdsourcing a self-evolving dialog graph
  • 2019
  • In: CUI '19: Proceedings of the 1st International Conference on Conversational User Interfaces. - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450371872
  • Conference paper (peer-reviewed)abstract
    • In this paper we present a crowdsourcing-based approach for collecting dialog data for a social chat dialog system, which gradually builds a dialog graph from actual user responses and crowd-sourced system answers, conditioned by a given persona and other instructions. This approach was tested during the second instalment of the Amazon Alexa Prize 2018 (AP2018), both for the data collection and to feed a simple dialog system which would use the graph to provide answers. As users interacted with the system, a graph which maintained the structure of the dialogs was built, identifying parts where more coverage was needed. In an offline evaluation, we have compared the corpus collected during the competition with other potential corpora for training chatbots, including movie subtitles, online chat forums and conversational data. The results show that the proposed methodology creates data that is more representative of actual user utterances, and leads to more coherent and engaging answers from the agent. An implementation of the proposed method is available as open-source code.
  •  
36.
  • Kamelabad, Alireza M., 1993-, et al. (author)
  • I Learn Better Alone! Collaborative and Individual Word Learning With a Child and Adult Robot
  • 2023
  • In: Proceedings of the 2023 ACM/IEEE International Conference on Human-Robot Interaction. - New York, NY, United States : Association for Computing Machinery (ACM). ; , s. 368-377
  • Conference paper (peer-reviewed)abstract
    • The use of social robots as a tool for language learning has been studied quite extensively recently. Although their effectiveness and comparison with other technologies are well studied, the effects of the robot’s appearance and the interaction setting have received less attention. As educational robots are envisioned to appear in household or school environments, it is important to investigate how their designed persona or interaction dynamics affect learning outcomes. In such environments, children may do the activities together or alone or perform them in the presence of an adult or another child. In this regard, we have identified two novel factors to investigate: the robot’s perceived age (adult or child) and the number of learners interacting with the robot simultaneously (one or two). We designed an incidental word learning card game with the Furhat robot and ran a between-subject experiment with 75 middle school participants. We investigated the interactions and effects of children’s word learning outcomes, speech activity, and perception of the robot’s role. The results show that children who played alone with the robot had better word retention and anthropomorphized the robot more, compared to those who played in pairs. Furthermore, unlike previous findings from human-human interactions, children did not show different behaviors in the presence of a robot designed as an adult or a child. We discuss these factors in detail and make a novel contribution to the direct comparison of collaborative versus individual learning and the new concept of the robot’s age.
  •  
37.
  • Kontogiorgos, Dimosthenis, 1987-, et al. (author)
  • A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
  • 2018
  • In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). - Paris. ; , s. 119-127
  • Conference paper (peer-reviewed)abstract
    • In this paper we present a corpus of multiparty situated interaction where participants collaborated on moving virtual objects on a large touch screen. A moderator facilitated the discussion and directed the interaction. The corpus contains recordings of a variety of multimodal data, in that we captured speech, eye gaze and gesture data using a multisensory setup (wearable eye trackers, motion capture and audio/video). Furthermore, in the description of the multimodal corpus, we investigate four different types of social gaze: referential gaze, joint attention, mutual gaze and gaze aversion by both perspectives of a speaker and a listener. We annotated the groups’ object references during object manipulation tasks and analysed the group’s proportional referential eye-gaze with regards to the referent object. When investigating the distributions of gaze during and before referring expressions we could corroborate the differences in time between speakers’ and listeners’ eye gaze found in earlier studies. This corpus is of particular interest to researchers who are interested in social eye-gaze patterns in turn-taking and referring language in situated multi-party interaction.
  •  
38.
  •  
39.
  • Kontogiorgos, Dimosthenis, 1987- (author)
  • Mutual Understanding in Situated Interactions with Conversational User Interfaces : Theory, Studies, and Computation
  • 2022
  • Doctoral thesis (other academic/artistic)abstract
    • This dissertation presents advances in HCI through a series of studies focusing on task-oriented interactions between humans and between humans and machines. The notion of mutual understanding is central, also known as grounding in psycholinguistics, in particular how people establish understanding in conversations and what interactional phenomena are present in that process. Addressing the gap in computational models of understanding, interactions in this dissertation are observed through multisensory input and evaluated with statistical and machine-learning models. As it becomes apparent, miscommunication is ordinary in human conversations and therefore embodied computer interfaces interacting with humans are subject to a large number of conversational failures. Investigating how these interfaces can evaluate human responses to distinguish whether spoken utterances are understood is one of the central contributions of this thesis. The first papers (Papers A and B) included in this dissertation describe studies on how humans establish understanding incrementally and how they co-produce utterances to resolve misunderstandings in joint-construction tasks. Utilising the same interaction paradigm from such human-human settings, the remaining papers describe collaborative interactions between humans and machines with two central manipulations: embodiment (Papers C, D, E, and F) and conversational failures (Papers D, E, F, and G). The methods used investigate whether embodiment affects grounding behaviours among speakers and what verbal and non-verbal channels are utilised in response and recovery to miscommunication. For application to robotics and conversational user interfaces, failure detection systems are developed that predict user uncertainty in real time, paving the way for new multimodal computer interfaces that are aware of dialogue breakdown and system failures. Through the lens of Theory, Studies, and Computation, a comprehensive overview is presented on how mutual understanding has been observed in interactions with humans and between humans and machines. A summary of literature in mutual understanding from psycholinguistics and human-computer interaction perspectives is reported. An overview is also presented on how prior knowledge in mutual understanding has and can be observed through experimentation and empirical studies, along with perspectives of how knowledge acquired through observation is put into practice through the analysis and development of computational models. Derived from literature and empirical observations, the central thesis of this dissertation is that embodiment and mutual understanding are intertwined in task-oriented interactions, both in successful communication but also in situations of miscommunication.
  •  
40.
  • Kontogiorgos, Dimosthenis, 1987-, et al. (author)
  • The Effects of Embodiment and Social Eye-Gaze in Conversational Agents
  • 2019
  • In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci).
  • Conference paper (peer-reviewed)abstract
    • The adoption of conversational agents is growing at a rapid pace. Agents however, are not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this work, we explore the effects of simulating anthropomorphism and social eye-gaze in three conversational agents. We tested whether subjects’ visual attention would be similar to agents in different forms of embodiment and social eye-gaze. In a within-subject situated interaction study (N=30), we asked subjects to engage in task-oriented dialogue with a smart speaker and two variations of a social robot. We observed shifting of interactive behaviour by human users, as shown in differences in behavioural and objective measures. With a trade-off in task performance, social facilitation is higher with more anthropomorphic social agents when performing the same task.
  •  
41.
  • Li, Chengjie, et al. (author)
  • Effects of Posture and Embodiment on Social Distance in Human-Agent Interaction in Mixed Reality
  • 2018
  • In: Proceedings of the 18th International Conference on Intelligent Virtual Agents. - New York, NY, USA : ACM Digital Library. - 9781450360135 ; , s. 191-196
  • Conference paper (peer-reviewed)abstract
    • Mixed reality offers new potentials for social interaction experiences with virtual agents. In addition, it can be used to experiment with the design of physical robots. However, while previous studies have investigated comfortable social distances between humans and artificial agents in real and virtual environments, there is little data with regards to mixed reality environments. In this paper, we conducted an experiment in which participants were asked to walk up to an agent to ask a question, in order to investigate the social distances maintained, as well as the subject's experience of the interaction. We manipulated both the embodiment of the agent (robot vs. human and virtual vs. physical) as well as closed vs. open posture of the agent. The virtual agent was displayed using a mixed reality headset. Our experiment involved 35 participants in a within-subject design. We show that, in the context of social interactions, mixed reality fares well against physical environments, and robots fare well against humans, barring a few technical challenges.
  •  
42.
  • Mishra, Chinmaya, et al. (author)
  • Does a robot's gaze aversion affect human gaze aversion?
  • 2023
  • In: Frontiers in Robotics and AI. - : Frontiers Media SA. - 2296-9144. ; 10
  • Journal article (peer-reviewed)abstract
    • Gaze cues serve an important role in facilitating human conversations and are generally considered to be one of the most important non-verbal cues. Gaze cues are used to manage turn-taking, coordinate joint attention, regulate intimacy, and signal cognitive effort. In particular, it is well established that gaze aversion is used in conversations to avoid prolonged periods of mutual gaze. Given the numerous functions of gaze cues, there has been extensive work on modelling these cues in social robots. Researchers have also tried to identify the impact of robot gaze on human participants. However, the influence of robot gaze behavior on human gaze behavior has been less explored. We conducted a within-subjects user study (N = 33) to verify if a robot's gaze aversion influenced human gaze aversion behavior. Our results show that participants tend to avert their gaze more when the robot keeps staring at them as compared to when the robot exhibits well-timed gaze aversions. We interpret our findings in terms of intimacy regulation: humans try to compensate for the robot's lack of gaze aversion.
  •  
43.
  • Mishra, Chinmaya, et al. (author)
  • Real-time emotion generation in human-robot dialogue using large language models
  • 2023
  • In: Frontiers in Robotics and AI. - : Frontiers Media SA. - 2296-9144. ; 10
  • Journal article (peer-reviewed)abstract
    • Affective behaviors enable social robots to not only establish better connections with humans but also serve as a tool for the robots to express their internal states. It has been well established that emotions are important to signal understanding in Human-Robot Interaction (HRI). This work aims to harness the power of Large Language Models (LLM) and proposes an approach to control the affective behavior of robots. By interpreting emotion appraisal as an Emotion Recognition in Conversation (ERC) task, we used GPT-3.5 to predict the emotion of a robot's turn in real-time, using the dialogue history of the ongoing conversation. The robot signaled the predicted emotion using facial expressions. The model was evaluated in a within-subjects user study (N = 47) where the model-driven emotion generation was compared against conditions where the robot did not display any emotions and where it displayed incongruent emotions. The participants interacted with the robot by playing a card sorting game that was specifically designed to evoke emotions. The results indicated that the emotions were reliably generated by the LLM and the participants were able to perceive the robot's emotions. It was found that the robot expressing congruent, model-driven facial emotion expressions was perceived to be significantly more human-like and emotionally appropriate, and elicited a more positive impression. Participants also scored significantly better in the card sorting game when the robot displayed congruent facial expressions. From a technical perspective, the study shows that LLMs can be used to control the affective behavior of robots reliably in real-time. Additionally, our results could be used in devising novel human-robot interactions, making robots more effective in roles where emotional interaction is important, such as therapy, companionship, or customer service. (An illustrative sketch of this approach follows this entry.)
  •  
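A minimal Python sketch of the ERC-style appraisal described in the entry above. The paper reports prompting GPT-3.5 with the ongoing dialogue history; the prompt wording, the emotion label set, and the helper function predict_robot_emotion below are illustrative assumptions rather than the authors' implementation.

from openai import OpenAI

# Hypothetical label set; the emotion inventory used in the study may differ.
EMOTION_LABELS = ["joy", "sadness", "surprise", "anger", "fear", "neutral"]

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def predict_robot_emotion(dialogue_history: list[tuple[str, str]], robot_turn: str) -> str:
    """Ask the LLM which emotion the robot's upcoming turn should convey."""
    history = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in dialogue_history)
    prompt = (
        "You are labelling the emotion a social robot should express.\n"
        f"Dialogue so far:\n{history}\n"
        f"Robot's next turn: {robot_turn}\n"
        f"Answer with exactly one of: {', '.join(EMOTION_LABELS)}."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    label = response.choices[0].message.content.strip().lower()
    # Fall back to neutral if the model answers outside the label set.
    return label if label in EMOTION_LABELS else "neutral"

# The predicted label would then be mapped to a facial expression on the robot, e.g.
# predict_robot_emotion([("user", "I finally sorted all the cards!")],
#                       "Well done, that was the trickiest round.")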
44.
  •  
45.
  • Peters, Christopher, et al. (author)
  • Investigating Social Distances between Humans, Virtual Humans and Virtual Robots in Mixed Reality
  • 2018
  • In: Proceedings of 17th International Conference on Autonomous Agents and MultiAgent Systems. ; , s. 2247-2249
  • Conference paper (peer-reviewed)abstract
    • Mixed reality environments offer new potentials for the design of compelling social interaction experiences with virtual characters. In this paper, we summarise initial experiments we are conducting in which we measure comfortable social distances between humans, virtual humans and virtual robots in mixed reality environments. We consider a scenario in which participants walk within a comfortable distance of a virtual character that has its appearance varied between a male and female human, and a standard- and human-height virtual Pepper robot. Our studies in mixed reality thus far indicate that humans adopt social zones with artificial agents that are similar in manner to human-human social interactions and interactions in virtual reality.
  •  
46.
  • Peters, Christopher, et al. (author)
  • Towards the use of mixed reality for HRI design via virtual robots
  • 2018
  • In: HRI '20: Companion of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, March 2020.
  • Conference paper (peer-reviewed)abstract
    • Mixed reality, which seeks to better merge virtual objects and their interactions with the real environment, offers numerous potentials for the improved design of robots and our interactions with them. In this paper, we present our ongoing work towards the development of a mixed reality platform for designing social interactions with robots through the use of virtual robots. We present a summary of our work thus far on the use of the platform for investigating proxemics between humans and virtual robots, and also highlight future research directions. These include the consideration of more sophisticated interactions involving verbal behaviours, interaction with small formations of virtual robots, better integration of virtual objects into real environments and experiments comparing the real systems with their virtual counterparts.
  •  
47.
  • Roddy, M., et al. (author)
  • Investigating speech features for continuous turn-taking prediction using LSTMs
  • 2018
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. - : International Speech Communication Association. ; , s. 586-590
  • Conference paper (peer-reviewed)abstract
    • For spoken dialog systems to conduct fluid conversational interactions with users, the systems must be sensitive to turn-taking cues produced by a user. Models should be designed so that effective decisions can be made as to when it is appropriate, or not, for the system to speak. Traditional end-of-turn models, where decisions are made at utterance end-points, are limited in their ability to model fast turn-switches and overlap. A more flexible approach is to model turn-taking in a continuous manner using RNNs, where the system predicts speech probability scores for discrete frames within a future window. The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection. In this paper, we investigate optimal speech-related feature sets for making predictions at pauses and overlaps in conversation. We find that while traditional acoustic features perform well, part-of-speech features generally perform worse than word features. We show that our current models outperform previously reported baselines. (An illustrative sketch of the continuous prediction scheme follows this entry.)
  •  
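A minimal sketch, in PyTorch, of the continuous prediction scheme described in the entry above: at every input frame the recurrent model outputs speech-activity probabilities for each frame in a short future window. The class name, feature dimensionality, frame rate and window length are illustrative assumptions, not the configuration evaluated in the paper.

import torch
import torch.nn as nn


class ContinuousTurnTakingLSTM(nn.Module):
    def __init__(self, feature_dim: int = 40, hidden_dim: int = 64, future_frames: int = 60):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # One sigmoid output per frame in the future prediction window.
        self.head = nn.Linear(hidden_dim, future_frames)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feature_dim), e.g. 50 ms acoustic/linguistic frames.
        hidden, _ = self.lstm(frames)
        # (batch, time, future_frames): probability of speech in each future frame.
        return torch.sigmoid(self.head(hidden))


# Example: a 3-second stretch of 50 ms frames for one speaker channel.
model = ContinuousTurnTakingLSTM()
predictions = model(torch.randn(1, 60, 40))
print(predictions.shape)  # torch.Size([1, 60, 60])

Decisions at pauses or overlaps can then be taken by aggregating the predicted window, for example by comparing the mean predicted speech probability for each speaker over the next second.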
48.
  • Roddy, Matthew, et al. (author)
  • Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs
  • 2018
  • In: ICMI 2018 - Proceedings of the 2018 International Conference on Multimodal Interaction. - New York, NY, USA : ACM. - 9781450356923 ; , s. 186-190
  • Conference paper (peer-reviewed)abstract
    • In human conversational interactions, turn-taking exchanges can be coordinated using cues from multiple modalities. To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models. We propose that there is an appropriate temporal granularity at which modalities should be modeled. We design a multiscale RNN architecture to model modalities at separate timescales in a continuous manner. Our results show that modeling linguistic and acoustic features at separate temporal rates can be beneficial for turn-taking modeling. We also show that our approach can be used to incorporate gaze features into turn-taking models. (An illustrative sketch of the multiscale idea follows this entry.)
  •  
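A rough PyTorch sketch of the multiscale idea in the entry above: acoustic features are modelled at a fast frame rate and word-level linguistic features at a slower rate, each with its own LSTM, before the two timescales are fused for prediction. The class name, dimensions and the upsample-and-concatenate fusion are assumptions made for illustration, not the architecture reported in the paper.

import torch
import torch.nn as nn


class MultiscaleTurnTakingRNN(nn.Module):
    def __init__(self, acoustic_dim=40, word_dim=100, hidden_dim=64, future_frames=60):
        super().__init__()
        self.fast_rnn = nn.LSTM(acoustic_dim, hidden_dim, batch_first=True)  # e.g. 50 ms frames
        self.slow_rnn = nn.LSTM(word_dim, hidden_dim, batch_first=True)      # e.g. word-level steps
        self.head = nn.Linear(2 * hidden_dim, future_frames)

    def forward(self, acoustic: torch.Tensor, words: torch.Tensor, frames_per_word: int) -> torch.Tensor:
        fast, _ = self.fast_rnn(acoustic)   # (batch, frames, hidden)
        slow, _ = self.slow_rnn(words)      # (batch, words, hidden)
        # Hold each word-level state constant over its frames so both
        # timescales can be concatenated at the fast frame rate.
        slow_upsampled = slow.repeat_interleave(frames_per_word, dim=1)
        fused = torch.cat([fast, slow_upsampled[:, : fast.size(1)]], dim=-1)
        return torch.sigmoid(self.head(fused))


model = MultiscaleTurnTakingRNN()
out = model(torch.randn(1, 60, 40), torch.randn(1, 6, 100), frames_per_word=10)
print(out.shape)  # torch.Size([1, 60, 60])

Gaze or other modalities could be added analogously, each with a recurrent network running at its own natural rate.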
49.
  • Shore, Todd, 1984-, et al. (author)
  • Enhancing reference resolution in dialogue using participant feedback
  • 2017
  • In: Proc. GLU 2017 International Workshop on Grounding Language Understanding. - Stockholm, Sweden : International Speech Communication Association. ; , s. 78-82
  • Conference paper (peer-reviewed)abstract
    • Expressions used to refer to entities in a common environment do not originate solely from one participant in a dialogue but are formed collaboratively. It is possible to train a model for resolving these referring expressions (REs) in a static manner using an appropriate corpus, but, due to the collaborative nature of their formation, REs are highly dependent not only on attributes of the referent in question (e.g. color, shape) but also on the dialogue participants themselves. As a proof of concept, we improved the accuracy of a words-as-classifiers logistic regression model by incorporating knowledge about accepting/rejecting REs proposed by other participants. (An illustrative sketch of the words-as-classifiers approach follows this entry.)
  •  
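A simplified sketch of a words-as-classifiers model for the entry above, using scikit-learn: each word in a referring expression gets its own logistic-regression classifier over referent features (e.g. colour, shape, position), and candidate referents are scored by averaging the word probabilities. The class and feature layout are illustrative assumptions; the paper's contribution of exploiting whether proposed REs were accepted or rejected by the other participant is only indicated in the comments.

import numpy as np
from sklearn.linear_model import LogisticRegression


class WordsAsClassifiers:
    def __init__(self):
        self.word_models: dict[str, LogisticRegression] = {}

    def fit(self, expressions, referent_features, labels):
        """expressions: list of token lists; referent_features: (N, D) array of
        candidate-referent features; labels: 1 if that candidate was the intended
        referent of the expression, else 0. Feedback about accepted/rejected REs
        could be folded in here, e.g. by weighting or filtering these examples."""
        per_word_x, per_word_y = {}, {}
        for tokens, feats, label in zip(expressions, referent_features, labels):
            for word in tokens:
                per_word_x.setdefault(word, []).append(feats)
                per_word_y.setdefault(word, []).append(label)
        for word, xs in per_word_x.items():
            if len(set(per_word_y[word])) > 1:  # need both classes to fit
                model = LogisticRegression()
                model.fit(np.array(xs), np.array(per_word_y[word]))
                self.word_models[word] = model

    def score(self, tokens, candidate_features):
        """Mean word probability that this candidate is the intended referent."""
        probs = [
            self.word_models[w].predict_proba(candidate_features.reshape(1, -1))[0, 1]
            for w in tokens if w in self.word_models
        ]
        return float(np.mean(probs)) if probs else 0.5

    def resolve(self, tokens, candidates):
        """Return the index of the highest-scoring candidate referent."""
        return int(np.argmax([self.score(tokens, c) for c in candidates]))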
50.
  • Shore, Todd, et al. (author)
  • KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue
  • 2019
  • In: LREC 2018 - 11th International Conference on Language Resources and Evaluation. - Tokyo. ; , s. 768-775
  • Conference paper (peer-reviewed)abstract
    • There is a growing body of research focused on task-oriented instructor-manipulator dialogue, whereby one dialogue participant initiates a reference to an entity in a common environment while the other participant must resolve this reference in order to manipulate said entity. Many of these works are based on disparate if nevertheless similar datasets. This paper describes an English corpus of referring expressions in relatively free, unrestricted dialogue with physical features generated in a simulation, which facilitates analysis of dialogic linguistic phenomena regarding alignment in the formation of referring expressions known as conceptual pacts.
  •  
Type of publication
conference paper (46)
journal article (10)
doctoral thesis (4)
research review (2)
Type of content
peer-reviewed (56)
other academic/artistic (6)
Author/Editor
Skantze, Gabriel, 19 ... (59)
Axelsson, Agnes, 199 ... (6)
Peters, Christopher (5)
Gustafson, Joakim (3)
Yang, Fangkai (3)
Li, Chengjie (3)
Skantze, Gabriel, Pr ... (3)
Székely, Eva (2)
Gao, Alex Yuan (2)
Traum, David (2)
Avramova, Vanya (2)
Axelsson, Nils, 1992 ... (2)
Romeo, Marta (2)
Shore, Todd (2)
Kragic, Danica, 1971 ... (1)
Abelho Pereira, Andr ... (1)
Oertel, Catharine (1)
Dimarogonas, Dimos V ... (1)
Beskow, Jonas (1)
Ahlberg, Sofie (1)
Axelsson, Agnes (1)
Yu, Pian (1)
Shaw Cortez, Wencesl ... (1)
Ghadirzadeh, Ali (1)
Castellano, Ginevra (1)
Alexanderson, Simon (1)
Lopes, J. (1)
Albert, Saul (1)
Carlson, Rolf (1)
Maraev, Vladislav, 1 ... (1)
Hough, Julian (1)
Fallgren, Per (1)
Pereira, André (1)
Ashkenazi, Shaul (1)
Stuart-Smith, Jane (1)
Foster, Mary Ellen (1)
Boye, Johan (1)
André, Elisabeth, Pr ... (1)
Buschmeier, Hendrik (1)
Vaddadi, Bhavana, Ph ... (1)
Bogdan, Cristian M, ... (1)
Aylett, Matthew Pete ... (1)
McMillan, Donald (1)
Fischer, Joel (1)
Reyes-Cruz, Gisela (1)
Gkatzia, Dimitra (1)
Dogan, Fethiye Irmak (1)
Wennberg, Ulme (1)
Hernandez Garcia, Da ... (1)
Blomsma, Peter (1)
University
Royal Institute of Technology (62)
Uppsala University (2)
University of Gothenburg (1)
Language
English (62)
Research subject (UKÄ/SCB)
Natural sciences (55)
Engineering and Technology (8)
Humanities (3)
