SwePub

Search results for "WFRF:(Székely Éva) srt2:(2015-2019)"

  • Results 1-12 of 12
1.
  • Betz, Simon, et al. (authors)
  • The greennn tree - lengthening position influences uncertainty perception
  • 2019
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019. The International Speech Communication Association (ISCA), pp. 3990-3994.
  • Conference paper (peer-reviewed), abstract:
    • Synthetic speech can be used to express uncertainty in dialogue systems by means of hesitation. If a phrase like “Next to the green tree” is uttered in a hesitant way, that is, containing lengthening, silences, and fillers, the listener can infer that the speaker is not certain about the concepts referred to. However, we do not know anything about the referential domain of the uncertainty; if only a particular word in this sentence were uttered hesitantly, e.g. “the greee:n tree”, the listener could infer that the uncertainty refers to the color in the statement, but not to the object. In this study, we show that the domain of the uncertainty is controllable. We conducted an experiment in which color words in sentences like “search for the green tree” were lengthened in two different positions: word onsets or final consonants, and participants were asked to rate the uncertainty regarding color and object. The results show that initial lengthening is predominantly associated with uncertainty about the word itself, whereas final lengthening is primarily associated with the following object. These findings enable dialogue system developers to finely control the attitudinal display of uncertainty, adding nuances beyond the lexical content to message delivery.
2.
  • Clark, Leigh, et al. (authors)
  • Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions
  • 2019
  • In: CHI EA '19 Extended Abstracts. New York, NY, USA: Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed), abstract:
    • The use of speech as an interaction modality has grown considerably through the integration of Intelligent Personal Assistants (IPAs - e.g. Siri, Google Assistant) into smartphones and voice-based devices (e.g. Amazon Echo). However, there remain significant gaps in using theoretical frameworks to understand user behaviours and choices and how they may be applied to specific speech interface interactions. This part-day multidisciplinary workshop aims to critically map out and evaluate theoretical frameworks and methodological approaches across a number of disciplines and establish directions for new paradigms in understanding speech interface user behaviour. In doing so, we will bring together participants from HCI and other speech-related domains to establish a cohesive, diverse and collaborative community of researchers from academia and industry with interest in exploring theoretical and methodological issues in the field.
3.
  • Cowan, Benjamin R., et al. (authors)
  • They Know as Much as We Do : Knowledge Estimation and Partner Modelling of Artificial Partners
  • 2017
  • In: CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition. The Cognitive Science Society, pp. 1836-1841.
  • Conference paper (peer-reviewed), abstract:
    • Conversation partners' assumptions about each other's knowledge (their partner models) on a subject are important in spoken interaction. However, little is known about what influences our partner models in spoken interactions with artificial partners. In our experiment we asked people to name 15 British landmarks, and estimate their identifiability to a person as well as an automated conversational agent of either British or American origin. Our results show that people's assumptions about what an artificial partner knows are related to their estimates of what other people are likely to know - but they generally estimate artificial partners to have more knowledge in the task than human partners. These findings shed light on the way in which people build partner models of artificial partners. Importantly, they suggest that people use assumptions about what other humans know as a heuristic when assessing an artificial partner's knowledge.
4.
  • Oertel, Catharine, et al. (authors)
  • Using crowd-sourcing for the design of listening agents : Challenges and opportunities
  • 2017
  • In: ISIAA 2017 - Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, Co-located with ICMI 2017. New York, NY, USA: Association for Computing Machinery (ACM). ISBN 9781450355582, pp. 37-38.
  • Conference paper (peer-reviewed), abstract:
    • In this paper we describe how audio-visual corpus recordings collected with crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.
5.
  • Székely, Éva, et al. (authors)
  • Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector
  • 2019
  • In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. ISBN 9781479981311, pp. 6925-6929.
  • Conference paper (peer-reviewed), abstract:
    • This paper considers utilising breaths to create improved spontaneous-speech corpora for conversational text-to-speech from found audio recordings such as dialogue podcasts. Breaths are of interest since they relate to prosody and speech planning and are independent of language and transcription. Specifically, we propose a semisupervised approach where a fraction of coarsely annotated data is used to train a convolutional and recurrent speaker-specific breath detector operating on spectrograms and zero-crossing rate. The classifier output is used to find target-speaker breath groups (audio segments delineated by breaths) and subsequently select those that constitute clean utterances appropriate for a synthesis corpus. An application to 11 hours of raw podcast audio extracts 1969 utterances (106 minutes), 87% of which are clean and correctly segmented. This outperforms a baseline that performs integrated VAD and speaker attribution without accounting for breaths.
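The selection step described in the abstract above can be sketched in outline. This is a minimal illustration under stated assumptions, not the paper's implementation: the CNN-LSTM detector itself is omitted, and the function name, frame rate and duration thresholds are all hypothetical. We assume only that the detector emits a per-frame breath probability.

```python
import numpy as np

def breath_groups(breath_prob, frame_s=0.01, thresh=0.5,
                  min_len_s=1.0, max_len_s=10.0):
    """Split a recording into breath groups (audio segments delineated
    by breaths) from per-frame breath probabilities, keeping only those
    whose duration is plausible for a clean synthesis utterance.

    breath_prob : 1-D array of per-frame breath probabilities.
    Returns a list of (start_s, end_s) tuples in seconds.
    """
    is_breath = np.asarray(breath_prob) > thresh
    # Frame indices where the breath/non-breath state flips.
    edges = np.flatnonzero(np.diff(is_breath.astype(int)))
    boundaries = [0] + [int(e) + 1 for e in edges] + [len(is_breath)]
    groups = []
    for a, b in zip(boundaries[:-1], boundaries[1:]):
        if is_breath[a]:          # skip the breath segment itself
            continue
        dur = (b - a) * frame_s
        if min_len_s <= dur <= max_len_s:
            groups.append((a * frame_s, b * frame_s))
    return groups
```

In the paper the kept breath groups are further filtered for clean single-speaker utterances; here the duration check stands in for that selection stage.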
6.
7.
  • Székely, Éva, et al. (authors)
  • Off the cuff : Exploring extemporaneous speech delivery with TTS
  • 2019
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech Communication Association, pp. 3687-3688.
  • Conference paper (peer-reviewed), abstract:
    • Extemporaneous speech is a delivery type in public speaking which uses a structured outline but is otherwise delivered conversationally, off the cuff. This demo uses a natural-sounding spontaneous conversational speech synthesiser to simulate this delivery style. We resynthesised the beginnings of two Interspeech keynote speeches with TTS that produces multiple different versions of each utterance that vary in fluency and filled-pause placement. The platform allows the user to mark the samples according to any perceptual aspect of interest, such as certainty, authenticity, confidence, etc. During the speech delivery, they can decide on the fly which realisation to play, addressing their audience in a connected, conversational fashion. Our aim is to use this platform to explore speech synthesis evaluation options from a production perspective and in situational contexts.
8.
  • Székely, Éva, et al. (authors)
  • Spontaneous conversational speech synthesis from found data
  • 2019
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. ISCA, pp. 4435-4439.
  • Conference paper (peer-reviewed), abstract:
    • Synthesising spontaneous speech is a difficult task due to disfluencies, high variability and syntactic conventions different from those of written language. Using found data, as opposed to lab-recorded conversations, for speech synthesis adds to these challenges because of overlapping speech and the lack of control over recording conditions. In this paper we address these challenges by using a speaker-dependent CNN-LSTM breath detector to separate continuous recordings into utterances, which we here apply to extract nine hours of clean single-speaker breath groups from a conversational podcast. The resulting corpus is transcribed automatically (both lexical items and filler tokens) and used to build several voices on a Tacotron 2 architecture. Listening tests show: i) pronunciation accuracy improved with phonetic input and transfer learning; ii) it is possible to create a more fluent conversational voice by training on data without filled pauses; and iii) the presence of filled pauses improved perceived speaker authenticity. Another listening test showed the found podcast voice to be more appropriate for prompts from both public speeches and casual conversations, compared to synthesis from found read speech and from a manually transcribed lab-recorded spontaneous conversation.
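The voice-building comparison in the abstract above (training with and without filled pauses) rests on a simple corpus partition, which might be sketched as follows. The filler inventory and function name are hypothetical; a real pipeline would key on the dedicated filler tokens produced by the automatic transcription step rather than on plain words.

```python
# Hypothetical filler-token inventory (assumption, not from the paper).
FILLERS = {"uh", "um", "eh", "mm"}

def split_by_fillers(utterances):
    """Partition transcribed breath groups into a set without filled
    pauses (for training a more fluent conversational voice) and a set
    containing them (whose presence was found to improve perceived
    speaker authenticity)."""
    without, with_fillers = [], []
    for text in utterances:
        target = with_fillers if FILLERS & set(text.lower().split()) else without
        target.append(text)
    return without, with_fillers
```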
9.
  • Székely, Éva, et al. (authors)
  • Synthesising uncertainty : The interplay of vocal effort and hesitation disfluencies
  • 2017
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech Communication Association, pp. 804-808.
  • Conference paper (peer-reviewed), abstract:
    • As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, the question of how different modifications to the synthetic speech interact with each other, and how their specific combinations influence perception, needs to be examined. This work investigates how the vocal effort of the synthetic speech, together with added disfluencies, affects listeners' perception of the degree of uncertainty in an utterance. We introduce a DNN voice built entirely from spontaneous conversational speech data and capable of producing a continuum of vocal efforts, prolongations and filled pauses with a corpus-based method. Results of a listener evaluation indicate that decreased vocal effort, filled pauses and prolongation of function words increase the degree of perceived uncertainty of conversational utterances expressing the speaker's beliefs. We demonstrate that the effects of these three cues are not merely additive, but that interaction effects, in particular between the two types of disfluencies and between vocal effort and prolongations, need to be considered when aiming to communicate a specific level of uncertainty. The implications of these findings are relevant for adaptive and incremental conversational systems using expressive speech synthesis and aspiring to communicate the attitude of uncertainty.
10.
  • Székely, Éva, et al. (authors)
  • The Effect of Soft, Modal and Loud Voice Levels on Entrainment in Noisy Conditions
  • 2015
  • In: Sixteenth Annual Conference of the International Speech Communication Association.
  • Conference paper (peer-reviewed), abstract:
    • Conversation partners have a tendency to adapt their vocal intensity to each other and to other social and environmental factors. A socially adequate vocal intensity level by a speech synthesiser that goes beyond mere volume adjustment is highly desirable for a rewarding and successful human-machine or machine-mediated human-human interaction. This paper examines the interaction of the Lombard effect and speaker entrainment in a controlled experiment conducted with a confederate interlocutor. The interlocutor was asked to maintain either a soft, a modal or a loud voice level during the dialogues. Through half of the trials, subjects were exposed to a cocktail party noise through headphones. The analytical results suggest that both the background noise and the interlocutor’s voice level affect the dynamics of speaker entrainment. Speakers appear to still entrain to the voice level of their interlocutor in noisy conditions, though to a lesser extent, as strategies of ensuring intelligibility affect voice levels as well. These findings could be leveraged in spoken dialogue systems and speech generating devices to help choose a vocal effort level for the synthetic voice that is both intelligible and socially suited to a specific interaction.
11.
12.