SwePub
Search the SwePub database


Hit list for the search "WFRF:(Carson Berndsen Julie)"

Search: WFRF:(Carson Berndsen Julie)

  • Results 1-15 of 15
1.
  • Abou-Zleikha, Mohamed, et al. (author)
  • Multi-level exemplar-based duration generation for expressive speech synthesis
  • 2012
  • In: Proceedings of Speech Prosody.
  • Conference paper (peer-reviewed), abstract:
    • The generation of duration of speech units from linguistic information, as one component of a prosody model, is considered to be a requirement for natural sounding speech synthesis. This paper investigates the use of a multi-level exemplar-based model for duration generation for the purposes of expressive speech synthesis. The multi-level exemplar-based model has been proposed in the literature as a cognitive model for the production of duration. The implementation of this model for duration generation for speech synthesis is not straightforward and requires a set of modifications to the model and that the linguistically related units and the context of the target units should be taken into consideration. The work presented in this paper implements this model and presents a solution to these issues through the use of prosodic-syntactic correlated data, full context information of the input example and corpus exemplars.
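The entry above describes predicting unit durations from exemplars. As a rough illustration of the general exemplar-based idea (not the authors' multi-level model), the sketch below estimates a target unit's duration by averaging the durations of its nearest exemplars in a context-feature space; the feature encoding and all values are hypothetical placeholders.

```python
# Illustrative exemplar-based duration prediction: a target unit's duration
# is estimated from the k nearest exemplars in context-feature space.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical corpus: each row encodes context features of a speech unit
# (e.g. phone class, stress, position in phrase); each unit has a known duration.
corpus_features = np.array([
    [1.0, 0.0, 0.2],
    [1.0, 1.0, 0.5],
    [0.0, 1.0, 0.8],
    [0.0, 0.0, 0.4],
])
corpus_durations = np.array([0.08, 0.12, 0.15, 0.09])  # seconds

knn = NearestNeighbors(n_neighbors=2).fit(corpus_features)

def predict_duration(target_features):
    """Average the durations of the nearest exemplars of the target unit."""
    _, idx = knn.kneighbors([target_features])
    return corpus_durations[idx[0]].mean()

print(predict_duration([1.0, 0.5, 0.3]))  # ~0.10 s for this toy corpus
```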
2.
  • Ahmed, Zeeshan, et al. (author)
  • A system for facial expression-based affective speech translation
  • 2013
  • In: Proceedings of the companion publication of the 2013 international conference on Intelligent user interfaces companion, pp. 57-58
  • Conference paper (peer-reviewed), abstract:
    • In the emerging field of speech-to-speech translation, emphasis is currently placed on the linguistic content, while the significance of paralinguistic information conveyed by facial expression or tone of voice is typically neglected. We present a prototype system for multimodal speech-to-speech translation that is able to automatically recognize and translate spoken utterances from one language into another, with the output rendered by a speech synthesis system. The novelty of our system lies in the technique of generating the synthetic speech output in one of several expressive styles that is automatically determined using a camera to analyze the user’s facial expression during speech.
3.
  • Cabral, Joao P, et al. (author)
  • Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz.
  • 2012
  • In: Proceedings of the International Conference on Language Resources and Evaluation, pp. 4136-4142
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a prototype of a computer-assisted pronunciation training system called MySpeech. The interface of the MySpeech system is web-based and it currently enables users to practice pronunciation by listening to speech spoken by native speakers and tuning their speech production to correct any mispronunciations detected by the system. This practice exercise is facilitated in different topics and difficulty levels. An experiment was conducted in this work that combines the MySpeech service with the WebWOZ Wizard-of-Oz platform (http://www.webwoz.com), in order to improve the human-computer interaction (HCI) of the service and the feedback that it provides to the user. The employed Wizard-of-Oz method enables a human (who acts as a wizard) to give feedback to the practising user, while the user is not aware that there is another person involved in the communication. This experiment made it possible to quickly test an HCI model before its implementation in the MySpeech system. It also allowed input data to be collected from the wizard that can be used to improve the proposed model. Another outcome of the experiment was the preliminary evaluation of the pronunciation learning service in terms of user satisfaction, which would be difficult to conduct before integrating the HCI part.
4.
  •  
5.
  • Cahill, Peter, et al. (author)
  • UCD Blizzard Challenge 2011 entry
  • 2011
  • In: Proceedings of the Blizzard Challenge Workshop.
  • Conference paper (peer-reviewed), abstract:
    • This paper gives an overview of the UCD Blizzard Challenge 2011 entry. The entry is a unit selection synthesiser that uses hidden Markov models for prosodic modelling. The evaluation consisted of synthesising 2213 sentences from a high quality 15 hour dataset provided by Lessac Technologies. Results are analysed within the context of other systems and the future work for the system is discussed. 
6.
  • Székely, Éva, et al. (author)
  • Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters.
  • 2011
  • In: 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011). - ISCA. - ISBN 9781618392701, pp. 2409-2412
  • Conference paper (peer-reviewed), abstract:
    • A great challenge for text-to-speech synthesis is to produce expressive speech. The main problem is that it is difficult to synthesise high-quality speech using expressive corpora. With the increasing interest in audiobook corpora for speech synthesis, there is a demand to synthesise speech which is rich in prosody, emotions and voice styles. In this work, Self-Organising Feature Maps (SOFM) are used for clustering the speech data using voice quality parameters of the glottal source, in order to map out the variety of voice styles in the corpus. Subjective evaluation showed that this clustering method successfully separated the speech data into groups of utterances associated with different voice characteristics. This work can be applied in unit-selection synthesis by selecting appropriate data sets to synthesise utterances with specific voice styles. It can also be used in parametric speech synthesis to model different voice styles separately.
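As an illustration of the clustering step described in the entry above, the sketch below groups utterances by voice-quality features with a self-organising map, using the third-party MiniSom package rather than the authors' implementation; the random feature matrix is a placeholder standing in for real per-utterance glottal-source parameters.

```python
# Illustrative SOFM clustering of utterances by voice-quality features
# (placeholder data; in the paper these are glottal-source parameters).
import numpy as np
from minisom import MiniSom

features = np.random.rand(200, 3)  # 200 utterances x 3 voice-quality features

som = MiniSom(4, 4, input_len=3, sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(features)
som.train_random(features, num_iteration=1000)

# Utterances mapped to the same best-matching node form a candidate voice-style cluster.
clusters = {}
for i, f in enumerate(features):
    clusters.setdefault(som.winner(f), []).append(i)

for node, utterance_ids in sorted(clusters.items()):
    print(node, len(utterance_ids))
```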
7.
  • Székely, Éva, et al. (author)
  • Detecting a targeted voice style in an audiobook using voice quality features
  • 2012
  • In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. - Institute of Electrical and Electronics Engineers (IEEE), pp. 4593-4596
  • Conference paper (peer-reviewed), abstract:
    • Audiobooks are known to contain a variety of expressive speaking styles that occur as a result of the narrator mimicking a character in a story, or expressing affect. An accurate modeling of this variety is essential for the purposes of speech synthesis from an audiobook. Voice quality differences are important features characterizing these different speaking styles, which are realized on a gradient and are often difficult to predict from the text. The present study uses a parameter characterizing breathy to tense voice qualities using features of the wavelet transform, and a measure for identifying creaky segments in an utterance. Based on these features, a combination of supervised and unsupervised classification is used to detect the regions in an audiobook where the speaker changes his regular voice quality to a particular voice style. The target voice style candidates are selected based on the agreement of the supervised classifier ensemble output, and evaluated in a listening test.
8.
  • Székely, Éva, et al. (author)
  • Evaluating expressive speech synthesis from audiobooks in conversational phrases
  • 2012
  • Conference paper (peer-reviewed), abstract:
    • Audiobooks are a rich resource of large quantities of natural sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice styles represented in a particular audiobook, using unsupervised clustering to group the speech corpus of the audiobook into smaller subsets representing the detected voice styles. These subsets of corpora of different voice styles reflect the various ways a speaker uses their voice to express involvement and affect, or imitate characters. This study is an evaluation of the detection of voice styles in an audiobook in the application of expressive speech synthesis. A further aim of this study is to investigate the usability of audiobooks as a language resource for expressive speech synthesis of utterances of conversational speech. Two evaluations have been carried out to assess the effect of the genre transfer: transmitting expressive speech from read aloud literature to conversational phrases with the application of speech synthesis. The first evaluation revealed that listeners have different voice style preferences for a particular conversational phrase. The second evaluation showed that it is possible for users of speech synthesis systems to learn the characteristics of a certain voice style well enough to make reliable predictions about what a certain utterance will sound like when synthesised using that voice style. 
9.
  • Székely, Éva, et al. (author)
  • Facial expression as an input annotation modality for affective speech-to-speech translation
  • 2012
  • Conference paper (peer-reviewed), abstract:
    • One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker’s message. In this work we explore the use of automatic facial expression analysis as an input annotation modality to transfer paralinguistic information at a symbolic level from input to output in speech-to-speech translation. To evaluate the feasibility of this approach, a prototype system, FEAST (Facial Expression-based Affective Speech Translation), has been developed. FEAST classifies the emotional state of the user and uses it to render the translated output in an appropriate voice style, using expressive speech synthesis.
10.
  • Székely, Éva, et al. (author)
  • Facial expression-based affective speech translation
  • 2014
  • In: Journal on Multimodal User Interfaces. - Springer Science and Business Media LLC. - ISSN 1783-7677, 1783-8738. 8:1, pp. 87-96
  • Journal article (peer-reviewed), abstract:
    • One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker’s message. Information about affect and emotional intent of a speaker are often carried in more than one modality. For this reason, the possibility of multimodal interaction with the system and the conversation partner may greatly increase the likelihood of a successful and gratifying communication process. In this work we explore the use of automatic facial expression analysis as an input annotation modality to transfer paralinguistic information at a symbolic level from input to output in speech-to-speech translation. To evaluate the feasibility of this approach, a prototype system, FEAST (facial expression-based affective speech translation), has been developed. FEAST classifies the emotional state of the user and uses it to render the translated output in an appropriate voice style, using expressive speech synthesis.
11.
  • Székely, Éva, et al. (author)
  • Predicting synthetic voice style from facial expressions. An application for augmented conversations
  • 2014
  • In: Speech Communication. - Elsevier BV. - ISSN 0167-6393, 1872-7182. Vol. 57, pp. 63-75
  • Journal article (peer-reviewed), abstract:
    • The ability to efficiently facilitate social interaction and emotional expression is an important, yet unmet requirement for speech generating devices aimed at individuals with speech impairment. Using gestures such as facial expressions to control aspects of expressive synthetic speech could contribute to an improved communication experience for both the user of the device and the conversation partner. For this purpose, a mapping model between facial expressions and speech is needed, that is high level (utterance-based), versatile and personalisable. In the mapping developed in this work, visual and auditory modalities are connected based on the intended emotional salience of a message: the intensity of facial expressions of the user to the emotional intensity of the synthetic speech. The mapping model has been implemented in a system called WinkTalk that uses estimated facial expression categories and their intensity values to automatically select between three expressive synthetic voices reflecting three degrees of emotional intensity. An evaluation is conducted through an interactive experiment using simulated augmented conversations. The results have shown that automatic control of synthetic speech through facial expressions is fast, non-intrusive, sufficiently accurate and supports the user to feel more involved in the conversation. It can be concluded that the system has the potential to facilitate a more efficient communication process between user and listener.
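The mapping described in the entry above connects the estimated intensity of the user's facial expression to one of three expressive synthetic voices. Below is a minimal sketch of such an utterance-level mapping; the thresholds and voice labels are hypothetical, not those used in the WinkTalk system.

```python
# Illustrative mapping from facial-expression intensity (0..1) to a voice style.
def select_voice(expression_intensity: float) -> str:
    """Pick one of three expressive synthetic voices by expression intensity."""
    if expression_intensity < 0.33:
        return "neutral"
    if expression_intensity < 0.66:
        return "mildly_expressive"
    return "highly_expressive"

for intensity in (0.1, 0.5, 0.9):
    print(intensity, select_voice(intensity))
```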
12.
  • Székely, Éva, et al. (author)
  • Synthesizing expressive speech from amateur audiobook recordings
  • 2012
  • In: Spoken Language Technology Workshop (SLT), pp. 297-302
  • Conference paper (peer-reviewed), abstract:
    • Freely available audiobooks are a rich resource of expressive speech recordings that can be used for the purposes of speech synthesis. Natural sounding, expressive synthetic voices have previously been built from audiobooks that contained large amounts of highly expressive speech recorded from a professionally trained speaker. The majority of freely available audiobooks, however, are read by amateur speakers, are shorter and contain less expressive (less emphatic, less emotional, etc.) speech both in terms of quality and quantity. Synthesizing expressive speech from a typical online audiobook therefore poses many challenges. In this work we address these challenges by applying a method consisting of minimally supervised techniques to align the text with the recorded speech, select groups of expressive speech segments and build expressive voices for hidden Markov-model based synthesis using speaker adaptation. Subjective listening tests have shown that the expressive synthetic speech generated with this method is often able to produce utterances suited to an emotional message. We used a restricted amount of speech data in our experiment, in order to show that the method is generally applicable to most typical audiobooks widely available online.
13.
  • Székely, Éva, et al. (author)
  • The Effect of Soft, Modal and Loud Voice Levels on Entrainment in Noisy Conditions
  • 2015
  • In: Sixteenth Annual Conference of the International Speech Communication Association.
  • Conference paper (peer-reviewed), abstract:
    • Conversation partners have a tendency to adapt their vocal intensity to each other and to other social and environmental factors. A socially adequate vocal intensity level by a speech synthesiser that goes beyond mere volume adjustment is highly desirable for a rewarding and successful human-machine or machine-mediated human-human interaction. This paper examines the interaction of the Lombard effect and speaker entrainment in a controlled experiment conducted with a confederate interlocutor. The interlocutor was asked to maintain either a soft, a modal or a loud voice level during the dialogues. Through half of the trials, subjects were exposed to a cocktail party noise through headphones. The analytical results suggest that both the background noise and the interlocutor’s voice level affect the dynamics of speaker entrainment. Speakers appear to still entrain to the voice level of their interlocutor in noisy conditions, though to a lesser extent, as strategies of ensuring intelligibility affect voice levels as well. These findings could be leveraged in spoken dialogue systems and speech generating devices to help choose a vocal effort level for the synthetic voice that is both intelligible and socially suited to a specific interaction.
14.
  • Székely, Éva, et al. (author)
  • WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices
  • 2012
  • In: Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies. - Association for Computational Linguistics, pp. 5-8
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a demonstration of the WinkTalk system, which is a speech synthesis platform using expressive synthetic voices. With the help of a webcamera and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects a voice that matches their face at the moment of sending a message. The WinkTalk system is an early research prototype that aims to demonstrate that facial expressions can be used as a more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.
15.
  • Székely, Éva, et al. (author)
  • WinkTalk: a multimodal speech synthesis interface linking facial expressions to expressive synthetic voices
  • 2012
  • In: Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies.
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a demonstration of the WinkTalk system, which is a speech synthesis platform using expressive synthetic voices. With the help of a webcamera and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects a voice that matches their face at the moment of sending a message. The WinkTalk system is an early research prototype that aims to demonstrate that facial expressions can be used as a more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.
