SwePub

Search results for the query "WFRF:(Cabral Joao P)"

  • Results 1-7 of 7
1.
  • Cabral, Joao P, et al. (authors)
  • Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz.
  • 2012
  • In: Proceedings of the International Conference on Language Resources and Evaluation, pp. 4136-4142
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a prototype of a computer-assisted pronunciation training system called MySpeech. The interface of the MySpeech system is web-based and it currently enables users to practice pronunciation by listening to speech spoken by native speakers and tuning their speech production to correct any mispronunciations detected by the system. Practice exercises are available for different topics and difficulty levels. An experiment was conducted in this work that combines the MySpeech service with the WebWOZ Wizard-of-Oz platform (http://www.webwoz.com), in order to improve the human-computer interaction (HCI) of the service and the feedback that it provides to the user. The employed Wizard-of-Oz method enables a human (who acts as a wizard) to give feedback to the practising user, while the user is not aware that there is another person involved in the communication. This experiment made it possible to quickly test an HCI model before its implementation in the MySpeech system. It also allowed input data to be collected from the wizard that can be used to improve the proposed model. Another outcome of the experiment was the preliminary evaluation of the pronunciation learning service in terms of user satisfaction, which would be difficult to conduct before integrating the HCI part.
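As a rough illustration of the Wizard-of-Oz method summarised in the abstract above, here is a minimal Python sketch (hypothetical code, not taken from MySpeech or WebWOZ): a human wizard picks the feedback for each practice utterance, and the learner sees it as if the system had produced it automatically.

# Hypothetical Wizard-of-Oz feedback loop: a human wizard selects the
# feedback, while the learner believes the assessment is automatic.
FEEDBACK_OPTIONS = [
    "Well done, your pronunciation was correct.",
    "Listen to the native speaker again and repeat.",
    "Try stressing the second syllable.",
]

def wizard_select_feedback(utterance_id):
    """The wizard hears the learner's recording and picks a canned response."""
    print("[wizard console] utterance %s: choose feedback" % utterance_id)
    for i, option in enumerate(FEEDBACK_OPTIONS):
        print("  %d: %s" % (i, option))
    return FEEDBACK_OPTIONS[int(input("> "))]

def practice_turn(utterance_id):
    # From the learner's point of view this looks like automatic assessment.
    feedback = wizard_select_feedback(utterance_id)
    print("[learner screen] %s" % feedback)

practice_turn("utt-001")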
2.
3.
  • Székely, Éva, et al. (authors)
  • Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters.
  • 2011
  • In: 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011). ISCA. ISBN 9781618392701, pp. 2409-2412
  • Conference paper (peer-reviewed), abstract:
    • A great challenge for text-to-speech synthesis is to produce expressive speech. The main problem is that it is difficult to synthesise high-quality speech using expressive corpora. With the increasing interest in audiobook corpora for speech synthesis, there is a demand to synthesise speech which is rich in prosody, emotions and voice styles. In this work, Self-Organising Feature Maps (SOFM) are used for clustering the speech data using voice quality parameters of the glottal source, in order to map out the variety of voice styles in the corpus. Subjective evaluation showed that this clustering method successfully separated the speech data into groups of utterances associated with different voice characteristics. This work can be applied in unit-selection synthesis by selecting appropriate data sets to synthesise utterances with specific voice styles. It can also be used in parametric speech synthesis to model different voice styles separately.
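The clustering step described in the abstract above can be sketched roughly as follows (an assumed setup using the third-party MiniSom library and random placeholder features; the paper's actual glottal source parameters and corpus are not reproduced here):

# Sketch: cluster utterances by voice-quality features with a
# Self-Organising Map; features here are random placeholders standing in
# for glottal source parameters such as the open quotient.
import numpy as np
from minisom import MiniSom  # pip install minisom

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))  # one row per utterance

som = MiniSom(4, 4, input_len=features.shape[1], sigma=1.0,
              learning_rate=0.5, random_seed=0)
som.train_random(features, num_iteration=1000)

# Utterances mapped to the same best-matching unit form a candidate
# "voice style" cluster.
clusters = {}
for idx, vec in enumerate(features):
    clusters.setdefault(som.winner(vec), []).append(idx)
print({unit: len(members) for unit, members in clusters.items()})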
4.
  • Székely, Éva, et al. (authors)
  • Evaluating expressive speech synthesis from audiobooks in conversational phrases
  • 2012
  • Conference paper (peer-reviewed), abstract:
    • Audiobooks are a rich resource of large quantities of natural-sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice styles represented in a particular audiobook, using unsupervised clustering to group the speech corpus of the audiobook into smaller subsets representing the detected voice styles. These subsets of corpora of different voice styles reflect the various ways a speaker uses their voice to express involvement and affect, or imitate characters. This study evaluates the detection of voice styles in an audiobook as applied to expressive speech synthesis. A further aim of this study is to investigate the usability of audiobooks as a language resource for expressive speech synthesis of utterances of conversational speech. Two evaluations have been carried out to assess the effect of the genre transfer: transferring expressive speech from read-aloud literature to conversational phrases via speech synthesis. The first evaluation revealed that listeners have different voice style preferences for a particular conversational phrase. The second evaluation showed that it is possible for users of speech synthesis systems to learn the characteristics of a certain voice style well enough to make reliable predictions about what a certain utterance will sound like when synthesised using that voice style.
5.
  • Székely, Éva, et al. (authors)
  • Predicting synthetic voice style from facial expressions. An application for augmented conversations
  • 2014
  • In: Speech Communication, vol. 57, pp. 63-75. Elsevier. ISSN 0167-6393, E-ISSN 1872-7182
  • Journal article (peer-reviewed), abstract:
    • The ability to efficiently facilitate social interaction and emotional expression is an important, yet unmet requirement for speech generating devices aimed at individuals with speech impairment. Using gestures such as facial expressions to control aspects of expressive synthetic speech could contribute to an improved communication experience for both the user of the device and the conversation partner. For this purpose, a mapping model between facial expressions and speech is needed that is high-level (utterance-based), versatile and personalisable. In the mapping developed in this work, visual and auditory modalities are connected based on the intended emotional salience of a message: the intensity of the user's facial expressions is mapped to the emotional intensity of the synthetic speech. The mapping model has been implemented in a system called WinkTalk that uses estimated facial expression categories and their intensity values to automatically select between three expressive synthetic voices reflecting three degrees of emotional intensity. An evaluation is conducted through an interactive experiment using simulated augmented conversations. The results have shown that automatic control of synthetic speech through facial expressions is fast, non-intrusive, sufficiently accurate and supports the user to feel more involved in the conversation. It can be concluded that the system has the potential to facilitate a more efficient communication process between user and listener.
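A toy sketch of the utterance-level mapping the abstract describes is given below (the intensity scale, thresholds and voice names are invented for illustration; the paper's mapping is personalised per user):

# Toy mapping (invented thresholds and voice names): the estimated
# intensity of the user's facial expression selects one of three
# expressive synthetic voices.
VOICES = ["calm", "neutral-expressive", "highly-expressive"]

def select_voice(expression_intensity):
    """expression_intensity is assumed to be normalised to [0, 1]."""
    if expression_intensity < 0.33:
        return VOICES[0]
    if expression_intensity < 0.66:
        return VOICES[1]
    return VOICES[2]

# At message-sending time, the current facial expression picks the voice.
for intensity in (0.1, 0.5, 0.9):
    print(intensity, "->", select_voice(intensity))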
6.
  • Székely, Éva, et al. (authors)
  • WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices
  • 2012
  • In: Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies, pp. 5-8. Association for Computational Linguistics
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a demonstration of the WinkTalk system, which is a speech synthesis platform using expressive synthetic voices. With the help of a web camera and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects a voice that matches their face at the moment of sending a message. The WinkTalk system is an early research prototype that aims to demonstrate that facial expressions can be used as a more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.
7.
  • Székely, Éva, et al. (authors)
  • WinkTalk: a multimodal speech synthesis interface linking facial expressions to expressive synthetic voices
  • 2012
  • In: Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a demonstration of the WinkTalk system, which is a speech synthesis platform using expressive synthetic voices. With the help of a web camera and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects a voice that matches their face at the moment of sending a message. The WinkTalk system is an early research prototype that aims to demonstrate that facial expressions can be used as a more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.