Beyond the listening test : An interactive approach to TTS Evaluation

Mendelson, Joseph (author)
KTH,Tal, musik och hörsel, TMH
Aylett, M. (author)
International Speech Communication Association, 2017
English.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. International Speech Communication Association, pp. 249-253
  • Conference paper (peer-reviewed)
Abstract
Traditionally, subjective text-to-speech (TTS) evaluation is performed through audio-only listening tests, where participants evaluate unrelated, context-free utterances. The ecological validity of these tests is questionable, as they do not represent real-world end-use scenarios. In this paper, we examine a novel approach to TTS evaluation in an imagined end-use, via a complex interaction with an avatar. Six voice conditions were tested: Natural speech, Unit Selection, and Parametric Synthesis, each in neutral and expressive realizations. Results were compared to a traditional audio-only evaluation baseline. Participants in both studies rated the voices for naturalness and expressivity. The baseline study showed canonical results for naturalness: Natural speech scored highest, followed by Unit Selection, then Parametric Synthesis. Expressivity was clearly distinguishable in all conditions. In the avatar interaction study, participants rated naturalness in the same order as the baseline, though with smaller effect size; expressivity was not distinguishable. Further, no significant correlations were found between cognitive or affective responses and any voice conditions. This highlights two primary challenges in designing more valid TTS evaluations: first, in real-world use cases involving interaction, listeners generally interact with a single voice, making comparative analysis infeasible; second, in complex interactions, the context and content may confound perception of voice quality.

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Human Computer Interaction (hsv//eng)

Keywords

Expressive speech synthesis
Human-computer interaction
Interactive virtual agents
Listening tests
Statistical parametric speech synthesis
Subjective evaluation
TTS evaluation
Unit selection
User experience
Voice interface design

Publication and Content Type

ref (peer-reviewed)
kon (conference paper)
