Generating coherent spontaneous speech and gesture from text
- Alexanderson, Simon (author)
- KTH, Speech, Music and Hearing, TMH
- Székely, Éva (author)
- KTH, Speech, Music and Hearing, TMH
- Henter, Gustav Eje, Assistant Professor (author)
- KTH, Speech, Music and Hearing, TMH; Robotics, Perception and Learning, RPL
- Kucherenko, Taras, 1994- (author)
- KTH, Robotics, Perception and Learning, RPL
- Beskow, Jonas (author)
- KTH, Speech, Music and Hearing, TMH
- 2020-10-19
- 2020
- English.
Part of: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020. - New York, NY, USA : Association for Computing Machinery (ACM).
- Related links:
- http://arxiv.org/pdf...
- https://urn.kb.se/re...
- https://doi.org/10.1...
Abstract
- Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data: On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscripted speech audio as the source material. On the motion side, probabilistic motion-generation methods can now synthesise vivid and lifelike speech-driven 3D gesticulation. In this paper, we put these two state-of-the-art technologies together in a coherent fashion for the first time. Concretely, we demonstrate a proof-of-concept system trained on a single-speaker audio and motion-capture dataset that is able to generate both speech and full-body gestures together from text input. In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data. We illustrate our results by visualising gesture spaces and text-speech-gesture alignments, and through a demonstration video.
Subject headings
- HUMANITIES -- Languages and Literature -- General Language Studies and Linguistics (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Human Computer Interaction (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
Keywords
- Gesture synthesis
- neural networks
- text-to-speech
- Audio acoustics
- Audio systems
- Intelligent virtual agents
- Motion capture
- Speech synthesis
- Human communications
- Motion capture data
- Motion generation
- Non-verbal information
- Proof of concept
- Source material
- Spontaneous speech
- Text-to-speech system
- Speech communication
Publication and content type
- ref (subject category)
- kon (subject category)