Leveraging hierarchy in multimodal generative models for effective cross-modality inference
- Vasco, M. (author)
- Yin, Hang (author)
  KTH, Robotics, Perception and Learning, RPL
- Melo, F. S. (author)
- Paiva, A. (author)
- Elsevier BV, 2022
- English.
- In: Neural Networks. Elsevier BV. ISSN 0893-6080, 1879-2782. Vol. 146, pp. 238-255
- Related links:
- https://urn.kb.se/re...
- https://doi.org/10.1...
Abstract
- This work addresses the problem of cross-modality inference (CMI), i.e., inferring missing data of unavailable perceptual modalities (e.g., sound) using data from available perceptual modalities (e.g., image). We review single-modality variational autoencoder methods and discuss three problems of computational cross-modality inference arising from recent developments in multimodal generative models. Inspired by neural mechanisms of human recognition, we contribute the NEXUS model, a novel hierarchical generative model that can learn a multimodal representation of an arbitrary number of modalities in an unsupervised way. By exploiting hierarchical representation levels, NEXUS is able to generate high-quality, coherent data of missing modalities given any subset of available modalities. To evaluate CMI in a natural scenario with a high number of modalities, we contribute the “Multimodal Handwritten Digit” (MHD) dataset, a novel benchmark dataset that combines image, motion, sound and label information from digit handwriting. We assess the key role of hierarchy in enabling high-quality samples during cross-modality inference and discuss how a novel training scheme enables NEXUS to learn a multimodal representation robust to missing modalities at test time. Our results show that NEXUS outperforms current state-of-the-art multimodal generative models with regard to their cross-modality inference capabilities.
Subject headings
- ENGINEERING AND TECHNOLOGY -- Electrical Engineering, Electronic Engineering, Information Engineering -- Computer Systems (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Vision and Robotics (hsv//eng)
- NATURAL SCIENCES -- Computer and Information Sciences -- Computer Sciences (hsv//eng)
Keywords
- Cross-modality inference
- Deep learning
- Multimodal representation learning
- Auto encoders
- Cross modality
- Generative model
- High quality
- Learn
- Missing data
- Multi-modal
- article
- autoencoder
- handwriting
- human
- human experiment
- motion
- sound
Publication and content type
- ref (subject category)
- art (subject category)