Leveraging hierarchy in multimodal generative models for effective cross-modality inference

Sökning: L773:0893 6080 > Leveraging hierarch...

QC 20220609
This work addresses the problem of cross-modality inference (CMI), i.e., inferring missing data of unavailable perceptual modalities (e.g., sound) using data from available perceptual modalities (e.g., image). We overview single-modality variational autoencoder methods and discuss three problems of computational cross-modality inference, arising from recent developments in multimodal generative models. Inspired by neural mechanisms of human recognition, we contribute the NEXUS model, a novel hierarchical generative model that can learn a multimodal representation of an arbitrary number of modalities in an unsupervised way. By exploiting hierarchical representation levels, NEXUS is able to generate high-quality, coherent data of missing modalities given any subset of available modalities. To evaluate CMI in a natural scenario with a high number of modalities, we contribute the “Multimodal Handwritten Digit” (MHD) dataset, a novel benchmark dataset that combines image, motion, sound and label information from digit handwriting. We access the key role of hierarchy in enabling high-quality samples during cross-modality inference and discuss how a novel training scheme enables NEXUS to learn a multimodal representation robust to missing modalities at test time. Our results show that NEXUS outperforms current state-of-the-art multimodal generative models in regards to their cross-modality inference capabilities.

Yin, HangKTH,Robotik, perception och lärande, RPL(Swepub:kth)u1q02pve (författare)
Melo, F. S. (författare)
Paiva, A. (författare)
KTHRobotik, perception och lärande, RPL (creator_code:org_t)

Om ämnet

TEKNIK OCH TEKNOLOGIER: TEKNIK OCH TEKNO ...; och Elektroteknik oc ...; och Datorsystem

NATURVETENSKAP: NATURVETENSKAP; och Data och informa ...; och Datorseende och ...

Kungliga biblioteket hanterar dina personuppgifter i enlighet med EU:s dataskyddsförordning (2018), GDPR. Läs mer om hur det funkar här.
Så här hanterar KB dina uppgifter vid användning av denna tjänst.