DILF: Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding


Ning, Xin, Chinese Academy Of Sciences, Beijing, China (author)

DILF : Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding

Article/chapter, English, 2024

Publisher, publication year, extent ...

Amsterdam: Elsevier, 2024
print (rdacarrier)

Numbers

LIBRIS-ID:oai:DiVA.org:hh-51777
URI: https://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-51777
DOI: https://doi.org/10.1016/j.inffus.2023.102033

Supplementary language notes

Language:English
Summary in:English

Part of subdatabase

SwePub

Classification

Subject category:ref swepub-contenttype
Subject category:art swepub-publicationtype

Notes

Funding agency: National Natural Science Foundation of China (NSFC), Grant number: 6237334
Funding agency: Beijing Natural Science Foundation, Grant number: L233036
Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not present in the training data. Recently, Contrastive Language–Image Pre-training (CLIP) has shown promising open-world performance in zero-shot 3D shape understanding tasks through information fusion between the language and 3D modalities. It first renders 3D objects into multiple 2D image views and then learns the semantic relationships between the textual descriptions and images, enabling the model to generalize to new and unseen categories. However, existing studies in zero-shot 3D shape understanding rely on predefined rendering parameters, resulting in repetitive, redundant, and low-quality views. This limitation hinders the model's ability to fully comprehend 3D shapes and adversely impacts the text–image fusion in a shared latent space. To this end, we propose a novel approach called Differentiable rendering-based multi-view Image–Language Fusion (DILF) for zero-shot 3D shape understanding. Specifically, DILF leverages large-scale language models (LLMs) to generate textual prompts enriched with 3D semantics and designs a differentiable renderer with learnable rendering parameters to produce representative multi-view images. These rendering parameters can be iteratively updated using a text–image fusion loss, which aids parameter regression and allows the model to determine the optimal viewpoint positions for each 3D object. A group-view mechanism is then introduced to model interdependencies across views, enabling efficient information fusion and a more comprehensive 3D shape understanding. Experimental results demonstrate that DILF outperforms state-of-the-art methods for zero-shot 3D classification while maintaining competitive performance for standard 3D classification. The code is available at https://github.com/yuzaiyang123/DILP. © 2023 The Author(s)
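
Illustrative note (not part of the catalog record or the paper): the generic multi-view zero-shot step described in the abstract, comparing embeddings of rendered views with embeddings of class prompts, might be sketched as below. Everything here is an assumption made for illustration: the function names, the temperature value, the hand-made 3-dimensional vectors standing in for encoder outputs, and the plain mean over views, which is a simple stand-in and not DILF's group-view mechanism or its differentiable renderer.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(view_embeddings, text_embeddings, temperature=0.07):
    """Average per-view cosine similarities, then softmax over class prompts.

    Mean pooling over views is an assumption of this sketch, not the
    paper's view-fusion method.
    """
    labels = list(text_embeddings)
    scores = []
    for label in labels:
        sims = [cosine(v, text_embeddings[label]) for v in view_embeddings]
        scores.append(sum(sims) / len(sims))
    probs = softmax([s / temperature for s in scores])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))

# Hypothetical toy embeddings: three rendered views of one object and
# three class prompts. Real embeddings would come from image/text encoders.
views = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.7, 0.0, 0.3]]
prompts = {
    "chair": [1.0, 0.0, 0.0],
    "lamp": [0.0, 1.0, 0.0],
    "table": [0.0, 0.0, 1.0],
}
label, probs = zero_shot_classify(views, prompts)
print(label)  # chair
```

Because no class-specific training is involved, adding a new category in this sketch only requires adding another prompt embedding to the dictionary.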

Subject headings and genre

NATURAL SCIENCES; Computer and Information Sciences; Language Technology (hsv//swe, hsv//eng)
Differentiable rendering
Information fusion
Text–image fusion
Zero-shot 3D shape understanding

Added entries (persons, corporate bodies, meetings, titles ...)

Yu, Zaiyang, Chinese Academy Of Sciences, Beijing, China; University Of Chinese Academy Of Sciences, Beijing, China (author)
Li, Lusi, Old Dominion University, Norfolk, United States (author)
Li, Weijun, Chinese Academy Of Sciences, Beijing, China (author)
Tiwari, Prayag, 1991-, Högskolan i Halmstad, Akademin för informationsteknologi (Swepub:hh)pratiw (author)
Chinese Academy Of Sciences, Beijing, China; University Of Chinese Academy Of Sciences, Beijing, China (creator_code:org_t)

Related titles

In: Information Fusion. Amsterdam: Elsevier. 102, pp. 1-12. ISSN 1566-2535, 1872-6305
