SwePub
Search the SwePub database

  Advanced search

Result list for the search "WFRF:(Li Lusi)"

Search: WFRF:(Li Lusi)

  • Results 1-4 of 4
Sort/group the result list
   
1.
  • Ning, Xin, et al. (author)
  • DILF : Differentiable rendering-based multi-view Image–Language Fusion for zero-shot 3D shape understanding
  • 2024
  • In: Information Fusion. - Amsterdam : Elsevier. - 1566-2535 .- 1872-6305. ; 102, pp. 1-12
  • Journal article (peer-reviewed) abstract:
    • Zero-shot 3D shape understanding aims to recognize “unseen” 3D categories that are not present in training data. Recently, Contrastive Language–Image Pre-training (CLIP) has shown promising open-world performance in zero-shot 3D shape understanding tasks through information fusion between the language and 3D modalities. It first renders 3D objects into multiple 2D image views and then learns to understand the semantic relationships between the textual descriptions and images, enabling the model to generalize to new and unseen categories. However, existing studies in zero-shot 3D shape understanding rely on predefined rendering parameters, resulting in repetitive, redundant, and low-quality views. This limitation hinders the model's ability to fully comprehend 3D shapes and adversely impacts the text–image fusion in a shared latent space. To this end, we propose a novel approach called Differentiable rendering-based multi-view Image–Language Fusion (DILF) for zero-shot 3D shape understanding. Specifically, DILF leverages large-scale language models (LLMs) to generate textual prompts enriched with 3D semantics and designs a differentiable renderer with learnable rendering parameters to produce representative multi-view images. These rendering parameters can be iteratively updated using a text–image fusion loss, which aids in parameters’ regression, allowing the model to determine the optimal viewpoint positions for each 3D object. Then a group-view mechanism is introduced to model interdependencies across views, enabling efficient information fusion to achieve a more comprehensive 3D shape understanding. Experimental results demonstrate that DILF outperforms state-of-the-art methods for zero-shot 3D classification while maintaining competitive performance for standard 3D classification. The code is available at https://github.com/yuzaiyang123/DILP. © 2023 The Author(s)
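The abstract above turns on two ideas: rendering parameters (camera viewpoints) that are themselves learnable, and a text-image fusion loss that updates them. The following is a minimal, hypothetical PyTorch sketch of that mechanism, not the authors' code (which is at https://github.com/yuzaiyang123/DILP); the renderer and CLIP encoders referenced in the trailing comments are assumed stand-ins.

import math
import torch
import torch.nn.functional as F

class LearnableViews(torch.nn.Module):
    """Learnable rendering parameters: one (azimuth, elevation) pair per view."""
    def __init__(self, n_views: int = 6):
        super().__init__()
        self.azim = torch.nn.Parameter(torch.linspace(0.0, 2 * math.pi, n_views))
        self.elev = torch.nn.Parameter(torch.zeros(n_views))

def fusion_loss(image_feats: torch.Tensor, text_feats: torch.Tensor,
                temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style contrastive loss pairing the i-th object's pooled view features
    with the i-th object's prompt features."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# One optimization step over the viewpoints (renderer and CLIP encoders are assumed stubs):
#   views = LearnableViews()
#   images = renderer(meshes, views.azim, views.elev)        # differentiable rendering
#   img_f = clip_image_encoder(images).mean(dim=1)           # group-view pooling per object
#   loss = fusion_loss(img_f, clip_text_encoder(prompts))    # text-image fusion loss
#   loss.backward()                                          # gradients reach azim/elev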
2.
  • Ran, Hang, et al. (author)
  • Learning optimal inter-class margin adaptively for few-shot class-incremental learning via neural collapse-based meta-learning
  • 2024
  • In: Information Processing & Management. - London : Elsevier. - 0306-4573 .- 1873-5371. ; 61:3
  • Journal article (peer-reviewed) abstract:
    • Few-Shot Class-Incremental Learning (FSCIL) aims to learn new classes incrementally with a limited number of samples per class. It faces issues of forgetting previously learned classes and overfitting on few-shot classes. An efficient strategy is to learn features that are discriminative in both base and incremental sessions. Current methods improve discriminability by manually designing inter-class margins based on empirical observations, which can be suboptimal. The emerging Neural Collapse (NC) theory provides a theoretically optimal inter-class margin for classification, serving as a basis for adaptively computing the margin. Yet, it is designed for closed, balanced data, not for sequential or few-shot imbalanced data. To address this gap, we propose a Meta-learning- and NC-based FSCIL method, MetaNC-FSCIL, to compute the optimal margin adaptively and maintain it at each incremental session. Specifically, we first compute the theoretically optimal margin based on the NC theory. Then we introduce a novel loss function to ensure that the loss value is minimized precisely when the inter-class margin reaches its theoretical optimum. Motivated by the intuition that “learn how to preserve the margin” matches meta-learning's goal of “learn how to learn”, we embed the loss function in base-session meta-training to preserve the margin for future meta-testing sessions. Experimental results demonstrate the effectiveness of MetaNC-FSCIL, achieving superior performance on multiple datasets. The code is available at https://github.com/qihangran/metaNC-FSCIL. © 2024 The Author(s)
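The margin referred to in the abstract above comes from Neural Collapse: for K classes, prototypes arranged as a simplex equiangular tight frame (ETF) have pairwise cosine similarity -1/(K-1), the theoretically optimal inter-class separation. The sketch below is an illustrative reconstruction of that geometry and of a loss that vanishes exactly when features align with their prototypes; all names are assumptions, and the authors' implementation is at https://github.com/qihangran/metaNC-FSCIL.

import torch
import torch.nn.functional as F

def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Return (num_classes, feat_dim) prototypes forming a simplex ETF: every pair
    of rows has cosine similarity exactly -1/(num_classes - 1)."""
    assert feat_dim >= num_classes
    # Orthonormal columns via QR, then center and rescale (standard ETF construction).
    q, _ = torch.linalg.qr(torch.randn(feat_dim, num_classes))
    center = torch.eye(num_classes) - torch.ones(num_classes, num_classes) / num_classes
    etf = (num_classes / (num_classes - 1)) ** 0.5 * (q @ center)
    return etf.t()

def nc_alignment_loss(features: torch.Tensor, labels: torch.Tensor, etf: torch.Tensor) -> torch.Tensor:
    """Zero exactly when each L2-normalized feature coincides with its class prototype,
    i.e. when the features realize the Neural Collapse inter-class geometry."""
    feats = F.normalize(features, dim=-1)
    protos = etf[labels]                      # prototypes already have unit norm
    return (1.0 - (feats * protos).sum(dim=-1)).mean()

# Example: 10 base classes, 64-d features (backbone is an assumed feature extractor)
#   etf = simplex_etf(10, 64)
#   loss = nc_alignment_loss(backbone(images), labels, etf)   # embedded in meta-training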
3.
  • Tian, Songsong, et al. (author)
  • A survey on few-shot class-incremental learning
  • 2024
  • In: Neural Networks. - Oxford : Elsevier. - 0893-6080 .- 1879-2782. ; 169, pp. 307-324
  • Research review (peer-reviewed) abstract:
    • Large deep learning models are impressive, but they struggle when real-time data is not available. Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks to learn new tasks from just a few labeled samples without forgetting the previously learned ones. This setup can easily lead to catastrophic forgetting and overfitting problems, severely affecting model performance. Studying FSCIL helps overcome deep learning model limitations on data volume and acquisition time, while improving the practicality and adaptability of machine learning models. This paper provides a comprehensive survey on FSCIL. Unlike previous surveys, we aim to synthesize few-shot learning and incremental learning, focusing on introducing FSCIL from two perspectives, while reviewing over 30 theoretical research studies and more than 20 applied research studies. From the theoretical perspective, we provide a novel categorization approach that divides the field into five subcategories, including traditional machine learning methods, meta learning-based methods, feature and feature space-based methods, replay-based methods, and dynamic network structure-based methods. We also evaluate the performance of recent theoretical research on benchmark datasets of FSCIL. From the application perspective, FSCIL has achieved impressive results in various fields of computer vision such as image classification, object detection, and image segmentation, as well as in natural language processing and graph learning. We summarize the important applications. Finally, we point out potential future research directions, including applications, problem setups, and theory development. Overall, this paper offers a comprehensive analysis of the latest advances in FSCIL from a methodological, performance, and application perspective. © 2023 The Author(s)
4.
  • Yu, Zaiyang, et al. (author)
  • MV-ReID : 3D Multi-view Transformation Network for Occluded Person Re-Identification
  • 2024
  • In: Knowledge-Based Systems. - Amsterdam : Elsevier. - 0950-7051 .- 1872-7409. ; 283
  • Journal article (peer-reviewed) abstract:
    • Re-identification (ReID) of occluded persons is a challenging task due to the loss of information in scenes with occlusions. Most existing methods for occluded ReID use 2D-based network structures to directly extract representations from 2D RGB (red, green, and blue) images, which can result in reduced performance in occluded scenes. However, since a person is a 3D non-grid object, learning semantic representations in a 2D space can limit the ability to accurately profile an occluded person. Therefore, it is crucial to explore alternative approaches that can effectively handle occlusions and leverage the full 3D nature of a person. To tackle these challenges, in this study, we employ a 3D view-based approach that fully utilizes the geometric information of 3D objects while leveraging advancements in 2D-based networks for feature extraction. Our study is the first to introduce a 3D view-based method in the areas of holistic and occluded ReID. To implement this approach, we propose a random rendering strategy that converts 2D RGB images into 3D multi-view images. We then use a 3D Multi-View Transformation Network for ReID (MV-ReID) to group and aggregate these images into a unified feature space. Compared to 2D RGB images, multi-view images can reconstruct occluded portions of a person in 3D space, enabling a more comprehensive understanding of occluded individuals. The experiments on benchmark datasets demonstrate that the proposed method achieves state-of-the-art results on occluded ReID tasks and exhibits competitive performance on holistic ReID tasks. These results also suggest that our approach has the potential to solve occlusion problems and contribute to the field of ReID. The source code and dataset are available at https://github.com/yuzaiyang123/MV-Reid. © 2023 Elsevier B.V.
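The abstract above describes grouping and aggregating features from multiple rendered views into a unified feature space. The snippet below is a plausible sketch of such an aggregation module under assumed names (MultiViewAggregator, backbone, rendered_views); it is not the network from the paper, whose code is at https://github.com/yuzaiyang123/MV-Reid.

import torch

class MultiViewAggregator(torch.nn.Module):
    """Fuse per-view features (batch, n_views, feat_dim) into a single ReID embedding."""
    def __init__(self, feat_dim: int = 512, n_heads: int = 4):
        super().__init__()
        # Self-attention across views lets informative (unoccluded) views dominate the fusion.
        self.attn = torch.nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)
        self.proj = torch.nn.Linear(feat_dim, feat_dim)

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        fused, _ = self.attn(view_feats, view_feats, view_feats)
        return self.proj(fused.mean(dim=1))   # (batch, feat_dim) identity embedding

# Usage with an assumed 2D backbone applied to randomly rendered views:
#   feats = backbone(rendered_views)          # (batch, n_views, feat_dim)
#   embedding = MultiViewAggregator()(feats)  # compared with cosine distance for ReID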
  • Results 1-4 of 4
Type of publication
journal article (3)
research review (1)
Type of content
peer-reviewed (4)
Author/editor
Tiwari, Prayag, 1991 ... (4)
Li, Weijun (4)
Ning, Xin (4)
Li, Lusi (4)
Yu, Zaiyang (2)
Ran, Hang (2)
Tian, Songsong (2)
Hou, Luyang (1)
Jiang, Limin (1)
Higher education institution
Högskolan i Halmstad (4)
Language
English (4)
Research subject (UKÄ/SCB)
Natural sciences (4)
Year
