1. Cano Santín, José Miguel, 1990, et al. (author)
   Fast visual grounding in interaction: bringing few-shot learning with neural networks to an interactive robot. 2020.
   In: Proceedings of the Conference on Probability and Meaning (PaM-2020), Gothenburg, Sweden (online) / Christine Howes, Stergios Chatzikyriakidis, Adam Ek and Vidya Somashekarappa (eds.). Association for Computational Linguistics (ACL). ISSN 2002-9764.
   Conference paper (peer-reviewed).
   Abstract: The major shortcomings of using neural networks with situated agents are that very few learning examples are available in incremental interaction and that their visual sensory representations are quite different from those of image caption datasets. In this work we adapt and evaluate a few-shot learning approach, Matching Networks (Vinyals et al., 2016), to the conversational strategies of a robot interacting with a human tutor, in order to learn efficiently to categorise objects presented to it; we also investigate to what degree transfer learning from models pre-trained on images from different contexts can improve its performance. We discuss the implications of such learning for the nature of the semantic representations the system has learned.
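Both records rely on the same classifier, Matching Networks (Vinyals et al., 2016): a query embedding is labelled by softmax attention over its cosine similarities to a small labelled support set. The following is a minimal NumPy sketch of that inference step only, not the authors' implementation; the function name and the toy embeddings are illustrative.

```python
import numpy as np

def matching_network_predict(support_emb, support_labels, query_emb, n_classes):
    """Classify a query by attention over a labelled support set.

    Attention weights are a softmax over cosine similarities between the
    query embedding and each support embedding; class probabilities are the
    attention-weighted sum of the one-hot support labels.
    """
    # L2-normalise so the dot product equals cosine similarity.
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = s @ q
    # Softmax attention over the support set (shifted for stability).
    a = np.exp(sims - sims.max())
    a /= a.sum()
    # Weighted sum of one-hot labels -> probability per class.
    onehot = np.eye(n_classes)[support_labels]
    return a @ onehot

# Toy example: a 2-way, 2-shot support set and one query near class 0.
support = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
probs = matching_network_predict(support, labels, np.array([0.95, 0.05]), 2)
```

Because classification is a weighted lookup into the support set rather than a change of network weights, adding a newly taught object only requires appending its embedding and label to the support set, which is what makes the approach usable in incremental tutor/learner interaction.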
2. Cano Santín, José Miguel, 1990, et al. (author)
   Interactive visual grounding with neural networks. 2019.
   In: Proceedings of LondonLogue - Semdial 2019: The 23rd Workshop on the Semantics and Pragmatics of Dialogue, London, 4-6 September 2019. London, UK: Queen Mary University of London. ISSN 2308-2275.
   Conference paper (peer-reviewed).
   Abstract: Standard training strategies for neural networks are not suitable for real-time human-robot interaction. Few-shot learning approaches have been developed for low-resource scenarios, but without the usual teacher/learner supervision. In this work we present a combination of both: a situated dialogue system that teaches object names to a robot from its camera images using Matching Networks (Vinyals et al., 2016). We compare the performance of the system with transfer learning from pre-trained models and with different conversational strategies with a human tutor.