SwePub

Result list for search "WFRF:(Salvi Giampiero) srt2:(2010-2014)"

  • Results 1-10 of 21
1.
  • Ananthakrishnan, Gopal, et al. (authors)
  • Using Imitation to learn Infant-Adult Acoustic Mappings
  • 2011
  • In: 12th Annual Conference of the International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5. - : ISCA. - 9781618392701 ; , pp. 772-775
  • Conference paper (peer-reviewed)
    • This paper discusses a model which conceptually demonstrates how infants could learn the normalization between infant-adult acoustics. The model proposes that the mapping can be inferred from the topological correspondences between the adult and infant acoustic spaces, which are clustered separately in an unsupervised manner. The model requires feedback from the adult in order to select the right topology for clustering, which is a crucial aspect of the model. The feedback is in terms of an overall rating of the imitation effort by the infant, rather than a frame-by-frame correspondence. Using synthetic, but continuous speech data, we demonstrate that clusters, which have a good topological correspondence, are perceived to be similar by a phonetically trained listener.
2.
  • Koniaris, Christos, 1979-, et al. (authors)
  • Auditory and Dynamic Modeling Paradigms to Detect L2 Mispronunciations
  • 2012
  • In: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1. - 9781622767595 ; , pp. 898-901
  • Conference paper (peer-reviewed)
    • This paper expands our previous work on automatic pronunciation error detection that exploits knowledge from psychoacoustic auditory models. The new system has two additional important features, i.e., auditory and acoustic processing of the temporal cues of the speech signal, and classification feedback from a trained linear dynamic model. We also perform a pronunciation analysis by considering the task as a classification problem. Finally, we evaluate the proposed methods by conducting a listening test on the same speech material and comparing the judgment of the listeners with that of the methods. The automatic analysis based on spectro-temporal cues is shown to have the best agreement with the human evaluation, particularly with that of language teachers, and with previous plenary linguistic studies.
3.
  • Koniaris, Christos, 1979-, et al. (authors)
  • On mispronunciation analysis of individual foreign speakers using auditory periphery models
  • 2013
  • In: Speech Communication. - : Elsevier BV. - 0167-6393 .- 1872-7182. ; 55:5, pp. 691-706
  • Journal article (peer-reviewed)
    • In second language (L2) learning, a major difficulty is to discriminate between the acoustic diversity within an L2 phoneme category and that between different categories. We propose a general method for automatic diagnostic assessment of the pronunciation of non-native speakers based on models of the human auditory periphery. Considering each phoneme class separately, the geometric shape similarity between the native auditory domain and the non-native speech domain is measured. The phonemes that deviate the most from the native pronunciation for a set of L2 speakers are detected by comparing the geometric shape similarity measure with that calculated for native speakers on the same phonemes. To evaluate the system, we have tested it with different non-native speaker groups from various language backgrounds. The experimental results are in accordance with linguistic findings and human listeners' ratings, particularly when both the spectral and temporal cues of the speech signal are utilized in the pronunciation analysis.
4.
  • Koniaris, Christos, 1979-, et al. (authors)
  • On the Benefit of Using Auditory Modeling for Diagnostic Evaluation of Pronunciations
  • 2012
  • In: International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden, June 6-8, 2012. ; , pp. 59-64
  • Conference paper (peer-reviewed)
    • In this paper we demonstrate that a psychoacoustic model-based distance measure performs better than a speech signal distance measure in assessing the pronunciation of individual foreign speakers. The experiments show that the perceptual-based method performs not only quantitatively better than a speech spectrum-based method, but also qualitatively better, hence showing that auditory information is beneficial in the task of pronunciation error detection. We first present the general approach of the method, which uses the dissimilarity between the native perceptual domain and the non-native speech power spectrum domain. The problematic phonemes for a given non-native speaker are determined by the degree of disparity between the dissimilarity measure for the non-native speaker and that for a group of native speakers. The two methods compared here are applied to different groups of non-native speakers of various language backgrounds and validated against a theoretical linguistic study.
5.
  • Lindblom, Björn, et al. (authors)
  • Sound systems are shaped by their users : The recombination of phonetic substance
  • 2011
  • In: Where Do Phonological Features Come From?. - : John Benjamins Publishing Company. - 9789027208231 ; , pp. 67-97
  • Book chapter (other academic/artistic)
    • Computational experiments were run using an optimization criterion based on independently motivated definitions of perceptual contrast, articulatory cost and learning cost. The question: If stop+vowel inventories are seen as adaptations to perceptual, articulatory and developmental constraints what would they be like? Simulations successfully predicted typologically widely observed place preferences and the re-use of place features (‘phonemic coding’) in voiced stop inventories. These results demonstrate the feasibility of user-based accounts of phonological facts and indicate the nature of the constraints that over time might shape the formation of both the formal structure and the intrinsic content of sound patterns. While phonetic factors are commonly invoked to account for substantive aspects of phonology, their explanatory scope is here also extended to a fundamental attribute of its formal organization: the combinatorial re-use of phonetic content.
6.
  • Neiberg, Daniel, 1976-, et al. (authors)
  • Semi-supervised methods for exploring the acoustics of simple productive feedback
  • 2013
  • In: Speech Communication. - : Elsevier BV. - 0167-6393 .- 1872-7182. ; 55:3, pp. 451-469
  • Journal article (peer-reviewed)
    • This paper proposes methods for exploring acoustic correlates to feedback functions. A sub-language of Swedish, simple productive feedback, is introduced to facilitate investigations of the functional contributions of base tokens, phonological operations and prosody. The function of feedback is to convey the listeners' attention, understanding and affective states. In order to handle the large number of possible affective states, the current study starts by performing a listening experiment where humans annotated the functional similarity of feedback tokens with different prosodic realizations. By selecting a set of stimuli that had different prosodic distances from a reference token, it was possible to compute a generalised functional distance measure. The resulting generalised functional distance measure was shown to be correlated with prosodic distance, but the correlations varied as a function of base tokens and phonological operations. In a subsequent listening test, a small representative sample of feedback tokens was rated for understanding, agreement, interest, surprise and certainty. These ratings were found to explain a significant proportion of the generalised functional distance. By combining the acoustic analysis with an explorative visualisation of the prosody, we have established a map between human perception of similarity between feedback tokens, their measured distance in acoustic space, and the link to the perception of the function of feedback tokens with varying realisations.
7.
  • Oertel, Catharine, et al. (authors)
  • A Gaze-based Method for Relating Group Involvement to Individual Engagement in Multimodal Multiparty Dialogue
  • 2013
  • In: ICMI 2013 - Proceedings of the 2013 ACM International Conference on Multimodal Interaction. - New York, NY, USA : Association for Computing Machinery (ACM). - 9781450321297 ; , pp. 99-106
  • Conference paper (peer-reviewed)
    • This paper is concerned with modelling individual engagement and group involvement as well as their relationship in an eight-party, multimodal corpus. We propose a number of features (presence, entropy, symmetry and maxgaze) that summarise different aspects of eye-gaze patterns and allow us to describe individual as well as group behaviour in time. We use these features to define similarities between the subjects and we compare this information with the engagement rankings the subjects expressed at the end of each interaction about themselves and the other participants. We analyse how these features relate to four classes of group involvement and we build a classifier that is able to distinguish between those classes with 71% accuracy.
8.
  •  
9.
  • Pieropan, Alessandro, et al. (authors)
  • A dataset of human manipulation actions
  • 2014
  • In: ICRA 2014 Workshop on Autonomous Grasping and Manipulation. - Hong Kong, China.
  • Conference paper (peer-reviewed)
    • We present a data set of human activities that includes both visual data (RGB-D video and six Degrees Of Freedom (DOF) object pose estimation) and acoustic data. Our vision is that robots need to merge information from multiple perceptional modalities to operate robustly and autonomously in an unstructured environment.
10.
  • Pieropan, Alessandro, et al. (authors)
  • Audio-Visual Classification and Detection of Human Manipulation Actions
  • 2014
  • In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014). - : IEEE conference proceedings. - 9781479969340 ; , pp. 3045-3052
  • Conference paper (peer-reviewed)
    • Humans are able to merge information from multiple perceptional modalities and formulate a coherent representation of the world. Our thesis is that robots need to do the same in order to operate robustly and autonomously in an unstructured environment. It has also been shown in several fields that multiple sources of information can complement each other, overcoming the limitations of a single perceptual modality. Hence, in this paper we introduce a data set of actions that includes both visual data (RGB-D video and 6DOF object pose estimation) and acoustic data. We also propose a method for recognizing and segmenting actions from continuous audio-visual data. The proposed method is employed for extensive evaluation of the descriptive power of the two modalities, and we discuss how they can be used jointly to infer a coherent interpretation of the recorded action.