SwePub
Search the SwePub database


Results list for the search "WFRF:(Salvi Giampiero) srt2:(1999)"


  • Result 1-5 of 5
1.
2.
  • Agelfors, Eva, et al. (author)
  • Synthetic visual speech driven from auditory speech
  • 1999
  • In: Proceedings of Audio-Visual Speech Processing (AVSP'99).
  • Conference paper (peer-reviewed). Abstract:
    • We have developed two different methods for using auditory, telephone speech to drive the movements of a synthetic face. In the first method, Hidden Markov Models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs was then fed into a rule-based visual speech synthesizer as a string of phonemes together with time labels. In the second method, Artificial Neural Networks (ANNs) were trained on the same database to map acoustic parameters directly to facial control parameters. These target parameter trajectories were generated by using phoneme strings from a database as input to the visual speech synthesizer. The two methods were evaluated through audiovisual intelligibility tests with ten hearing impaired persons, and compared to “ideal” articulations (where no recognition was involved), a natural face, and to the intelligibility of the audio alone. It was found that the HMM method performs considerably better than the audio alone condition (54% and 34% keywords correct respectively), but not as well as the “ideal” articulating artificial face (64%). The intelligibility for the ANN method was 34% keywords correct.
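The ANN method in the entry above maps acoustic parameters to facial control parameters frame by frame. The sketch below illustrates the general idea with a small numpy network trained on synthetic stand-in data; the dimensions, feature types, and targets are placeholder assumptions, not the ones used in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder dimensions (assumptions, not from the paper):
    # 16 acoustic parameters per 10 ms frame, mapped to
    # 10 facial control parameters (jaw opening, lip rounding, ...).
    N_FRAMES, N_ACOUSTIC, N_VISUAL, N_HIDDEN = 2000, 16, 10, 32

    # Synthetic stand-in data; in the paper the target trajectories
    # came from synthesis rules applied to transcribed speech.
    X = rng.normal(size=(N_FRAMES, N_ACOUSTIC))
    Y = np.tanh(X @ rng.normal(size=(N_ACOUSTIC, N_VISUAL)))  # fake targets

    # One-hidden-layer network, trained with mean-squared error.
    W1 = rng.normal(scale=0.1, size=(N_ACOUSTIC, N_HIDDEN))
    b1 = np.zeros(N_HIDDEN)
    W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_VISUAL))
    b2 = np.zeros(N_VISUAL)

    lr = 0.05
    for epoch in range(200):
        H = np.tanh(X @ W1 + b1)          # hidden activations
        P = H @ W2 + b2                   # predicted facial parameters
        err = P - Y
        # Backpropagation of the mean-squared error.
        dW2 = H.T @ err / N_FRAMES
        db2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)  # tanh derivative
        dW1 = X.T @ dH / N_FRAMES
        db1 = dH.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print("final MSE:", float((err ** 2).mean()))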
3.
4.
  • Salvi, Giampiero (author)
  • Developing acoustic models for automatic speech recognition in Swedish
  • 1999
  • In: The European Student Journal of Language and Speech ; 1
  • Journal article (peer-reviewed). Abstract:
    • This thesis is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified task (digit and natural number recognition) has been considered for model evaluation. Different kinds of phone models have been tested, including context-independent models and two variations of context-dependent models. Furthermore, many experiments have been done with bigram language models to tune some of the system parameters. System performance over various speaker subsets with different sex, age, and dialect has also been examined. Results are compared to previous similar studies, showing a remarkable improvement.
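As a rough illustration of the modelling approach described in the thesis abstract above (training one HMM per phone and evaluating competing models on feature frames), here is a minimal sketch using the hmmlearn Python library on synthetic data. hmmlearn is a modern stand-in, not the toolkit used in the thesis, and the phone set, features, and model settings are placeholder assumptions.

    import numpy as np
    from hmmlearn import hmm

    rng = np.random.default_rng(1)

    def fake_mfcc(mean, n_frames=40, dim=13):
        """Synthetic MFCC-like feature frames for one phone occurrence."""
        return rng.normal(loc=mean, scale=1.0, size=(n_frames, dim))

    # Pretend training data: several occurrences of two phone classes.
    train = {"a": [fake_mfcc(0.0) for _ in range(20)],
             "s": [fake_mfcc(2.0) for _ in range(20)]}

    models = {}
    for phone, segs in train.items():
        X = np.vstack(segs)
        lengths = [len(s) for s in segs]
        m = hmm.GaussianHMM(n_components=3, covariance_type="diag",
                            n_iter=20, random_state=0)
        m.fit(X, lengths)          # Baum-Welch re-estimation
        models[phone] = m

    # Recognition: pick the phone model with the highest log-likelihood.
    test = fake_mfcc(2.0)
    scores = {p: m.score(test) for p, m in models.items()}
    print(max(scores, key=scores.get), scores)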
5.
  • Öhman, Tobias, et al. (author)
  • Using HMMs and ANNs for mapping acoustic to visual speech
  • 1999
  • In: TMH-QPSR. - KTH Royal Institute of Technology ; 40:1-2, pp. 45-50
  • Journal article (other academic/artistic). Abstract:
    • In this paper we present two different methods for mapping auditory, telephone-quality speech to visual parameter trajectories, specifying the movements of an animated synthetic face. In the first method, Hidden Markov Models (HMMs) were used to obtain phoneme strings and time labels. These were then transformed by rules into parameter trajectories for visual speech synthesis. In the second method, Artificial Neural Networks (ANNs) were trained to directly map acoustic parameters to synthesis parameters. Speaker-independent HMMs were trained on a phonetically transcribed telephone speech database. Different underlying units of speech were modelled by the HMMs, such as monophones, diphones, triphones, and visemes. The ANNs were trained on male, female, and mixed speakers. The HMM method and the ANN method were evaluated through audio-visual intelligibility tests with ten hearing impaired persons, and compared to “ideal” articulations (where no recognition was involved), a natural face, and to the intelligibility of the audio alone. It was found that the HMM method performs considerably better than the audio alone condition (54% and 34% keywords correct, respectively), but not as well as the “ideal” articulating artificial face (64%). The intelligibility for the ANN method was 34% keywords correct.
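The rule-based step described above (turning recognized, time-labelled phoneme strings into visual parameter trajectories) can be pictured as a viseme-target lookup followed by smoothing. The sketch below is a hypothetical illustration; the viseme targets, control parameters, and smoothing scheme are invented for the example and do not come from the paper.

    import numpy as np

    # Hypothetical viseme targets: each phoneme maps to a target value
    # for two illustrative control parameters (jaw opening, lip rounding).
    VISEME_TARGETS = {"sil": (0.0, 0.0), "m": (0.05, 0.1),
                      "a": (0.8, 0.1), "u": (0.3, 0.9)}

    def phones_to_trajectory(timed_phones, frame_ms=10):
        """Turn (phoneme, start_ms, end_ms) labels, e.g. from HMM
        recognition, into per-frame control parameter trajectories by
        holding each viseme target and smoothing the transitions."""
        total_ms = timed_phones[-1][2]
        n_frames = total_ms // frame_ms
        traj = np.zeros((n_frames, 2))
        for phone, start, end in timed_phones:
            traj[start // frame_ms : end // frame_ms] = VISEME_TARGETS[phone]
        # Crude stand-in for coarticulation: moving-average smoothing.
        kernel = np.ones(5) / 5
        return np.column_stack([np.convolve(traj[:, k], kernel, mode="same")
                                for k in range(2)])

    labels = [("sil", 0, 100), ("m", 100, 180), ("a", 180, 400),
              ("u", 400, 550), ("sil", 550, 700)]
    print(phones_to_trajectory(labels).shape)  # (70, 2)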