SwePub

Hit list for the search "WFRF:(Salvi Giampiero) srt2:(2000-2004)"

Search: WFRF:(Salvi Giampiero) > (2000-2004)

  • Results 1-9 of 9
1.
  • Beskow, Jonas, et al. (author)
  • SYNFACE - A talking head telephone for the hearing-impaired
  • 2004
  • In: Computers Helping People with Special Needs. - Berlin: Springer. - ISBN 3540223347, pp. 1178-1185
  • Conference paper (peer-reviewed), abstract:
    • SYNFACE is a telephone aid for hearing-impaired people that shows the lip movements of the speaker at the other telephone, synchronised with the speech. The SYNFACE system consists of a speech recogniser that recognises the incoming speech and a synthetic talking head. The output from the recogniser is used to control the articulatory movements of the synthetic head. SYNFACE prototype systems exist for three languages: Dutch, English and Swedish, and the first user trials have just started.
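
For orientation, here is a minimal runnable sketch of the data flow this abstract describes: audio frames in, phone labels out, mouth shapes driven from those labels. Every name, the toy phone set and the phone-to-viseme map below are illustrative assumptions, not taken from the SYNFACE system:

    import numpy as np

    # Toy stand-ins for the SYNFACE pipeline: audio -> phone label -> mouth shape.
    PHONES = ["sil", "a", "i", "u", "p", "m", "s"]           # assumed toy phone set
    VISEMES = {"sil": "closed", "a": "open", "i": "spread",  # assumed phone-to-viseme map
               "u": "rounded", "p": "closed", "m": "closed", "s": "spread"}

    def recognise_frame(frame: np.ndarray) -> str:
        """Stand-in recogniser: pick the phone with the highest dummy score."""
        scores = np.abs(np.fft.rfft(frame))[:len(PHONES)]
        return PHONES[int(np.argmax(scores))]

    def animate(phone: str) -> str:
        """Map a phone label to a mouth shape for the talking head."""
        return VISEMES[phone]

    # One second of 8 kHz telephone audio, processed in 10 ms (80-sample) frames.
    audio = np.random.randn(8000)
    mouth_shapes = [animate(recognise_frame(f)) for f in audio.reshape(100, 80)]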
2.
  • Johansen, Finn Tore, et al. (author)
  • The COST 249 SpeechDat multilingual reference recogniser
  • 2000
  • Conference paper (peer-reviewed), abstract:
    • The COST 249 SpeechDat reference recogniser is a fully automatic, language-independent training procedure for building a phonetic recogniser. It relies on the HTK toolkit and a SpeechDat(II) compatible database. The recogniser is designed to serve as a reference system in multilingual recognition research. This paper documents version 0.93 of the reference recogniser and presents results on small-vocabulary recognition for seven languages.
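
The recipe itself is not reproduced in the abstract; the sketch below only outlines the kind of HTK flat-start and re-estimation sequence such an automatic procedure chains together. HCompV and HERest are real HTK tools, but every file name and option here is an illustrative assumption, not taken from the actual refrec scripts:

    import subprocess

    def run(cmd):
        """Run one HTK command, failing loudly if it returns non-zero."""
        subprocess.run(cmd, check=True)

    # 1. Flat start: initialise a prototype HMM from global data statistics.
    run(["HCompV", "-C", "config", "-f", "0.01", "-m",
         "-S", "train.scp", "-M", "hmm0", "proto"])

    # 2. A few rounds of embedded Baum-Welch re-estimation over all models.
    for i in range(3):
        run(["HERest", "-C", "config", "-S", "train.scp",
             "-I", "phones.mlf",                    # phone-level transcriptions
             "-H", f"hmm{i}/macros", "-H", f"hmm{i}/hmmdefs",
             "-M", f"hmm{i + 1}", "phonelist"])     # write models to hmm1, hmm2, ...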
3.
  • Karlsson, Inger, et al. (author)
  • SYNFACE - a talking face telephone
  • 2003
  • In: Proceedings of EUROSPEECH 2003, pp. 1297-1300
  • Conference paper (peer-reviewed), abstract:
    • The primary goal of the SYNFACE project is to make it easier for hearing-impaired people to use an ordinary telephone. This will be achieved by using a talking face connected to the telephone. The incoming speech signal will govern the speech movements of the talking face; hence the talking face will provide lip-reading support for the user. The project will define the visual speech information that supports lip-reading, and develop techniques to derive this information from the acoustic speech signal in near real time for three different languages: Dutch, English and Swedish. This requires the development of automatic speech recognition methods that detect information in the acoustic signal that correlates with the speech movements. This information will govern the speech movements in a synthetic face and synchronise them with the acoustic speech signal. A prototype system is being constructed, containing the results achieved so far in SYNFACE. This system will be tested and evaluated for the three languages by hearing-impaired users. SYNFACE is an IST project (IST-2001-33327) with partners from the Netherlands, the UK and Sweden. SYNFACE builds on experiences gained in the Swedish Teleface project.
4.
  • Lindberg, Børge, et al. (author)
  • A noise robust multilingual reference recogniser based on SpeechDat(II)
  • 2000
  • Conference paper (peer-reviewed), abstract:
    • An important aspect of noise robustness of automatic speech recognisers (ASR) is the proper handling of non-speech acoustic events. The present paper describes further improvements of an already existing reference recogniser towards achieving this kind of robustness. The reference recogniser applied is the COST 249 SpeechDat reference recogniser, a fully automatic, language-independent training procedure for building a phonetic recogniser (http://www.telenor.no/fou/prosjekter/taletek/refrec). The reference recogniser relies on the HTK toolkit and a SpeechDat(II) compatible database, and is designed to serve as a reference system in multilingual speech recognition research. The paper describes version 0.96 of the reference recogniser, which takes into account labelled non-speech acoustic events during training and provides robustness against these during testing. Results are presented on small and medium vocabulary recognition for six languages.
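
As a concrete illustration of taking labelled non-speech events into account, one common approach is to map the noise markers found in SpeechDat(II)-style transcriptions onto dedicated filler models that are trained alongside the phone models. The marker names below follow SpeechDat conventions, but the mapping itself is a sketch, not the paper's actual recipe:

    # Map SpeechDat-style noise markers to filler model names so that
    # labelled non-speech events are modelled rather than discarded.
    FILLERS = {
        "[fil]": "fil",   # filled pause ("ehm")
        "[spk]": "spk",   # speaker noise (lip smack, cough)
        "[sta]": "sta",   # stationary background noise
        "[int]": "int",   # intermittent background noise
    }

    def to_training_units(transcription: str) -> list[str]:
        """Replace noise markers with filler model names; keep words as-is."""
        return [FILLERS.get(token, token) for token in transcription.split()]

    print(to_training_units("[spk] stop [sta] the recording [fil]"))
    # ['spk', 'stop', 'sta', 'the', 'recording', 'fil']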
5.
  • Salvi, Giampiero (author)
  • Accent clustering in Swedish using the Bhattacharyya distance
  • 2003
  • In: Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS), Barcelona, Spain, pp. 1149-1152
  • Conference paper (peer-reviewed), abstract:
    • In an attempt to improve automatic speech recognition (ASR) models for Swedish, accent variations were considered. These have proved to be important variables in the statistical distribution of the acoustic features usually employed in ASR. The analysis of feature variability has revealed phenomena that are consistent with what is known from phonetic investigations, suggesting that a consistent part of the information about accents could be derived from those features. A graphical interface has been developed to simplify the visualization of the geographical distributions of these phenomena.
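
For reference, the Bhattacharyya distance between two Gaussian densities N(\mu_1, \Sigma_1) and N(\mu_2, \Sigma_2), used here to compare accent-dependent models, is, with \Sigma = (\Sigma_1 + \Sigma_2)/2:

    D_B = \frac{1}{8}(\mu_2 - \mu_1)^\top \Sigma^{-1} (\mu_2 - \mu_1)
          + \frac{1}{2}\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1 \det\Sigma_2}}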
6.
  • Salvi, Giampiero (author)
  • Truncation error and dynamics in very low latency phonetic recognition
  • 2003
  • In: Proceedings of Non Linear Speech Processing (NOLISP).
  • Conference paper (peer-reviewed), abstract:
    • The truncation error for a two-pass decoder is analyzed in a problem of phonetic speech recognition for very demanding latency constraints (look-ahead length < 100 ms) and for applications where successive refinements of the hypotheses are not allowed. This is done empirically in the framework of hybrid MLP/HMM models. The ability of recurrent MLPs, as a posteriori probability estimators, to model time variations is also considered, and its interaction with the dynamic modeling in the decoding phase is shown in the simulations.
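
To make the latency constraint concrete: at a typical 10 ms frame shift, a look-ahead under 100 ms means each label must be committed after seeing fewer than 10 future frames. The toy experiment below (an invented 3-phone model with random posteriors, not the paper's MLP/HMM setup) shows how such truncation can change decisions relative to full-utterance Viterbi decoding:

    import numpy as np

    rng = np.random.default_rng(0)
    N_PHONES, N_FRAMES, LOOKAHEAD = 3, 50, 9   # 9 frames ~ 90 ms at a 10 ms shift
    logpost = np.log(rng.dirichlet(np.ones(N_PHONES), size=N_FRAMES))
    logtrans = np.log(np.full((N_PHONES, N_PHONES), 0.1) + 0.7 * np.eye(N_PHONES))

    def viterbi(lp):
        """Best phone sequence for a block of log posteriors."""
        delta = lp[0].copy()
        back = np.zeros((len(lp), N_PHONES), dtype=int)
        for t in range(1, len(lp)):
            scores = delta[:, None] + logtrans    # scores[i, j]: best path ending i -> j
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + lp[t]
        path = [int(delta.argmax())]
        for t in range(len(lp) - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]

    full = viterbi(logpost)                                  # unlimited look-ahead
    truncated = [viterbi(logpost[t:t + LOOKAHEAD + 1])[0]    # commit frame t after
                 for t in range(N_FRAMES)]                   # only LOOKAHEAD more frames
    print("frames changed by truncation:", sum(a != b for a, b in zip(full, truncated)))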
7.
  • Salvi, Giampiero (author)
  • Using accent information in ASR models for Swedish
  • 2003
  • In: Proceedings of INTERSPEECH'2003, pp. 2677-2680
  • Conference paper (peer-reviewed), abstract:
    • In this study accent information is used in an attempt to improve acoustic models for automatic speech recognition (ASR). First, accent-dependent Gaussian models were trained independently. The Bhattacharyya distance was then used in conjunction with agglomerative hierarchical clustering to define optimal strategies for merging those models. The resulting allophonic classes were analyzed and compared with the phonetic literature. Finally, accent "aware" models were built, in which the parametric complexity for each phoneme corresponds to the degree of variability across accent areas and to the amount of training data available for it. The models were compared to models with the same, but evenly spread, overall complexity, showing in some cases a slight improvement in recognition accuracy.
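
A compact sketch of the clustering step: pairwise Bhattacharyya distances between accent-dependent Gaussians feed an agglomerative hierarchical clustering, here via SciPy's linkage. The three "accent" Gaussians are invented toy values, not the paper's Swedish accent-area models:

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import squareform

    def bhattacharyya(m1, S1, m2, S2):
        """Bhattacharyya distance between two multivariate Gaussians."""
        S = 0.5 * (S1 + S2)
        d = m2 - m1
        term1 = 0.125 * d @ np.linalg.solve(S, d)
        term2 = 0.5 * np.log(np.linalg.det(S) /
                             np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
        return term1 + term2

    # One Gaussian per (toy) accent area: (mean, covariance) of some feature.
    accents = {"A": (np.array([0.0, 0.0]), np.eye(2)),
               "B": (np.array([0.2, 0.1]), np.eye(2)),
               "C": (np.array([2.0, 1.5]), 1.5 * np.eye(2))}
    names = list(accents)
    D = np.zeros((len(names), len(names)))
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                D[i, j] = D[j, i] = bhattacharyya(*accents[a], *accents[b])

    Z = linkage(squareform(D), method="average")  # merge closest models first
    print(Z)  # A and B merge before C joins, mirroring their distances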
8.
  • Siciliano, C., et al. (author)
  • Intelligibility of an ASR-controlled synthetic talking face
  • 2004
  • In: Journal of the Acoustical Society of America. - ISSN 0001-4966, E-ISSN 1520-8524; 115(5), pp. 2428-
  • Journal article (peer-reviewed), abstract:
    • The goal of the SYNFACE project is to develop a multilingual synthetic talking face, driven by an automatic speech recognizer (ASR), to assist hearing-impaired people with telephone communication. Previous multilingual experiments with the synthetic face have shown that time-aligned synthesized visual face movements can enhance speech intelligibility in normal-hearing and hearing-impaired users [C. Siciliano et al., Proc. Int. Cong. Phon. Sci. (2003)]. Similar experiments are in progress to examine whether the synthetic face remains intelligible when driven by ASR output. The recognizer produces phonetic output in real time, in order to drive the synthetic face while maintaining normal dialogue turn-taking. Acoustic modeling was performed with a neural network, while an HMM was used for decoding. The recognizer was trained on the SpeechDAT telephone speech corpus. Preliminary results suggest that the currently achieved recognition performance of around 60% frames correct limits the usefulness of the synthetic face movements. This is particularly true for consonants, where correct place of articulation is especially important for visual intelligibility. Errors in the alignment of phone boundaries representative of those arising in the ASR output were also shown to decrease audio-visual intelligibility.
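
The "frames correct" figure quoted above is a frame-level accuracy: the fraction of fixed-length analysis frames whose recognised phone label matches the reference alignment. A minimal illustration with invented label sequences:

    def frames_correct(reference: list[str], recognised: list[str]) -> float:
        """Fraction of frames where the recognised label equals the reference."""
        assert len(reference) == len(recognised)
        hits = sum(r == h for r, h in zip(reference, recognised))
        return hits / len(reference)

    ref = ["s", "s", "i", "i", "i", "l", "l", "sil", "sil", "sil"]
    hyp = ["s", "s", "i", "i", "e", "l", "sil", "sil", "sil", "sil"]
    print(f"{frames_correct(ref, hyp):.0%} frames correct")  # 80% on this toy pair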
9.