SwePub

Results for search "WFRF:(Salvi Giampiero) srt2:(2015-2019)"

  • Results 1-10 of 12
1.
  • Castellana, Antonella, et al. (authors)
  • Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers
  • 2017
  • Published in: Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017. International Speech Communication Association, pp. 1814-1818
  • Conference paper (peer-reviewed)
    • There is growing interest in cepstral and entropy analyses of voice samples for defining a vocal health indicator, because these analyses are reliable for investigating both regular and irregular voice signals. The purpose of this study is to determine whether Cepstral Peak Prominence Smoothed (CPPS) and Sample Entropy (SampEn) can differentiate dysphonic speakers from normal speakers in vowels excerpted from readings, and to compare their discrimination power. Results are reported for 33 patients and 31 controls, who read a standardized, phonetically balanced passage while wearing a head-mounted microphone. Vowels were excerpted from the recordings using automatic speech recognition; after a measure was obtained for each vowel, the individual distributions and their descriptive statistics were considered for CPPS and SampEn. Receiver Operating Characteristic (ROC) analysis revealed that the mean of the distributions was the parameter with the highest discrimination power for both CPPS and SampEn. CPPS showed higher diagnostic precision than SampEn, with an Area Under the Curve (AUC) of 0.85 compared to 0.72. A negative correlation between the parameters was found (Spearman's ρ = −0.61), with higher SampEn corresponding to lower CPPS. The automatic method used in this study could support voice monitoring in the clinic and during individuals' daily activities.
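Two of the quantities in this entry, SampEn and the AUC, can be made concrete in a few lines. The Python sketch below assumes the common textbook definition of sample entropy: Chebyshev distance between templates, self-matches excluded, and a tolerance `r` given in signal units. It computes the AUC as the Mann-Whitney probability that a positive case outscores a negative one. The defaults `m=2` and `r=0.2` are assumptions; the abstract does not report them. This is not the authors' implementation.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) of a 1-D sequence.

    Assumed textbook definition: -ln(A / B), where B counts pairs of
    length-m templates within Chebyshev distance r and A counts the same
    for length m + 1. Self-matches are excluded, and both lengths use the
    same n - m starting positions. r is an absolute tolerance here (it is
    often chosen as 0.2 times the signal's standard deviation).
    """
    n = len(x)

    def count_matches(length):
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0:
        # No (m + 1)-length matches: SampEn is undefined, reported here as infinity
        return math.inf
    return -math.log(a / b)


def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a random positive scores above a random
    negative (ties count as one half), i.e. the Mann-Whitney U statistic
    divided by the number of positive/negative pairs."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

If a measure tends to be lower in the group treated as positive, negate its scores or swap the two arguments. Otherwise `roc_auc` returns a value below 0.5 for a discriminative measure.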
2.
  • Elblaus, Ludvig, 1981- (author)
  • Crafting New Interfaces for Musical Expression
  • 2015
  • Licentiate thesis (other academic/artistic)
    • This thesis collects and contextualizes several projects involving artistically directed prototyping, in which new artifacts have been developed by multidisciplinary groups of practitioners for use in performance contexts. These projects and their resulting publications have been team efforts, and therefore all papers have more than one author. The introduction offers a perspective complementary to that of the publications, engaging with the characteristics of the digital innards of these artifacts and their digital material qualities. The thesis argues that software source code is a design material, and uses the notion of the crafting coder to view processes that use code as material for artistic creation. Code is also prominently featured in the introductory chapter, with examples of some of the central components of the sound processing techniques that have been successfully used in the projects described in this thesis. The artifacts described in the thesis are: The Throat, an instrument for augmenting the singing voice using gestural control in real time; The Vocal Chorder, a string-based instrument using full-body interaction that also allows for audience participation through an installation mode; The Charged Room, a video-tracking installation that lets users manipulate sound by moving across a stage; and Nebula, a garment that senses the user's movements and responds with sound. These artifacts have been evaluated in the contexts they were designed for, not only tested in laboratory settings, to make sure that the knowledge produced is valid. Several performances and pedagogical courses have been used as the empirical foundation for the claims of empowerment, expressivity, and performance qualities ascribed to the developed artifacts.
5.
  • Lopes, José, et al. (authors)
  • Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances
  • 2015
  • Published in: INTERSPEECH-2015. ISBN 9781510817906, pp. 1805-1809
  • Conference paper (peer-reviewed)
    • Repetitions in Spoken Dialogue Systems can be a symptom of problematic communication. Such repetitions are often due to speech recognition errors, which in turn make it harder to use the output of the speech recognizer to detect repetitions. In this paper, we combine the alignment score obtained using phonetic distances with dialogue-related features to improve repetition detection. To evaluate the proposed method, we compare several alignment techniques previously used in spoken-term detection tasks, ranging from edit distance to DTW-based distance. We also compare two different methods of computing the phonetic distance: the first uses the phoneme sequence, and the second uses the distance between the phone posterior vectors. Two different datasets were used in this evaluation: a bus-schedule information system (in English) and a call-routing system (in Swedish). The results show that approaches using phoneme distances outperform approaches using Levenshtein distances between ASR outputs for repetition detection.
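The contrast between plain Levenshtein distance and a phonetically weighted alignment can be sketched with one dynamic-programming routine that takes a pluggable substitution cost. The broad-class cost below is a hypothetical stand-in for illustration: vowel-for-vowel or consonant-for-consonant substitutions cost 0.5. The paper's actual phonetic distances, computed from phoneme sequences or phone posterior vectors, and its DTW variants are not reproduced here.

```python
def exact_match_cost(x, y):
    """Default substitution cost: 0 for identical symbols, 1 otherwise."""
    return 0.0 if x == y else 1.0


def weighted_edit_distance(a, b, sub_cost=exact_match_cost, indel_cost=1.0):
    """Edit distance between two symbol sequences with a custom substitution
    cost. With the default cost this is the ordinary Levenshtein distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel_cost  # delete all of a[:i]
    for j in range(1, cols):
        d[0][j] = j * indel_cost  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                        # deletion
                d[i][j - 1] + indel_cost,                        # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match/substitution
            )
    return d[-1][-1]


# Hypothetical broad-class phonetic cost, for illustration only
VOWELS = {"a", "e", "i", "o", "u"}


def broad_class_cost(x, y):
    if x == y:
        return 0.0
    if (x in VOWELS) == (y in VOWELS):
        return 0.5  # same broad class (both vowels or both consonants)
    return 1.0


def normalized_distance(a, b, sub_cost=exact_match_cost):
    """Distance divided by the longer sequence length, so that scores for
    utterances of different lengths are comparable."""
    longest = max(len(a), len(b)) or 1
    return weighted_edit_distance(a, b, sub_cost) / longest
```

For example, `weighted_edit_distance(["b", "a", "s"], ["b", "o", "s"])` is 1.0 with the default cost but 0.5 with `broad_class_cost`. A near-repetition whose recognized vowels differ therefore still scores as close. How the paper combines such a score with dialogue-related features is not specified here.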
6.
  • Salvi, Giampiero (author)
  • An Analysis of Shallow and Deep Representations of Speech Based on Unsupervised Classification of Isolated Words
  • 2016
  • Published in: Recent Advances in Nonlinear Speech Processing. Cham: Springer. ISBN 9783319281094, 9783319281070, pp. 151-157
  • Conference paper (peer-reviewed)
    • We analyse the properties of shallow and deep representations of speech. Mel frequency cepstral coefficients (MFCC) are compared to representations learned by a four-layer Deep Belief Network (DBN) in terms of discriminative power and invariance to irrelevant factors such as speaker identity or gender. To avoid the influence of supervised statistical modelling, an unsupervised isolated-word classification task is used for the comparison. The deep representations are also obtained with unsupervised training (no back-propagation pass is performed). The results show that DBN features provide a more concise clustering and a better match between clusters and word categories in terms of adjusted Rand score. However, some of the confusions present with the MFCC features are retained even with the DBN features.
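The adjusted Rand score used above to compare clusters with word categories can be computed from the contingency table of the two labelings. The following is a minimal pure-Python sketch. scikit-learn's `sklearn.metrics.adjusted_rand_score` implements the same index, and the example labels are made up. The sketch assumes at least two items.

```python
from collections import Counter
from math import comb


def adjusted_rand_score(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same n >= 2 items."""
    n = len(labels_true)
    # Pair counts within each cell of the contingency table
    contingency = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    # Pair counts within each true class and within each predicted cluster
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # chance agreement
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are all singletons
    return (sum_cells - expected) / (max_index - expected)
```

Identical partitions score 1.0 regardless of how the clusters are numbered. Chance-level agreement scores around 0, and the index can be negative. For example, `adjusted_rand_score([0, 0, 1, 1], [0, 1, 0, 1])` gives −0.5.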
8.
  • Selamtzis, Andreas, 1984-, et al. (authors)
  • Effect of vowel context in cepstral and entropy analysis of pathological voices
  • 2019
  • Published in: Biomedical Signal Processing and Control. Elsevier. ISSN 1746-8094, 1746-8108. Vol. 47, pp. 350-357
  • Journal article (peer-reviewed)
    • This study investigates the effect of vowel context (excerpted from speech versus sustained) on two voice quality measures: the cepstral peak prominence smoothed (CPPS) and sample entropy (SampEn). Thirty-one dysphonic subjects with different types of organic dysphonia and thirty-one controls read a phonetically balanced text and phonated sustained [a:] vowels at comfortable pitch and loudness. All the [a:] vowels of the read text were excerpted by automatic speech recognition and phonetic (forced) alignment. CPPS and SampEn were calculated for all excerpted vowels of each subject, forming one distribution of CPPS and SampEn values per subject. The sustained vowels were analyzed using a 41 ms window, forming another distribution of CPPS and SampEn values per subject. Two speech-language pathologists performed a perceptual evaluation of the dysphonic subjects’ voice quality from the recorded text. The power of SampEn and CPPS to discriminate the dysphonic group from the controls was assessed for the excerpted and sustained vowels with Receiver Operating Characteristic (ROC) analysis. The best discrimination in terms of Area Under the Curve (AUC) occurred for CPPS using the mean of the excerpted-vowel distributions (AUC = 0.85) and for SampEn using the 95th percentile of the sustained-vowel distributions (AUC = 0.84). CPPS and SampEn were found to be negatively correlated, and the largest correlation was found between the corresponding 95th percentiles of their distributions (Pearson, r = −0.83, p < 10⁻³). A strong correlation was also found between the 95th percentile of the SampEn distributions and the perceptual quality of breathiness (Pearson, r = 0.83, p < 10⁻³). The results suggest that, depending on the acoustic voice quality measure, sustained vowels can be more effective than excerpted vowels for detecting dysphonia. Additionally, when using CPPS or SampEn, there is an advantage in using the measures’ distributions rather than their average values.
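The study summarizes each subject's per-vowel CPPS and SampEn values by statistics of their distribution, such as the mean and the 95th percentile, and then correlates those summaries across subjects. The sketch below assumes linear-interpolation percentiles (NumPy's default convention) and plain Pearson correlation. The function names and example values are illustrative and are not taken from the paper.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order
    statistics, matching NumPy's default 'linear' method."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def summarize(per_vowel_values):
    """Reduce one subject's per-vowel measurements to distribution statistics."""
    return {
        "mean": sum(per_vowel_values) / len(per_vowel_values),
        "p95": percentile(per_vowel_values, 95),
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlate_summaries(cpps_by_subject, sampen_by_subject, stat="p95"):
    """Hypothetical helper: one summary value per subject for each measure,
    then the Pearson correlation of those values across subjects."""
    cpps = [summarize(v)[stat] for v in cpps_by_subject]
    sampen = [summarize(v)[stat] for v in sampen_by_subject]
    return pearson_r(cpps, sampen)
```

Keeping the whole summary dictionary per subject makes it easy to compare `"mean"` against `"p95"` as the discriminating statistic, which is the comparison the abstract reports.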
9.
  • Stefanov, Kalin, et al. (authors)
  • Modeling of Human Visual Attention in Multiparty Open-World Dialogues
  • 2019
  • Published in: ACM Transactions on Human-Robot Interaction. Association for Computing Machinery. ISSN 2573-9522. Vol. 8, no. 2
  • Journal article (peer-reviewed)
    • This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data, and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.