
Search results for query "WFRF:(Salvi Giampiero) srt2:(2015-2019)"

Search: WFRF:(Salvi Giampiero) > (2015-2019)

Results 1-10 of 12

No.	Reference
1.	Castellana, Antonella, et al. (author) Cepstral and entropy analyses in vowels excerpted from continuous speech of dysphonic and control speakers. 2017. In: Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech 2017. International Speech Communication Association, pp. 1814-1818. Conference paper (peer-reviewed). Abstract: There is a growing interest in Cepstral and Entropy analyses of voice samples for defining a vocal health indicator, due to their reliability in investigating both regular and irregular voice signals. The purpose of this study is to determine whether the Cepstral Peak Prominence Smoothed (CPPS) and Sample Entropy (SampEn) could differentiate dysphonic speakers from normal speakers in vowels excerpted from readings and to compare their discrimination power. Results are reported for 33 patients and 31 controls, who read a standardized phonetically balanced passage while wearing a head-mounted microphone. Vowels were excerpted from recordings using Automatic Speech Recognition and, after obtaining a measure for each vowel, individual distributions and their descriptive statistics were considered for CPPS and SampEn. The Receiver Operating Characteristic (ROC) analysis revealed that the mean of the distributions was the parameter with the highest discrimination power for both CPPS and SampEn. CPPS showed a higher diagnostic precision than SampEn, exhibiting an Area Under Curve (AUC) of 0.85 compared to 0.72. A negative correlation between the parameters was found (Spearman; ρ = −0.61), with higher SampEn corresponding to lower CPPS. The automatic method used in this study could provide support to voice monitoring in the clinic and during individuals' daily activities.
2.	Elblaus, Ludvig, 1981- (author) Crafting New Interfaces for Musical Expression. 2015. Licentiate thesis (other academic/artistic). Abstract: This thesis collects and contextualizes several projects involving artistically directed prototyping where new artifacts have been developed, in multi-disciplinary groups of practitioners, for use in performance contexts. These projects and their resulting publications have been team efforts, and therefore all papers have more than one author. In the introduction, a complementary perspective to that of the publications is offered, engaging with the characteristics of the digital innards of these artifacts and their digital material qualities. The stance that software source code is a design material is argued, and the notion of the crafting coder is used to view processes that use code as material for artistic creation. Code is also prominently featured in the introductory chapter with examples of some of the central components of the sound processing techniques that have been successfully used in the projects described in this thesis. The artifacts that are described in the thesis are: The Throat, an instrument for augmenting the singing voice using gestural control in real-time, The Vocal Chorder, a string-based instrument using full-body interaction that also allows for audience participation through an installation mode, The Charged Room, a video tracking installation that lets users manipulate sound by moving across a stage, and Nebula, a garment that senses the user's movements and responds with sound. These artifacts have been evaluated in the context they are designed for, and not only tested in laboratory settings, to make sure that the knowledge produced is valid. Several performances and pedagogical courses have been used as an empirical foundation for the claims of empowerment, expressivity, and performance qualities ascribed to the developed artifacts.
3.	Fahlström Myrman, Arvid, et al. (author) Partitioning of Posteriorgrams using Siamese Models for Unsupervised Acoustic Modelling. 2017. In: Grounding Language Understanding. Conference paper (peer-reviewed).
4.	Kumar Dhaka, Akash, et al. (author) Sparse Autoencoder Based Semi-Supervised Learning for Phone Classification with Limited Annotations. 2017. In: Grounding Language Understanding. Conference paper (peer-reviewed).
5.	Lopes, José, et al. (author) Detecting Repetitions in Spoken Dialogue Systems Using Phonetic Distances. 2015. In: INTERSPEECH-2015. ISBN 9781510817906, pp. 1805-1809. Conference paper (peer-reviewed). Abstract: Repetitions in Spoken Dialogue Systems can be a symptom of problematic communication. Such repetitions are often due to speech recognition errors, which in turn make it harder to use the output of the speech recognizer to detect repetitions. In this paper, we combine the alignment score obtained using phonetic distances with dialogue-related features to improve repetition detection. To evaluate the proposed method, we compare several alignment techniques, from edit distance to DTW-based distance, previously used in Spoken-Term detection tasks. We also compare two different methods to compute the phonetic distance: the first one using the phoneme sequence, and the second one using the distance between the phone posterior vectors. Two different datasets were used in this evaluation: a bus-schedule information system (in English) and a call routing system (in Swedish). The results show that approaches using phoneme distances outperform approaches using Levenshtein distances between ASR outputs for repetition detection.
6.	Salvi, Giampiero (author) An Analysis of Shallow and Deep Representations of Speech Based on Unsupervised Classification of Isolated Words. 2016. In: Recent Advances in Nonlinear Speech Processing. Cham: Springer. ISBN 9783319281094, 9783319281070, pp. 151-157. Conference paper (peer-reviewed). Abstract: We analyse the properties of shallow and deep representations of speech. Mel frequency cepstral coefficients (MFCC) are compared to representations learned by a four-layer Deep Belief Network (DBN) in terms of discriminative power and invariance to irrelevant factors such as speaker identity or gender. To avoid the influence of supervised statistical modelling, an unsupervised isolated word classification task is used for the comparison. The deep representations are also obtained with unsupervised training (no back-propagation pass is performed). The results show that DBN features provide a more concise clustering and higher match between clusters and word categories in terms of adjusted Rand score. Some of the confusions present with the MFCC features are, however, retained even with the DBN features.
7.	Saponaro, Giovanni, et al. (author) Interactive Robot Learning of Gestures, Language and Affordances. 2017. In: Grounding Language Understanding. Conference paper (peer-reviewed).
8.	Selamtzis, Andreas, 1984-, et al. (author) Effect of vowel context in cepstral and entropy analysis of pathological voices. 2019. In: Biomedical Signal Processing and Control. Elsevier. ISSN 1746-8094, 1746-8108; 47, pp. 350-357. Journal article (peer-reviewed). Abstract: This study investigates the effect of vowel context (excerpted from speech versus sustained) on two voice quality measures: the cepstral peak prominence smoothed (CPPS) and sample entropy (SampEn). Thirty-one dysphonic subjects with different types of organic dysphonia and thirty-one controls read a phonetically balanced text and phonated sustained [a:] vowels in comfortable pitch and loudness. All the [a:] vowels of the read text were excerpted by automatic speech recognition and phonetic (forced) alignment. CPPS and SampEn were calculated for all excerpted vowels of each subject, forming one distribution of CPPS and SampEn values per subject. The sustained vowels were analyzed using a 41 ms window, forming another distribution of CPPS and SampEn values per subject. Two speech-language pathologists performed a perceptual evaluation of the dysphonic subjects’ voice quality from the recorded text. The power of discriminating the dysphonic group from the controls for SampEn and CPPS was assessed for the excerpted and sustained vowels with the Receiver-Operator Characteristic (ROC) analysis. The best discrimination in terms of Area Under Curve (AUC) for CPPS occurred using the mean of the excerpted vowel distributions (AUC=0.85) and for SampEn using the 95th percentile of the sustained vowel distributions (AUC=0.84). CPPS and SampEn were found to be negatively correlated, and the largest correlation was found between the corresponding 95th percentiles of their distributions (Pearson, r=−0.83, p < 10⁻³). A strong correlation was also found between the 95th percentile of SampEn distributions and the perceptual quality of breathiness (Pearson, r=0.83, p < 10⁻³).
The results suggest that depending on the acoustic voice quality measure, sustained vowels can be more effective than excerpted vowels for detecting dysphonia. Additionally, when using CPPS or SampEn there is an advantage of using the measures’ distributions rather than their average values.
9.	Stefanov, Kalin, et al. (author) Modeling of Human Visual Attention in Multiparty Open-World Dialogues. 2019. In: ACM Transactions on Human-Robot Interaction. Association for Computing Machinery. ISSN 2573-9522; 8:2. Journal article (peer-reviewed). Abstract: This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by his/her interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
10.	Stefanov, Kalin, et al. (author) Vision-based Active Speaker Detection in Multiparty Interaction. 2017. In: Grounding Language Understanding. Conference paper (peer-reviewed).

Refine results

Publication type: conference paper (8); journal article (3); licentiate thesis (1)

Content type: peer-reviewed (11); other academic/artistic (1)

Author/editor: Salvi, Giampiero (11); Stefanov, Kalin (2); Beskow, Jonas (2); Castellana, Antonell ... (2); Carullo, Alessio (2); Astolfi, Arianna (2); Zhang, Cheng (1); Gustafson, Joakim (1); House, David (1); Skantze, Gabriel (1); Elblaus, Ludvig, 198 ... (1); Abad, A (1); Strömbergsson, Sofia (1); Kjellström, Hedvig, ... (1); Selamtzis, Andreas (1); Meena, Raveesh (1); Bresin, Roberto, Pro ... (1); Salvi, Giampiero, As ... (1); Lopes, Jose (1); Fahlström Myrman, Ar ... (1); Kontogiorgos, Dimost ... (1); Batista, F (1); Bernardino, Alexandr ... (1); Kumar Dhaka, Akash (1); Trancoso, I. (1); Saponaro, Giovanni (1); Jamone, Lorenzo (1); Selamtzis, Andreas, ... (1); Oztireli, Cengiz (1); Mandt, Stephan (1)

University: Kungliga Tekniska Högskolan (12); Karolinska Institutet (1)

Language: English (12)

Research subject (UKÄ/SCB): Natural sciences (6); Engineering and technology (5); Humanities (1)

Copyright © LIBRIS - National Library Systems
LIBRIS.kb.se