SwePub
Search the SwePub database


Results for search "L773:1687 4714 OR L773:1687 4722"


  • Result 1-10 of 10
1.
  • Borgh, Markus, 1983-, et al. (author)
  • An improved adaptive gain equalizer for noise reduction with low speech distortion
  • 2011
  • In: EURASIP Journal on Audio, Speech, and Music Processing. - : Springer. - 1687-4714 .- 1687-4722. ; 7
  • Journal article (peer-reviewed). Abstract:
    • In high-quality conferencing systems, it is desired to perform noise reduction with as limited speech distortion as possible. Previous work, based on time varying amplification controlled by signal-to-noise ratio estimation in different frequency subbands, has shown promising results in this regard but can suffer from problems in situations with intense continuous speech. Further, the amount of noise reduction cannot exceed a certain level in order to avoid artifacts. This paper establishes the problems and proposes several improvements. The improved algorithm is evaluated with several different noise characteristics, and the results show that the algorithm provides even less speech distortion, better performance in a multi-speaker environment and improved noise suppression when speech is absent compared with previous work.
  •  
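The SNR-controlled subband amplification that this paper improves on can be sketched roughly as follows. This is a minimal single-band illustration with hypothetical smoothing constants and gain limit, not the authors' implementation: a fast tracker follows the signal envelope, a slow tracker approximates the noise floor, and their ratio drives a limited time-varying gain.

```python
import numpy as np

def adaptive_gain_equalizer(subband, alpha_fast=0.9, alpha_slow=0.999, max_gain=4.0):
    """Boost a subband when its short-term level exceeds its noise-floor
    estimate (a rough SNR proxy). Constants are illustrative only."""
    fast = slow = 1e-6
    out = np.empty_like(subband)
    for n, x in enumerate(subband):
        mag = abs(x)
        fast = alpha_fast * fast + (1 - alpha_fast) * mag   # speech-level tracker
        slow = alpha_slow * slow + (1 - alpha_slow) * mag   # noise-floor tracker
        gain = min(fast / slow, max_gain)                   # limited amplification
        out[n] = gain * x
    return out

# Noise-only input keeps the gain near 1; a burst of "speech" is amplified.
noise = 0.01 * np.ones(2000)
burst = noise.copy()
burst[1000:1100] = 1.0
print(adaptive_gain_equalizer(noise)[-1], adaptive_gain_equalizer(burst)[1050])
```

Capping the gain at `max_gain` reflects the limitation the abstract mentions: pushing suppression further introduces audible artifacts.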
2.
  • Cobos, Maximo, et al. (author)
  • An overview of machine learning and other data-based methods for spatial audio capture, processing, and reproduction
  • 2022
  • In: Eurasip Journal on Audio, Speech, and Music Processing. - : Springer Science and Business Media LLC. - 1687-4722 .- 1687-4714. ; 2022:10
  • Research review (peer-reviewed). Abstract:
    • The domain of spatial audio comprises methods for capturing, processing, and reproducing audio content that contains spatial information. Data-based methods are those that operate directly on the spatial information carried by audio signals. This is in contrast to model-based methods, which impose spatial information from, for example, metadata like the intended position of a source onto signals that are otherwise free of spatial information. Signal processing has traditionally been at the core of spatial audio systems, and it continues to play a very important role. The irruption of deep learning in many closely related fields has put the focus on the potential of learning-based approaches for the development of data-based spatial audio applications. This article reviews the most important application domains of data-based spatial audio, including well-established methods that employ conventional signal processing, while paying special attention to the most recent achievements that make use of machine learning. Our review is organized based on the topology of the spatial audio pipeline, which consists of capture, processing/manipulation, and reproduction. The literature on the three stages of the pipeline is discussed, as well as the literature on the spatial audio representations that are used to transmit the content between them, highlighting the key references and elaborating on the underlying concepts. We reflect on the literature based on a juxtaposition of the prerequisites that made machine learning successful in domains other than spatial audio with those that are found in the domain of spatial audio as of today. Based on this, we identify routes that may facilitate future advancement.
  •  
3.
  • Cobos, Maximo, et al. (author)
  • Data-based spatial audio processing
  • 2022
  • In: Eurasip Journal on Audio, Speech, and Music Processing. - : Springer Science and Business Media LLC. - 1687-4722 .- 1687-4714. ; 2022:1
  • Journal article (other academic/artistic)
  •  
4.
  • Dimitrakakis, Christos, 1975, et al. (author)
  • Phoneme and sentence-level ensembles for speech recognition
  • 2011
  • In: Eurasip Journal on Audio, Speech, and Music Processing. - : Springer Science and Business Media LLC. - 1687-4722 .- 1687-4714. ; 2011
  • Journal article (peer-reviewed). Abstract:
    • We address the question of whether and how boosting and bagging can be used for speech recognition. In order to do this, we compare two different boosting schemes, one at the phoneme level and one at the utterance level, with a phoneme-level bagging scheme. We control for many parameters and other choices, such as the state inference scheme used. In an unbiased experiment, we clearly show that the gain of boosting methods compared to a single hidden Markov model is in all cases only marginal, while bagging significantly outperforms all other methods. We thus conclude that bagging methods, which have so far been overlooked in favour of boosting, should be examined more closely as a potentially useful ensemble learning technique for speech recognition. © 2011 Christos Dimitrakakis and Samy Bengio.
  •  
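The bagging scheme this paper favours over boosting can be sketched generically: train each ensemble member on a bootstrap resample and combine by majority vote. The sketch below uses a toy nearest-centroid classifier and synthetic two-class data as stand-ins for the paper's phoneme-level HMMs; all names and parameters are illustrative.

```python
import numpy as np

def bag_predict(X_train, y_train, X_test, n_models=15, seed=0):
    """Toy bagging: fit a nearest-centroid classifier on each bootstrap
    resample, then combine the members by majority vote."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y_train), size=len(y_train))  # bootstrap sample
        Xb, yb = X_train[idx], y_train[idx]
        centroids = np.array([Xb[yb == c].mean(axis=0) if np.any(yb == c)
                              else X_train[y_train == c].mean(axis=0)
                              for c in classes])
        dist = np.linalg.norm(X_test[:, None, :] - centroids[None], axis=2)
        votes.append(classes[dist.argmin(axis=1)])
    votes = np.stack(votes)                                     # (n_models, n_test)
    # majority vote across the ensemble
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Two well-separated Gaussian "phoneme" classes.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
pred = bag_predict(X, y, X)
print((pred == y).mean())
```

The key property, as in the paper, is that each member sees a perturbed view of the training data, so the vote averages out member-specific errors.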
5.
  • Eyben, Florian, et al. (author)
  • Emotion in the singing voice—a deeper look at acoustic features in the light of automatic classification
  • 2015
  • In: EURASIP Journal on Audio, Speech, and Music Processing. - : Springer Science and Business Media LLC. - 1687-4714 .- 1687-4722.
  • Journal article (peer-reviewed). Abstract:
    • We investigate the automatic recognition of emotions in the singing voice and study the worth and role of a variety of relevant acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics ChallengE (ComParE). A feature importance analysis with respect to classification accuracy and correlation of features with the targets is provided in the paper. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further significantly improve the classification accuracy in a leave-one-singer-out cross-validation.
  •  
6.
  • Lindström, Fredric, et al. (author)
  • Efficient Multichannel NLMS Implementation for Acoustic Echo Cancellation
  • 2007
  • In: EURASIP Journal on Audio, Speech, and Music Processing. - : Hindawi Publishing Corporation. - 1687-4714 .- 1687-4722. ; 2007
  • Journal article (peer-reviewed). Abstract:
    • An acoustic echo cancellation structure with a single loudspeaker and multiple microphones is, from a system identification perspective, generally modelled as a single-input multiple-output system. Such a system thus implies a specific echo-path model (adaptive filter) for every loudspeaker-to-microphone path. Due to the often large dimensionality of the filters, which is required to model rooms with standard reverberation times, the adaptation process can be computationally demanding. This paper presents a selective-updating normalized least mean square (NLMS)-based method which reduces complexity to nearly half in practical situations, while showing superior convergence-speed performance compared to conventional complexity-reduction schemes. Moreover, the method concentrates the filter adaptation on the filter which is most misadjusted, a typically desired feature.
  •  
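The core NLMS update, and the selective-updating idea of adapting only the most misadjusted of several filters, can be sketched as follows. This is a noiseless single-channel toy with hypothetical step size, filter length, and error-power smoothing; the paper's multichannel structure and selection criterion are more elaborate.

```python
import numpy as np

def nlms_step(w, x_buf, d, mu=0.5, eps=1e-8):
    """One NLMS update: e = d - w.x; w += mu * e * x / (||x||^2 + eps)."""
    e = d - w @ x_buf
    w += mu * e * x_buf / (x_buf @ x_buf + eps)
    return e

def selective_nlms(x, desired, n_filters=2, L=8, mu=0.5):
    """Run several adaptive filters but, per sample, update only the one
    with the largest smoothed error power (the most misadjusted filter)."""
    W = np.zeros((n_filters, L))
    err_pow = np.ones(n_filters)
    for n in range(L - 1, len(x)):
        x_buf = x[n - L + 1:n + 1][::-1]          # newest sample first
        errs = desired[:, n] - W @ x_buf
        err_pow = 0.9 * err_pow + 0.1 * errs ** 2
        k = int(err_pow.argmax())                  # select the worst filter
        nlms_step(W[k], x_buf, desired[k, n], mu)  # update only that one
    return W

# Identify two toy echo paths from a white-noise excitation.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
h = np.array([[0.8, -0.4, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
              [0.1,  0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
d = np.stack([np.convolve(x, hk)[:len(x)] for hk in h])
W = selective_nlms(x, d)
print(np.abs(W - h).max())
```

Updating one filter per sample roughly halves the adaptation cost relative to updating both, which is the flavour of complexity reduction the abstract describes.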
7.
  • Lostanlen, Vincent, et al. (author)
  • Time-frequency scattering accurately models auditory similarities between instrumental playing techniques
  • 2021
  • In: EURASIP Journal on Audio, Speech, and Music Processing. - : Springer Nature. - 1687-4714 .- 1687-4722. ; 2021:1
  • Journal article (peer-reviewed). Abstract:
    • Instrumental playing techniques such as vibratos, glissandos, and trills often denote musical expressivity, both in classical and folk contexts. However, most existing approaches to music similarity retrieval fail to describe timbre beyond the so-called "ordinary" technique, use instrument identity as a proxy for timbre quality, and do not allow for customization to the perceptual idiosyncrasies of a new subject. In this article, we ask 31 human participants to organize 78 isolated notes into a set of timbre clusters. Analyzing their responses suggests that timbre perception operates within a more flexible taxonomy than those provided by instruments or playing techniques alone. In addition, we propose a machine listening model to recover the cluster graph of auditory similarities across instruments, mutes, and techniques. Our model relies on joint time-frequency scattering features to extract spectrotemporal modulations as acoustic features. Furthermore, it minimizes triplet loss in the cluster graph by means of the large-margin nearest neighbor (LMNN) metric learning algorithm. Over a dataset of 9346 isolated notes, we report a state-of-the-art average precision at rank five (AP@5) of 99.0% ± 1%. An ablation study demonstrates that removing either the joint time-frequency scattering transform or the metric learning algorithm noticeably degrades performance.
  •  
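The triplet loss minimized by the LMNN-style metric learning above has a very compact form: a same-cluster pair must be closer than a different-cluster pair by at least a margin. A minimal numpy sketch (generic triplet hinge loss, not the paper's full LMNN optimization):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: penalize unless the anchor-positive
    distance beats the anchor-negative distance by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same timbre cluster, close: loss is zero
n = np.array([3.0, 0.0])   # different cluster, far away
print(triplet_loss(a, p, n))
```

Minimizing this over triplets drawn from the cluster graph pulls perceptually similar notes together in the learned metric space.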
8.
  • Salvi, Giampiero, et al. (author)
  • SynFace: Speech-Driven Facial Animation for Virtual Speech-Reading Support
  • 2009
  • In: Eurasip Journal on Audio, Speech, and Music Processing. - : Springer Science and Business Media LLC. - 1687-4714 .- 1687-4722. ; 2009, s. 191940-
  • Journal article (peer-reviewed). Abstract:
    • This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).
  •  
9.
  • Saremi, Amin, et al. (author)
  • An acoustic echo canceller optimized for hands-free speech telecommunication in large vehicle cabins
  • 2023
  • In: EURASIP Journal on Audio, Speech, and Music Processing. - : Springer. - 1687-4714 .- 1687-4722. ; 2023:1
  • Journal article (peer-reviewed). Abstract:
    • Acoustic echo cancelation (AEC) is a system identification problem that has been addressed by various techniques and most commonly by normalized least mean square (NLMS) adaptive algorithms. However, performing a successful AEC in large commercial vehicles has proved complicated due to the size and challenging variations in the acoustic characteristics of their cabins. Here, we present a wideband fully linear time domain NLMS algorithm for AEC that is enhanced by a statistical double-talk detector (DTD) and a voice activity detector (VAD). The proposed solution was tested in four main Volvo truck models, with various cabin geometries, using standard Swedish hearing-in-noise (HINT) sentences in the presence and absence of engine noise. The results show that the proposed solution achieves a high echo return loss enhancement (ERLE) of at least 25 dB with a fast convergence time, fulfilling ITU G.168 requirements. The presented solution was particularly developed to provide a practical compromise between accuracy and computational cost to allow its real-time implementation on commercial digital signal processors (DSPs). A real-time implementation of the solution was coded in C on an ARM Cortex M-7 DSP. The algorithmic latency was measured at less than 26 ms for processing each 50-ms buffer indicating the computational feasibility of the proposed solution for real-time implementation on common DSPs and embedded systems with limited computational and memory resources. MATLAB source codes and related audio files are made available online for reference and further development.
  •  
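Echo return loss enhancement (ERLE), the figure of merit quoted above (at least 25 dB), is simply the dB ratio of echo power before and after cancelation. A minimal computation with illustrative signals, not the paper's data:

```python
import numpy as np

def erle_db(echo, residual, eps=1e-12):
    """ERLE = 10*log10(E[echo^2] / E[residual^2]): how much echo power
    the canceller removed, in dB."""
    return 10 * np.log10((np.mean(echo ** 2) + eps) /
                         (np.mean(residual ** 2) + eps))

rng = np.random.default_rng(0)
echo = rng.normal(size=10000)
residual = 0.03 * rng.normal(size=10000)  # canceller leaves ~3% of the amplitude
print(round(erle_db(echo, residual), 1))
```

A residual of a few percent of the echo amplitude, as in this sketch, corresponds to roughly 30 dB ERLE, i.e. comfortably above the 25 dB the paper reports as its floor.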
10.
  • Sällberg, Benny, et al. (author)
  • On a Method for Improving Impulsive Sounds Localization in Hearing Defenders
  • 2008
  • In: EURASIP Journal on Audio, Speech, and Music Processing. - : Hindawi Publishing Corporation. - 1687-4714 .- 1687-4722. ; 2008, s. 1-7
  • Journal article (peer-reviewed). Abstract:
    • This paper proposes a new algorithm for a directional aid with hearing defenders. Users of existing hearing defenders experience distorted information, or in worst cases, directional information may not be perceived at all. The users of these hearing defenders may therefore be exposed to serious safety risks. The proposed algorithm improves the directional information for the users of hearing defenders by enhancing impulsive sounds using interaural level difference (ILD). This ILD enhancement is achieved by incorporating a new gain function. Illustrative examples and performance measures are presented to highlight the promising results. By improving the directional information for active hearing defenders, the new method is found to serve as an advanced directional aid.
  •  
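The ILD-enhancement idea can be sketched as exaggerating the left/right level ratio with a gain function. This toy per-block version uses a hypothetical exponent and a symmetric split of the extra gain; the paper's actual gain function and impulsive-sound detection differ.

```python
import numpy as np

def enhance_ild(left, right, beta=2.0, eps=1e-12):
    """Exaggerate the interaural level difference: raise the left/right
    RMS ratio to the power beta and reapply it symmetrically, so the
    louder ear gets louder and the quieter ear quieter."""
    rms_l = np.sqrt(np.mean(left ** 2)) + eps
    rms_r = np.sqrt(np.mean(right ** 2)) + eps
    ratio = rms_l / rms_r
    extra = ratio ** ((beta - 1) / 2)   # split the extra ILD between the ears
    return left * extra, right / extra

# A source on the left: the left channel is 6 dB louder than the right.
rng = np.random.default_rng(0)
s = rng.normal(size=4096)
l, r = enhance_ild(1.0 * s, 0.5 * s)
ild_in = 20 * np.log10(1.0 / 0.5)
ild_out = 20 * np.log10(np.sqrt(np.mean(l ** 2)) / np.sqrt(np.mean(r ** 2)))
print(round(ild_in, 1), round(ild_out, 1))
```

With `beta=2` the output ILD is double the input ILD in dB, which is the sense in which such a gain function makes the direction of an impulsive sound easier to perceive through a hearing defender.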