SwePub

Result list for search "L773:1558 7916 OR L773:1558 7924"

  • Results 1-10 of 43
1.
  • Ananthakrishnan, Gopal, et al. (authors)
  • Exploring the Predictability of Non-Unique Acoustic-to-Articulatory Mappings
  • 2012
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : Institute of Electrical and Electronics Engineers (IEEE). - 1558-7916 .- 1558-7924. ; 20:10, pp. 2672-2682
  • Journal article (peer-reviewed). Abstract:
    • This paper explores statistical tools that help analyze the predictability in the acoustic-to-articulatory inversion of speech, using an Electromagnetic Articulography database of simultaneously recorded acoustic and articulatory data. Since it has been shown that speech acoustics can be mapped to non-unique articulatory modes, the variance of the articulatory parameters is not sufficient to understand the predictability of the inverse mapping. We, therefore, estimate an upper bound to the conditional entropy of the articulatory distribution. This provides a probabilistic estimate of the range of articulatory values (either over a continuum or over discrete non-unique regions) for a given acoustic vector in the database. The analysis is performed for different British/Scottish English consonants with respect to which articulators (lips, jaws or the tongue) are important for producing the phoneme. The paper shows that acoustic-articulatory mappings for the important articulators have a low upper bound on the entropy, but can still have discrete non-unique configurations.
2.
  • Arnela, Marc, et al. (authors)
  • MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs
  • 2019
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE Press. - 1558-7916 .- 1558-7924. ; 27:12, pp. 2173-2182
  • Journal article (peer-reviewed). Abstract:
    • The synthesis of diphthongs in three dimensions (3D) involves the simulation of acoustic waves propagating through a complex 3D vocal tract geometry that deforms over time. Accurate 3D vocal tract geometries can be extracted from Magnetic Resonance Imaging (MRI), but due to long acquisition times, only static sounds can currently be studied with an adequate spatial resolution. In this work, 3D dynamic vocal tract representations are built to generate diphthongs, based on a set of cross-sections extracted from MRI-based vocal tract geometries of static vowel sounds. A diphthong can then be easily generated by interpolating the location, orientation and shape of these cross-sections, thus avoiding the interpolation of full 3D geometries. Two options are explored to extract the cross-sections. The first one is based on an adaptive grid (AG), which extracts the cross-sections perpendicular to the vocal tract midline, whereas the second one resorts to a semi-polar grid (SPG) strategy, which fixes the cross-section orientations. The finite element method (FEM) has been used to solve the mixed wave equation and synthesize diphthongs [ɑi] and [ɑu] in the dynamic 3D vocal tracts. The outputs from a 1D acoustic model based on the Transfer Matrix Method have also been included for comparison. The results show that the SPG and AG provide very close solutions in 3D, whereas significant differences are observed when using them in 1D. The SPG dynamic vocal tract representation is recommended for 3D simulations because it helps to prevent the collision of adjacent cross-sections.
3.
  • Bahne, Adrian, et al. (authors)
  • Optimizing the similarity of loudspeaker-room responses in multiple listening positions
  • 2016
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 24:2, pp. 340-353
  • Journal article (peer-reviewed). Abstract:
    • A shortcoming of multichannel sound reproduction standards, such as stereo or 5.1 surround, is their incompatibility with multiple off-axis listening positions. Accurate reproduction of virtual sound sources can only be experienced in the sweet spot, which is located equidistant to the loudspeakers. We here present a novel methodology to compensate audio systems such that the channel similarity is optimized in several listening positions simultaneously. To that end we propose a novel MIMO personal audio filter design framework based on feed-forward control. By proper design choices, filters that successfully compensate for multiple off-axis positions and irregularities in the frequency sum responses are obtained. The design choices include allpass filters with appropriate phase shifts as target for each listening position in addition to a weighted similarity requirement. Evaluations based on measurements of two four-channel car audio systems show that the proposed method significantly improves timbral sound reproduction and phantom center reproduction in several listening positions simultaneously.
4.
  • Barkefors, Annea, et al. (authors)
  • Design and Analysis of Linear Quadratic Gaussian Feedforward Controllers for Active Noise Control
  • 2014
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 22:12, pp. 1777-1791
  • Journal article (peer-reviewed). Abstract:
    • A method for sound field control applied to active noise control is presented and evaluated. The method uses Linear Quadratic Gaussian (LQG) feedforward control to find a Minimal Mean Square Error (MMSE)-optimal linear sound field controller under a causality constraint. It is obtained by solving a polynomial matrix spectral factorization and a linear (Diophantine) polynomial matrix equation. An important component in the design is the control signal penalty term of the criterion. Its use and influence are discussed and evaluated here using measured room impulse responses. The results indicate that the use of a relatively simple, frequency-weighted penalty on individual control signals provides most of the benefits obtainable by the considered more advanced alternative. We also introduce and illustrate several tools for performance analysis. An analytical expression for the attainable performance clearly reveals the performance loss generated by having to use a causal controller instead of the ideal noncausal controller. This loss is largest at low frequencies. Furthermore, we introduce a measure of the reproducibility of the target noise sound field with given control loudspeaker setups and room transfer functions. It describes how well a controller that uses an input subspace of dimension equal to the effective rank of the system is able to reproduce a target sound field. This performance measure can e.g. be used to support the selection of good combinations of placements of control loudspeakers.
5.
  • Berggren, Magnus, et al. (authors)
  • Low-complexity network echo cancellation approach for systems equipped with external memory
  • 2011
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE. - 1558-7916 .- 1558-7924. ; 19:8, pp. 2506-2515
  • Journal article (peer-reviewed). Abstract:
    • Long delays and sparseness characterize impulse responses in telecommunication networks and a vast number of solutions for network echo cancellation have been proposed over the years. In this paper, an approach for detecting dispersive regions of a sparse impulse response and a proportionate normalized least mean square (PNLMS)-based selective updating approach are combined with an adaptive double-talk detector to form a complete solution for echo cancellation. The proposed solution has low computational complexity and is targeted for systems equipped with external memory.
6.
  • Brännmark, Lars-Johan, et al. (authors)
  • Compensation of Loudspeaker-Room Responses in a Robust MIMO Control Framework
  • 2013
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 21:6, pp. 1201-1216
  • Journal article (peer-reviewed). Abstract:
    • A new multichannel approach to robust broadband loudspeaker-room equalization is presented. Traditionally, the equalization (or room correction) problem has been treated primarily by single-channel methods, where loudspeaker input signals are prefiltered individually by separate scalar filters. Single-channel methods are generally able to improve the average spectral flatness of the acoustic transfer functions in a listening region, but they cannot reduce the variability of the transfer functions within the region. Most modern audio reproduction systems, however, contain two or more loudspeakers, and in this paper we aim at improving the equalization performance by using all available loudspeakers jointly. To this end we propose a polynomial based MIMO formulation of the equalization problem. The new approach, which is a generalization of an earlier single-channel approach by the authors, is found to reduce the average reproduction error and the transfer function variability over a region in space. Moreover, pre-ringing artifacts are avoided, and the reproduction error below 1000 Hz is significantly reduced with an amount that scales with the number of loudspeakers used.
7.
  • Chatterjee, Saikat, et al. (authors)
  • Auditory Model-Based Design and Optimization of Feature Vectors for Automatic Speech Recognition
  • 2011
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 19:6, pp. 1813-1825
  • Journal article (peer-reviewed). Abstract:
    • Using spectral and spectro-temporal auditory models along with perturbation-based analysis, we develop a new framework to optimize a feature vector such that it emulates the behavior of the human auditory system. The optimization is carried out in an offline manner based on the conjecture that the local geometries of the feature vector domain and the perceptual auditory domain should be similar. Using this principle along with a static spectral auditory model, we modify and optimize the static spectral mel frequency cepstral coefficients (MFCCs) without considering any feedback from the speech recognition system. We then extend the work to include spectro-temporal auditory properties into designing a new dynamic spectro-temporal feature vector. Using a spectro-temporal auditory model, we design and optimize the dynamic feature vector to incorporate the behavior of human auditory response across time and frequency. We show that a significant improvement in automatic speech recognition (ASR) performance is obtained for any environmental condition, clean as well as noisy.
8.
  • Gustafsson, Harald, et al. (authors)
  • Low-complexity feature-mapped speech bandwidth extension
  • 2006
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - PISCATAWAY : IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC. - 1558-7916 .- 1558-7924. ; , pp. 577-588
  • Journal article (peer-reviewed). Abstract:
    • Today's telecommunications systems use a limited audio signal bandwidth. A typical bandwidth is 0.3-3.4 kHz, but recently it has been suggested that mobile phone networks will facilitate an audio signal bandwidth of 50 Hz-7 kHz. This is suggested since an increased bandwidth will increase the sound quality of the speech signals. Since only a few telephones will initially have this facility, a method extending the conventional narrow frequency-band speech signal into a wide-band speech signal utilizing the receiving telephone only is suggested. This will give the impression of a wide-band speech signal. The proposed speech bandwidth extension method is based on models of speech acoustics and fundamentals of human hearing. The extension maps each speech feature separately. Care has been taken to deal with implementation aspects, such as noisy speech signals, speech signal delays, computational complexity, and processing memory usage.
9.
  • Hendriks, Richard C., et al. (authors)
  • Noise Correlation Matrix Estimation for Multi-Microphone Speech Enhancement
  • 2012
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 20:1, pp. 223-233
  • Journal article (peer-reviewed). Abstract:
    • For multi-channel noise reduction algorithms like the minimum variance distortionless response (MVDR) beamformer, or the multi-channel Wiener filter, an estimate of the noise correlation matrix is needed. For its estimation, it is often proposed in the literature to use a voice activity detector (VAD). However, using a VAD the estimated matrix can only be updated in speech absence. As a result, during speech presence the noise correlation matrix estimate does not follow changing noise fields with an appropriate accuracy. This effect is further increased, as in nonstationary noise voice activity detection is a rather difficult task, and false alarms are likely to occur. In this paper, we present and analyze an algorithm that estimates the noise correlation matrix without using a VAD. This algorithm is based on measuring the correlation of the noisy input and a noise reference which can be obtained, e.g., by steering a null towards the target source. When applied in combination with an MVDR beamformer, it is shown that the proposed noise correlation matrix estimate results in a more accurate beamformer response, a larger signal-to-noise ratio improvement and a larger instrumentally predicted speech intelligibility when compared to competing algorithms such as the generalized sidelobe canceler, a VAD-based MVDR beamformer, and an MVDR based on the noisy correlation matrix.
10.
  • Holzapfel, André, 1976-, et al. (authors)
  • Musical genre classification using Nonnegative Matrix Factorization based features
  • 2008
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE Press. - 1558-7916 .- 1558-7924. ; 16:2, pp. 424-434
  • Journal article (peer-reviewed). Abstract:
    • Nonnegative matrix factorization (NMF) is used to derive a novel description for the timbre of musical sounds. Using NMF, a spectrogram is factorized providing a characteristic spectral basis. Assuming a set of spectrograms given a musical genre, the space spanned by the vectors of the obtained spectral bases is modeled statistically using mixtures of Gaussians, resulting in a description of the spectral base for this musical genre. This description is shown to improve classification results by up to 23.3% compared to MFCC-based models, while the compression performed by the factorization decreases training time significantly. Using a distance-based stability measure this compression is shown to reduce the noise present in the data set resulting in more stable classification models. In addition, we compare the mean squared errors of the approximation to a spectrogram using independent component analysis and nonnegative matrix factorization, showing the superiority of the latter approach.
Publication type
journal article (43)
Content type
peer-reviewed (43)
Author/editor
Kleijn, W. Bastiaan (10)
Claesson, Ingvar (6)
Kleijn, Bastiaan (4)
Holzapfel, André, 19 ... (4)
Grancharov, Volodya (4)
Lindström, Fredric (3)
Evangelista, Gianpao ... (3)
Stylianou, Yannis (3)
Schüldt, Christian (3)
Ekman, Anders (2)
Ahlén, Anders (2)
Engwall, Olov (2)
Bahne, Adrian (2)
Brännmark, Lars-Joha ... (2)
Grbic, Nedelko (2)
Lee, C. H. (1)
Berggren, Magnus (1)
Maguire Jr., Gerald ... (1)
Sternad, Mikael (1)
Sandsten, Maria (1)
Pernow, John (1)
Nilsson, Mattias (1)
Carlström, Mattias (1)
Ananthakrishnan, Gop ... (1)
Raspaud, Martin (1)
Lindblom, Jonas (1)
Chatterjee, Saikat (1)
Neiberg, Daniel (1)
Jensen, J. (1)
Arnela, Marc (1)
Dabbaghchian, Saeed (1)
Guasch, Oriol (1)
Weitzberg, Eddie (1)
Barkefors, Annea (1)
Sällberg, Benny (1)
Bresin, Roberto (1)
Borgh, Markus, 1983- (1)
Leijon, Arne (1)
Bozkurt, Baris (1)
Mancini, Maurizio (1)
Catrina, Sergiu-Bogd ... (1)
Lundberg, Jon O. (1)
Amft, Oliver (1)
Yermeche, Zohra (1)
Gustafsson, Harald (1)
Eckerholm, Fredrik (1)
Kubin, Gernot (1)
Kang, H.G. (1)
Sundqvist, Michaela ... (1)
Mahdi, Ali (1)
University
Kungliga Tekniska Högskolan (28)
Blekinge Tekniska Högskola (6)
Uppsala universitet (3)
Linköpings universitet (3)
Lunds universitet (3)
Chalmers tekniska högskola (1)
Gymnastik- och idrottshögskolan (1)
Karolinska Institutet (1)
Language
English (43)
Research subject (UKÄ/SCB)
Engineering and Technology (25)
Natural Sciences (8)
Medical and Health Sciences (1)
Social Sciences (1)
Humanities (1)
