SwePub

Result list for search "L773:1558 7916 OR L773:1558 7924 srt2:(2010-2014)"

  • Results 1-10 of 22
1.
  • Ananthakrishnan, Gopal, et al. (author)
  • Exploring the Predictability of Non-Unique Acoustic-to-Articulatory Mappings
  • 2012
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : Institute of Electrical and Electronics Engineers (IEEE). - 1558-7916 .- 1558-7924. ; 20:10, pp. 2672-2682
  • Journal article (peer-reviewed)
  • Abstract:
    • This paper explores statistical tools that help analyze the predictability in the acoustic-to-articulatory inversion of speech, using an Electromagnetic Articulography database of simultaneously recorded acoustic and articulatory data. Since it has been shown that speech acoustics can be mapped to non-unique articulatory modes, the variance of the articulatory parameters is not sufficient to understand the predictability of the inverse mapping. We, therefore, estimate an upper bound to the conditional entropy of the articulatory distribution. This provides a probabilistic estimate of the range of articulatory values (either over a continuum or over discrete non-unique regions) for a given acoustic vector in the database. The analysis is performed for different British/Scottish English consonants with respect to which articulators (lips, jaws or the tongue) are important for producing the phoneme. The paper shows that acoustic-articulatory mappings for the important articulators have a low upper bound on the entropy, but can still have discrete non-unique configurations.
2.
  • Barkefors, Annea, et al. (author)
  • Design and Analysis of Linear Quadratic Gaussian Feedforward Controllers for Active Noise Control
  • 2014
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 22:12, pp. 1777-1791
  • Journal article (peer-reviewed)
  • Abstract:
    • A method for sound field control applied to active noise control is presented and evaluated. The method uses Linear Quadratic Gaussian (LQG) feedforward control to find a Minimal Mean Square Error (MMSE)-optimal linear sound field controller under a causality constraint. It is obtained by solving a polynomial matrix spectral factorization and a linear (Diophantine) polynomial matrix equation. An important component in the design is the control signal penalty term of the criterion. Its use and influence is here discussed and evaluated using measured room impulse responses. The results indicate that the use of a relatively simple, frequency-weighted penalty on individual control signals provides most of the benefits obtainable by the considered more advanced alternative. We also introduce and illustrate several tools for performance analysis. An analytical expression for the attainable performance clearly reveals the performance loss generated by having to use a causal controller instead of the ideal noncausal controller. This loss is largest at low frequencies. Furthermore, we introduce a measure of the reproducibility of the target noise sound field with given control loudspeaker setups and room transfer functions. It describes how well a controller that uses an input subspace of dimension equal to the effective rank of the system is able to reproduce a target sound field. This performance measure can e.g. be used to support the selection of good combinations of placements of control loudspeakers.
3.
  • Berggren, Magnus, et al. (author)
  • Low-complexity network echo cancellation approach for systems equipped with external memory
  • 2011
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE. - 1558-7916 .- 1558-7924. ; 19:8, pp. 2506-2515
  • Journal article (peer-reviewed)
  • Abstract:
    • Long delays and sparseness characterize impulse responses in telecommunication networks and a vast number of solutions for network echo cancellation have been proposed over the years. In this paper, an approach for detecting dispersive regions of a sparse impulse response and a proportionate normalized least mean square (PNLMS)-based selective updating approach are combined with an adaptive double-talk detector to form a complete solution for echo cancellation. The proposed solution has low computational complexity and is targeted for systems equipped with external memory.
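
For readers unfamiliar with the PNLMS family mentioned in this abstract, the following is a minimal sketch of one common textbook form of the proportionate NLMS update, applied to a sparse echo path. It is not the authors' selective-updating scheme and includes no double-talk detector; the function name `pnlms_identify`, the parameter values (`mu`, `rho`, `delta_p`, `eps`) and the toy echo path `h` are all assumptions chosen for illustration.

```python
import random

def pnlms_identify(x, d, num_taps, mu=0.5, rho=0.01, delta_p=0.01, eps=1e-6):
    """Identify an FIR echo path from input x and echo d with a PNLMS-style update.

    Each tap gets a step size roughly proportional to its current magnitude,
    so the few large taps of a sparse response adapt faster than the rest.
    """
    w = [0.0] * num_taps        # adaptive filter estimate
    buf = [0.0] * num_taps      # input delay line, most recent sample first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        # A priori error between the observed echo and the filter output.
        e = dn - sum(wi * bi for wi, bi in zip(w, buf))
        # Proportionate gains, floored so that zero-valued taps keep adapting.
        floor = rho * max(delta_p, max(abs(wi) for wi in w))
        gamma = [max(floor, abs(wi)) for wi in w]
        mean_gamma = sum(gamma) / num_taps
        g = [gi / mean_gamma for gi in gamma]
        # Normalised update (illustrative choice: normalise by x^T G x).
        norm = sum(gi * bi * bi for gi, bi in zip(g, buf)) + eps
        w = [wi + mu * gi * bi * e / norm for wi, gi, bi in zip(w, g, buf)]
    return w

# Hypothetical sparse echo path: three active taps out of sixteen.
rng = random.Random(1)
h = [0.0] * 16
h[5], h[6], h[7] = 1.0, -0.5, 0.25
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
d = [sum(h[k] * x[n - k] for k in range(16) if n - k >= 0) for n in range(len(x))]
w = pnlms_identify(x, d, num_taps=16)
```

In this noiseless toy setting the estimate `w` should approach `h`; a practical network echo canceller would additionally need the dispersive-region detection and double-talk handling the abstract describes.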
4.
  • Brännmark, Lars-Johan, et al. (author)
  • Compensation of Loudspeaker-Room Responses in a Robust MIMO Control Framework
  • 2013
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 21:6, pp. 1201-1216
  • Journal article (peer-reviewed)
  • Abstract:
    • A new multichannel approach to robust broadband loudspeaker-room equalization is presented. Traditionally, the equalization (or room correction) problem has been treated primarily by single-channel methods, where loudspeaker input signals are prefiltered individually by separate scalar filters. Single-channel methods are generally able to improve the average spectral flatness of the acoustic transfer functions in a listening region, but they cannot reduce the variability of the transfer functions within the region. Most modern audio reproduction systems, however, contain two or more loudspeakers, and in this paper we aim at improving the equalization performance by using all available loudspeakers jointly. To this end we propose a polynomial based MIMO formulation of the equalization problem. The new approach, which is a generalization of an earlier single-channel approach by the authors, is found to reduce the average reproduction error and the transfer function variability over a region in space. Moreover, pre-ringing artifacts are avoided, and the reproduction error below 1000 Hz is significantly reduced with an amount that scales with the number of loudspeakers used.
5.
  • Chatterjee, Saikat, et al. (author)
  • Auditory Model-Based Design and Optimization of Feature Vectors for Automatic Speech Recognition
  • 2011
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 19:6, pp. 1813-1825
  • Journal article (peer-reviewed)
  • Abstract:
    • Using spectral and spectro-temporal auditory models along with perturbation-based analysis, we develop a new framework to optimize a feature vector such that it emulates the behavior of the human auditory system. The optimization is carried out in an offline manner based on the conjecture that the local geometries of the feature vector domain and the perceptual auditory domain should be similar. Using this principle along with a static spectral auditory model, we modify and optimize the static spectral mel frequency cepstral coefficients (MFCCs) without considering any feedback from the speech recognition system. We then extend the work to include spectro-temporal auditory properties into designing a new dynamic spectro-temporal feature vector. Using a spectro-temporal auditory model, we design and optimize the dynamic feature vector to incorporate the behavior of human auditory response across time and frequency. We show that a significant improvement in automatic speech recognition (ASR) performance is obtained for any environmental condition, clean as well as noisy.
6.
  • Hendriks, Richard C., et al. (author)
  • Noise Correlation Matrix Estimation for Multi-Microphone Speech Enhancement
  • 2012
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 20:1, pp. 223-233
  • Journal article (peer-reviewed)
  • Abstract:
    • For multi-channel noise reduction algorithms like the minimum variance distortionless response (MVDR) beamformer, or the multi-channel Wiener filter, an estimate of the noise correlation matrix is needed. For its estimation, it is often proposed in the literature to use a voice activity detector (VAD). However, using a VAD the estimated matrix can only be updated in speech absence. As a result, during speech presence the noise correlation matrix estimate does not follow changing noise fields with an appropriate accuracy. This effect is further increased, as in nonstationary noise voice activity detection is a rather difficult task, and false alarms are likely to occur. In this paper, we present and analyze an algorithm that estimates the noise correlation matrix without using a VAD. This algorithm is based on measuring the correlation of the noisy input and a noise reference which can be obtained, e.g., by steering a null towards the target source. When applied in combination with an MVDR beamformer, it is shown that the proposed noise correlation matrix estimate results in a more accurate beamformer response, a larger signal-to-noise ratio improvement and a larger instrumentally predicted speech intelligibility when compared to competing algorithms such as the generalized sidelobe canceler, a VAD-based MVDR beamformer, and an MVDR based on the noisy correlation matrix.
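
To make concrete why the MVDR beamformer needs a noise correlation matrix, the sketch below computes the standard MVDR weights w = R⁻¹d / (dᴴR⁻¹d) for a single frequency bin. It does not implement the paper's VAD-free estimator: the matrix `R` is constructed directly from an assumed interferer plus white noise, and the helper names (`solve`, `mvdr_weights`), the three-microphone setup and the phase values are hypothetical.

```python
import cmath

def solve(A, b):
    """Solve A z = b for a complex square matrix A (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def mvdr_weights(R_noise, steering):
    """Return w = R^{-1} d / (d^H R^{-1} d) for noise correlation matrix R and steering vector d."""
    r_inv_d = solve(R_noise, steering)
    denom = sum(dk.conjugate() * vk for dk, vk in zip(steering, r_inv_d))
    return [vk / denom for vk in r_inv_d]

# Assumed scenario: broadside target, one strong interferer, weak white sensor noise.
num_mics = 3
target = [1 + 0j] * num_mics
interferer = [cmath.exp(-2.0j * m) for m in range(num_mics)]
sigma2, power = 0.1, 10.0
R = [[power * interferer[i] * interferer[j].conjugate() + (sigma2 if i == j else 0.0)
      for j in range(num_mics)] for i in range(num_mics)]

w = mvdr_weights(R, target)
target_gain = sum(wk.conjugate() * dk for wk, dk in zip(w, target))
interferer_gain = sum(wk.conjugate() * ak for wk, ak in zip(w, interferer))
```

The distortionless constraint means `target_gain` equals one up to rounding, while `interferer_gain` is strongly attenuated. How well this holds in practice depends on how accurately `R` tracks the actual noise field, which is the estimation problem the abstract addresses.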
7.
  • Holzapfel, André, 1976-, et al. (author)
  • Scale transform in rhythmic similarity of music
  • 2011
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE Press. - 1558-7916 .- 1558-7924. ; 19:1, pp. 176-185
  • Journal article (peer-reviewed)
  • Abstract:
    • As a special case of the Mellin transform, the scale transform has been applied in various signal processing areas, in order to get a signal description that is invariant to scale changes. In this paper, the scale transform is applied to autocorrelation sequences derived from music signals. It is shown that two such sequences, when derived from similar rhythms with different tempo, differ mainly by a scaling factor. By using the scale transform, the proposed descriptors are robust to tempo changes, and are specially suited for the comparison of pieces with different tempi but similar rhythm. As music with such characteristics is widely encountered in traditional forms of music, the performance of the descriptors in a classification task of Greek traditional dances and Turkish traditional songs is evaluated. On these datasets accuracies compared to non-tempo robust approaches are improved by more than 20%, while on a dataset of Western music the achieved accuracy improves compared to previously presented results.
8.
  • Holzapfel, André, 1976-, et al. (author)
  • Selective sampling for beat tracking evaluation
  • 2012
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE Press. - 1558-7916 .- 1558-7924. ; 20:9, pp. 2539-2548
  • Journal article (peer-reviewed)
  • Abstract:
    • In this paper, we propose a method that can identify challenging music samples for beat tracking without ground truth. Our method, motivated by the machine learning method "selective sampling," is based on the measurement of mutual agreement between beat sequences. In calculating this mutual agreement we show the critical influence of different evaluation measures. Using our approach we demonstrate how to compile a new evaluation dataset comprised of difficult excerpts for beat tracking and examine this difficulty in the context of perceptual and musical properties. Based on tag analysis we indicate the musical properties where future advances in beat tracking research would be most profitable and where beat tracking is too difficult to be attempted. Finally, we demonstrate how our mutual agreement method can be used to improve beat tracking accuracy on large music collections.
9.
  • Holzapfel, André, 1976-, et al. (author)
  • Three dimensions of pitched instrument onset detection
  • 2010
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE Press. - 1558-7916 .- 1558-7924. ; 18:6, pp. 1517-1527
  • Journal article (peer-reviewed)
  • Abstract:
    • In this paper, we suggest a novel group delay based method for the onset detection of pitched instruments. It is proposed to approach the problem of onset detection by examining three dimensions separately: phase (i.e., group delay), magnitude and pitch. The evaluation of the suggested onset detectors for phase, pitch and magnitude is performed using a new publicly available and fully onset annotated database of monophonic recordings which is balanced in terms of included instruments and onset samples per instrument, while it contains different performance styles. Results show that the accuracy of onset detection depends on the type of instruments as well as on the style of performance. Combining the information contained in the three dimensions by means of a fusion at decision level leads to an improvement of onset detection by about 8% in terms of F-measure, compared to the best single dimension.
10.
  • Ma, Zhanyu, 1982-, et al. (author)
  • Vector Quantization of LSF Parameters With a Mixture of Dirichlet Distributions
  • 2013
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 21:9, pp. 1777-1790
  • Journal article (peer-reviewed)
  • Abstract:
    • Quantization of the linear predictive coding parameters is an important part in speech coding. Probability density function (PDF)-optimized vector quantization (VQ) has been previously shown to be more efficient than VQ based only on training data. For data with bounded support, some well-defined bounded-support distributions (e.g., the Dirichlet distribution) have been proven to outperform the conventional Gaussian mixture model (GMM), with the same number of free parameters required to describe the model. When exploiting both the boundary and the order properties of the line spectral frequency (LSF) parameters, the distribution of LSF differences (Delta LSF) can be modelled with a Dirichlet mixture model (DMM). We propose a corresponding DMM-based VQ (DVQ). The elements in a Dirichlet vector variable are highly mutually correlated. Motivated by the Dirichlet vector variable's neutrality property, a practical non-linear transformation scheme for the Dirichlet vector variable can be obtained. Similar to the Karhunen-Loeve transform for Gaussian variables, this non-linear transformation decomposes the Dirichlet vector variable into a set of independent beta-distributed variables. Using high rate quantization theory and by the entropy constraint, the optimal inter- and intra-component bit allocation strategies are proposed. In the implementation of scalar quantizers, we use the constrained-resolution coding to approximate the derived constrained-entropy coding. A practical coding scheme for DVQ is designed for the purpose of reducing the quantization error accumulation. The theoretical and practical quantization performance of DVQ is evaluated. Compared to the state-of-the-art GMM-based VQ and recently proposed beta mixture model (BMM) based VQ, DVQ performs better, with even fewer free parameters and lower computational cost.
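
The decomposition into independent beta-distributed variables can be illustrated with the standard stick-breaking map that follows from the neutrality property of the Dirichlet distribution: for x ~ Dir(α₁, …, α_K), the ratios u_k = x_k / (1 − Σ_{j<k} x_j) are mutually independent, with u_k ~ Beta(α_k, Σ_{j>k} α_j). The sketch below shows this map and its inverse. Whether the paper uses exactly this ordering or parameterization is an assumption, and the function names and the α values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]

def to_stick_breaking(x):
    """Map a K-dimensional simplex vector to K-1 ratios u_k = x_k / (1 - sum_{j<k} x_j)."""
    u, remaining = [], 1.0
    for xk in x[:-1]:
        u.append(xk / remaining)
        remaining -= xk
    return u

def from_stick_breaking(u):
    """Inverse map: rebuild the simplex vector from the K-1 ratios."""
    x, remaining = [], 1.0
    for uk in u:
        piece = uk * remaining
        x.append(piece)
        remaining -= piece
    x.append(remaining)
    return x

rng = random.Random(0)
x = sample_dirichlet([2.0, 3.0, 4.0, 5.0], rng)   # illustrative parameters only
u = to_stick_breaking(x)                          # three ratios in (0, 1)
x_back = from_stick_breaking(u)                   # recovers x up to rounding
```

Because the ratios are independent, each one can in principle be handled by its own scalar quantizer, which is the role the Karhunen-Loeve transform plays for Gaussian sources in the analogy drawn by the abstract.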