SwePub

Result list for search "WFRF:(Petkov Petko N.)"


  • Results 1-10 of 12
1.
  • Kleijn, W. Bastiaan, et al. (author)
  • Optimizing Speech Intelligibility in a Noisy Environment
  • 2015
  • In: IEEE signal processing magazine (Print). - 1053-5888 .- 1558-0792. ; 32:2, pp. 43-54
  • Journal article (peer-reviewed). Abstract:
    • Modern communication technology facilitates communication from anywhere to anywhere. As a result, low speech intelligibility has become a common problem, which is exacerbated by the lack of feedback to the talker about the rendering environment. In recent years, a range of algorithms has been developed to enhance the intelligibility of speech rendered in a noisy environment. We describe methods for intelligibility enhancement from a unified vantage point. Before one defines a measure of intelligibility, the level of abstraction of the representation must be selected. For example, intelligibility can be measured on the message, the sequence of words spoken, the sequence of sounds, or a sequence of states of the auditory system. Natural measures of intelligibility defined at the message level are mutual information and the hit-or-miss criterion. The direct evaluation of high-level measures requires quantitative knowledge of human cognitive processing. Lower-level measures can be derived from higher-level measures by making restrictive assumptions. We discuss the implementation and performance of some specific enhancement systems in detail, including speech intelligibility index (SII)-based systems and systems aimed at enhancing the sound-field where it is perceived by the listener. We conclude with a discussion of the current state of the field and open problems.
2.
  • Mossavat, Iman, et al. (author)
  • A Hierarchical Bayesian Approach to Modeling Heterogeneity in Speech Quality Assessment
  • 2012
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 20:1, pp. 136-146
  • Journal article (peer-reviewed). Abstract:
    • The development of objective speech quality measures generally involves fitting a model to subjective rating data. A typical data set comprises ratings generated by listening tests performed in different languages and across different laboratories. These factors as well as others, such as the sex and age of the talker, influence the subjective ratings and result in data heterogeneity. We use a linear hierarchical Bayes (HB) structure to account for heterogeneity. To make the structure effective, we develop a variational Bayesian inference for the linear HB structure that approximates not only the posterior over the model parameters, but also the model evidence. Using the approximate model evidence we are able to study and exploit the heterogeneity inducing factors in the Bayesian framework. The new approach yields a simple linear predictor with state-of-the-art predictive performance. Our experiments show that the new method compares favorably with systems based on more complex predictor structures such as ITU-T recommendation P.563, Bayesian MARS, and Gaussian processes.
3.
  • Petkov, Petko N., et al. (author)
  • A Bayesian Approach to Non-Intrusive Quality Assessment of Speech
  • 2009
  • In: INTERSPEECH 2009. - BAIXAS : ISCA-INST SPEECH COMMUNICATION ASSOC. ; , pp. 2875-2878
  • Conference paper (peer-reviewed). Abstract:
    • A Bayesian approach to non-intrusive quality assessment of narrow-band speech is presented. The speech features used to assess quality are the sample mean and variance of band-powers evaluated from the temporal envelope in the channels of an auditory filter-bank. Bayesian multivariate adaptive regression splines (BMARS) is used to map features into quality ratings. The proposed combination of features and regression method leads to a high performance quality assessment algorithm that learns efficiently from a small amount of training data and avoids overfitting. Use of the Bayesian approach also allows the derivation of credible intervals on the model predictions, which provide a quantitative measure of model confidence and can be used to identify the need for complementing the training databases.
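The feature definition in this abstract (sample mean and variance of band-powers per auditory channel) can be illustrated with a minimal sketch. It assumes the band-powers have already been extracted from the temporal envelope of each filter-bank channel; the function name `band_power_features` and the ordering of the feature vector are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance
from typing import List, Sequence


def band_power_features(band_powers: Sequence[Sequence[float]]) -> List[float]:
    """Return [mean_0, var_0, mean_1, var_1, ...] over the channels.

    band_powers[c] holds the band-power values of channel c, one per
    analysis segment (assumed precomputed). Each channel needs at least
    two values, because the sample variance is undefined otherwise.
    """
    features: List[float] = []
    for channel in band_powers:
        features.append(mean(channel))      # sample mean of the channel
        features.append(variance(channel))  # unbiased sample variance (n - 1)
    return features


if __name__ == "__main__":
    # Two hypothetical channels with three band-power values each.
    print(band_power_features([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]))
```

A regression model such as the BMARS named in the abstract would then map this feature vector to a quality rating; that mapping is not sketched here.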
4.
  • Petkov, Petko N., et al. (author)
  • Discrete Choice Models for Non-Intrusive Quality Assessment
  • 2011
  • In: 12th Annual Conference Of The International Speech Communication Association 2011 (INTERSPEECH 2011), Vols 1-5. - : ISCA. - 9781618392701 ; , pp. 200-203
  • Conference paper (peer-reviewed). Abstract:
    • Non-intrusive signal quality assessment in general, and its application to speech signal processing, in particular, builds extensively upon statistical regression models. Commonly, the raw preference scores used for fitting these models belong to a categorical scale. Averaging the scores over a number of test subjects results in smooth, close-to-continuous ratings, thus justifying the use of regression as opposed to classification models. A form of marginalization, averaging subjective ratings takes away useful information about the reliability of individual test points. Using a model tailored to the raw data achieves highly competitive performance in terms of conventional performance measures while providing the additional advantage of identifying the usability of individual test points. In this paper, we consider the application of discrete choice models to non-intrusive quality assessment of speech.
5.
  • Petkov, Petko N., et al. (author)
  • Enhancing Subjective Speech Intelligibility Using a Statistical Model of Speech
  • 2012
  • In: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol 1. - 9781622767595 ; , pp. 166-169
  • Conference paper (peer-reviewed). Abstract:
    • The intelligibility of speech in adverse noise conditions can be improved by modifying the characteristics of the clean speech prior to its presentation. An effective and flexible paradigm is to select the modification by optimizing a measure of objective intelligibility. Here we apply this paradigm at the text level and optimize a measure related to the classification error probability in an automatic speech recognition system. The proposed method was applied to a simple but powerful band-energy modification mechanism under an energy preservation constraint. Subjective evaluation results provide a clear indication of a significant gain in subjective intelligibility. In contrast to existing methods, the proposed approach is not restricted to a particular modification strategy and treats the notion of optimality at a level closer to that of subjective intelligibility. The computational complexity of the method is sufficiently low to enable its use in on-line applications.
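The energy-preservation constraint mentioned in this abstract can be sketched as follows. This is only an illustration of the constraint, not the optimization described in the paper: the gains are taken as given, they are assumed to act on band energies rather than amplitudes, and the function name `modify_band_energies` is hypothetical.

```python
from typing import List, Sequence


def modify_band_energies(
    band_energies: Sequence[float], gains: Sequence[float]
) -> List[float]:
    """Apply per-band gains, then rescale so the total energy is unchanged."""
    if len(band_energies) != len(gains):
        raise ValueError("band_energies and gains must have the same length")

    boosted = [e * g for e, g in zip(band_energies, gains)]
    total_before = sum(band_energies)
    total_after = sum(boosted)
    if total_after == 0.0:
        # Nothing to redistribute; return the input unchanged.
        return list(band_energies)

    # A common scale factor restores the original total energy while
    # keeping the relative band weighting chosen by the gains.
    scale = total_before / total_after
    return [b * scale for b in boosted]


if __name__ == "__main__":
    # Hypothetical three-band frame; the middle band is emphasized.
    print(modify_band_energies([4.0, 2.0, 2.0], [1.0, 2.0, 1.0]))
```

In a system of the kind the abstract describes, the gains would be selected by optimizing an intelligibility-related measure; here they are fixed inputs.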
6.
  • Petkov, Petko N., et al. (author)
  • Feature set augmentation for enhancing the performance of a non-intrusive quality predictor
  • 2012
  • In: 2012 4th International Workshop on Quality of Multimedia Experience, QoMEX 2012. - : IEEE. - 9781467307253 ; , pp. 121-126
  • Conference paper (peer-reviewed). Abstract:
    • A non-intrusive quality predictor constitutes a mapping from signal features to a (typically one dimensional) representation of the perceived quality. Assuming that the regression model performing the mapping is suited to the data, the performance of the predictor largely depends on how well the parameters of this regression model can be inferred from the training data. In situations where the training data is scarce, model performance is degraded due to over-fitting. The effects of over-fitting can be mitigated by feature selection but the model performance remains low due to the insufficiently representative training data. The objective we pursue is to enhance the performance of a quality predictor by augmenting the feature set with the output of a pre-trained quality predictor. This approach introduces an implicit dependence of the regression model parameters on a larger amount of training data. In view of the increasing usage of speech signals with higher bandwidth, and the dearth of training data for such signals, an augmentation of particular interest is that of a wide-band feature set with a narrow-band quality prediction. Experimental results for additive noise and non-linear distortions encountered in hearing aids, using quality labels from an intrusive quality predictor, illustrate the performance enhancement capabilities of the proposed approach.
7.
  • Petkov, Petko N., et al. (author)
  • Improving the Phase Vocoder Approach to Pitch-Shifting
  • 2007
  • In: INTERSPEECH 2007. ; , pp. 2760-2763
  • Conference paper (peer-reviewed). Abstract:
    • A class of methods known as phase vocoders allows for implementing pitch shifting in the spectral domain. We extend the approach of shifting the isolated harmonics of the spectrum by introducing a new technique for separating the sinusoidal components. Keeping together the main lobe and the side lobes, which result from convolution of the harmonics with the spectrum of the analysis window in the Fourier transform, we minimize the leakage of energy and the related phase compensation problems. Furthermore, we integrate a robust enhancement to the update of the phase, based on tracking of the energy envelope. The formant structure of the signal is preserved by means of an all-pole speech production model. The proposed modifications lead to significant improvement of the quality of the pitch-shifted speech.
8.
  • Petkov, Petko N., 1980-, et al. (author)
  • Maximizing Phoneme Recognition Accuracy for Enhanced Speech Intelligibility in Noise
  • 2013
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 21:5, pp. 1035-1045
  • Journal article (peer-reviewed). Abstract:
    • An effective measure of speech intelligibility is the probability of correct recognition of the transmitted message. We propose a speech pre-enhancement method based on matching the recognized text to the text of the original message. The selected criterion is accurately approximated by the probability of the correct transcription given an estimate of the noisy speech features. In the presence of environment noise, and with a decrease in the signal-to-noise ratio, speech intelligibility declines. We implement a speech pre-enhancement system that optimizes the proposed criterion for the parameters of two distinct speech modification strategies under an energy-preservation constraint. The proposed method requires prior knowledge in the form of a transcription of the transmitted message and acoustic speech models from an automatic speech recognition system. Performance results from an open-set subjective intelligibility test indicate a significant improvement over natural speech and a reference system that optimizes a perceptual-distortion-based objective intelligibility measure. The computational complexity of the approach permits use in on-line applications.
9.
  • Petkov, Petko N., et al. (author)
  • OBJECTIVE QUALITY ESTIMATION OF WIDE-BAND SPEECH USING A NARROW-BAND PRIOR
  • 2010
  • In: 2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. - 9781424442966 ; , pp. 4670-4673
  • Conference paper (peer-reviewed). Abstract:
    • A fundamental challenge in the design of objective models for estimation of speech signal quality lies in the shortage of subjectively labelled databases. This problem is particularly relevant when developing quality assessment models for wide-band (16 kHz sampling rate) signals where databases are scarce. We explore the possibility for seamlessly integrating a quality prior in the form of a narrow-band quality estimate into the framework of a non-intrusive wide-band quality assessment algorithm. Experimental results confirm that the proposed approach can be used to improve performance over a baseline wide-band system without a narrow-band prior.
10.
  • Petkov, Petko N., 1980-, et al. (author)
  • Preservation of Speech Spectral Dynamics Enhances Intelligibility
  • 2013
  • In: Proc. Interspeech, 2013. ; , pp. 3564-3568
  • Conference paper (other academic/artistic). Abstract:
    • We propose a method for the enhancement of intelligibility in scenarios where speech is rendered in a noisy environment. The method is based on the hypothesis that intelligibility is a monotonic function of the degree of preservation of the speech spectral dynamics. The accuracy of the speech spectral dynamics can then be traded against the power of the rendered speech signal. We can either maximize the dynamics accuracy given the signal power, or minimize the signal power given the dynamics accuracy. In our implementation, the spectral dynamics is quantified as the difference of the mel cepstra between time frames of the speech signal. We compared the speech rendered by our implementation against both natural speech and a reference method, for the scenario where signal power is minimized given a target dynamics accuracy, and observed a significantly improved intelligibility. The low system delay, and the low complexity and memory requirements make the new method particularly suitable for real-time applications.
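The quantity described in this abstract, spectral dynamics measured as the difference of mel cepstra between time frames, can be sketched as below. The mel cepstra are assumed to be precomputed, one vector per frame. The mean squared error used to compare the dynamics of two signals is an assumption made for illustration; the abstract does not state which accuracy measure the authors use, and both function names are hypothetical.

```python
from typing import List, Sequence

Frames = Sequence[Sequence[float]]


def frame_deltas(cepstra: Frames) -> List[List[float]]:
    """Difference of cepstral vectors between consecutive time frames."""
    return [
        [cur - prev for prev, cur in zip(cepstra[t - 1], cepstra[t])]
        for t in range(1, len(cepstra))
    ]


def dynamics_error(reference: Frames, rendered: Frames) -> float:
    """Mean squared difference between the frame deltas of two signals.

    Zero means the rendered signal reproduces the reference dynamics
    exactly. (Assumed measure, chosen only for this sketch.)
    """
    ref_d = frame_deltas(reference)
    ren_d = frame_deltas(rendered)
    count = sum(len(row) for row in ref_d)
    if count == 0:
        return 0.0
    total = sum(
        (a - b) ** 2
        for ref_row, ren_row in zip(ref_d, ren_d)
        for a, b in zip(ref_row, ren_row)
    )
    return total / count


if __name__ == "__main__":
    ref = [[1.0, 2.0], [3.0, 5.0], [6.0, 9.0]]
    # A constant offset per coefficient leaves the frame deltas unchanged.
    shifted = [[c + 0.5 for c in frame] for frame in ref]
    print(frame_deltas(ref), dynamics_error(ref, shifted))
```

Because the deltas ignore any offset that is constant over time, a fixed change to the cepstral vectors leaves this error at zero, which is the kind of freedom the abstract's trade-off between dynamics accuracy and signal power relies on.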
