SwePub
Search the SwePub database

  Advanced search

Hit list for the search "L773:1558 7916 OR L773:1558 7924 srt2:(2005-2009)"

Search: L773:1558 7916 OR L773:1558 7924 > (2005-2009)

  • Results 1-10 of 17
Sort/group the hit list

Numbering  Reference  Cover image  Find
1.
  • Gustafsson, Harald, et al. (author)
  • Low-complexity feature-mapped speech bandwidth extension
  • 2006
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - Piscataway : IEEE. - 1558-7916 .- 1558-7924. ; pp. 577-588
  • Journal article (peer reviewed) abstract
    • Today's telecommunications systems use a limited audio signal bandwidth, typically 0.3-3.4 kHz, but it has recently been suggested that mobile phone networks support an audio bandwidth of 50 Hz-7 kHz, since the increased bandwidth improves the perceived quality of the speech signal. Because initially only few telephones will offer this capability, a method is proposed that extends the conventional narrowband speech signal into a wideband speech signal using the receiving telephone alone, giving the impression of wideband speech. The proposed speech bandwidth extension method is based on models of speech acoustics and fundamentals of human hearing, and maps each speech feature separately. Care has been taken to address implementation aspects such as noisy speech signals, speech signal delay, computational complexity, and processing memory usage.
  •  
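The abstract does not give the feature-mapping details, but the general idea of receiver-side bandwidth extension can be illustrated with a much simpler classical technique, spectral folding: the narrowband signal is upsampled by zero insertion, and the aliased image that appears in the upper half of the new spectrum supplies the missing high band. The sketch below is a hedged toy illustration of that idea, not the authors' method; the function name and gain value are invented.

```python
import numpy as np

def spectral_folding_bwe(x_nb, gain=0.3):
    """Toy bandwidth extension via spectral folding (NOT the paper's
    feature-mapping method).  Zero-insertion upsampling by 2 creates a
    mirrored (aliased) image of the low band above the old Nyquist
    frequency; a scaled fraction of that image stands in for the
    missing high-band energy."""
    y = np.zeros(2 * len(x_nb))
    y[::2] = x_nb                          # zero insertion -> spectral image
    X = np.fft.rfft(y)
    n = len(X)
    low = X.copy();  low[n // 2:] = 0.0    # proper low band (0..fs_nb/2)
    high = X.copy(); high[: n // 2] = 0.0  # aliased image above fs_nb/2
    # factor 2 compensates the amplitude loss of zero insertion
    return np.fft.irfft(2.0 * low + gain * high, len(y))

# usage: extend a 1 kHz tone sampled at 8 kHz to a 16 kHz signal
fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 1000 * t)
x_wb = spectral_folding_bwe(x)             # contains 1 kHz plus a weak 7 kHz image
```

The output keeps the original low band intact and adds an attenuated mirrored component at 7 kHz, which is exactly the artifact-prone behavior that motivates the more careful per-feature mapping the paper proposes.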
2.
  • Holzapfel, André, 1976-, et al. (author)
  • Musical genre classification using Nonnegative Matrix Factorization based features
  • 2008
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE Press. - 1558-7916 .- 1558-7924. ; 16:2, pp. 424-434
  • Journal article (peer reviewed) abstract
    • Nonnegative matrix factorization (NMF) is used to derive a novel description of the timbre of musical sounds. Using NMF, a spectrogram is factorized to provide a characteristic spectral basis. Given a set of spectrograms for a musical genre, the space spanned by the vectors of the obtained spectral bases is modeled statistically using mixtures of Gaussians, resulting in a description of the spectral base for that genre. This description is shown to improve classification results by up to 23.3% compared to MFCC-based models, while the compression performed by the factorization decreases training time significantly. Using a distance-based stability measure, this compression is shown to reduce the noise present in the data set, resulting in more stable classification models. In addition, we compare the mean squared errors of spectrogram approximations obtained with independent component analysis and with nonnegative matrix factorization, showing the superiority of the latter.
  •  
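The core operation the abstract describes, factorizing a magnitude spectrogram into a nonnegative spectral basis and activations, can be sketched with the standard Lee-Seung multiplicative updates for the Euclidean cost. This is the textbook NMF algorithm, not the paper's full genre-classification pipeline; the Gaussian-mixture modeling of the basis vectors is not reproduced here.

```python
import numpy as np

def nmf(V, rank, iters=500, eps=1e-9, seed=0):
    """Nonnegative matrix factorization V ~ W @ H using Lee-Seung
    multiplicative updates for the Euclidean cost.
    V: nonnegative (freq, time) matrix, e.g. a magnitude spectrogram.
    Returns the spectral basis W (freq, rank) and activations H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(iters):
        # updates preserve nonnegativity and never increase ||V - WH||^2
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy "spectrogram": two spectral patterns with random activations
rng = np.random.default_rng(1)
basis = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
act = rng.random((2, 50))
V = basis @ act                      # exactly rank-2 and nonnegative
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In the paper's setting, the columns of W extracted from each training spectrogram would then be pooled per genre and modeled with mixtures of Gaussians.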
3.
  • Lee, C. H., et al. (author)
  • Applying a speaker-dependent speech compression technique to concatenative TTS synthesizers
  • 2007
  • In: IEEE Transactions on Audio, Speech and Language Processing. - : Institute of Electrical and Electronics Engineers (IEEE). - 1558-7924 .- 1558-7916. ; 15:2, pp. 632-640
  • Journal article (peer reviewed) abstract
    • This paper proposes a new speaker-dependent coding algorithm to efficiently compress a large speech database for corpus-based concatenative text-to-speech (TTS) engines while maintaining high fidelity. To achieve a high compression ratio and meet the fundamental requirements of concatenative TTS synthesizers, such as partial segment decoding and random access capability, we adopt a nonpredictive analysis-by-synthesis scheme for speaker-dependent parameter estimation and quantization. The spectral coefficients are quantized with a memoryless split vector quantization (VQ) approach that does not exploit frame correlation. Since the excitation signals of a specific speaker show low intra-speaker variation, especially in voiced regions, the conventional adaptive codebook for pitch prediction is replaced by a speaker-dependent pitch-pulse codebook trained on a corpus of single-speaker speech signals. To further improve the coding efficiency, the proposed coder flexibly combines nonpredictive and predictive methods, taking the structure of the TTS system into account. Applying the proposed algorithm to a Korean TTS system yields quality comparable to the G.729 speech coder while satisfying all the requirements of the TTS system; the results are verified by both objective and subjective quality measurements. In addition, the decoding complexity of the proposed coder is around 55% lower than that of G.729 Annex A.
  •  
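The memoryless split VQ mentioned in the abstract is a standard building block: the parameter vector is cut into subvectors, and each subvector is quantized independently against its own codebook with a nearest-neighbour search. The sketch below illustrates that mechanism only; the codebooks here are tiny invented examples, not the trained codebooks of the paper.

```python
import numpy as np

def split_vq_quantize(x, codebooks):
    """Memoryless split vector quantization: split x into consecutive
    subvectors and quantize each one independently (nearest codeword in
    Euclidean distance).  No inter-frame prediction is used, which is
    what makes the scheme 'memoryless' and random-access friendly.
    codebooks: list of (num_codewords, subdim) arrays."""
    quantized, indices, start = [], [], 0
    for cb in codebooks:
        sub = x[start:start + cb.shape[1]]
        dist = np.sum((cb - sub) ** 2, axis=1)  # distance to every codeword
        j = int(np.argmin(dist))
        indices.append(j)
        quantized.append(cb[j])
        start += cb.shape[1]
    return np.concatenate(quantized), indices

# usage: a 4-dim vector split into two 2-dim parts (toy codebooks)
cb1 = np.array([[0.0, 0.0], [1.0, 1.0]])
cb2 = np.array([[0.5, 0.5], [2.0, 2.0]])
xq, idx = split_vq_quantize(np.array([0.9, 1.1, 1.9, 2.2]), [cb1, cb2])
```

Because each frame is coded without reference to its neighbours, any segment of the database can be decoded in isolation, which is the random-access property the TTS engine requires.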
4.
  • Lindström, Fredric, et al. (author)
  • An Improvement of the Two-Path Algorithm Transfer Logic for Acoustic Echo Cancellation
  • 2007
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE. - 1558-7916 .- 1558-7924. ; 15:4, pp. 1320-1326
  • Journal article (peer reviewed) abstract
    • Adaptive filters for echo cancellation generally need update control schemes to avoid divergence in the case of significant disturbances. The two-path algorithm avoids unnecessary halting of the adaptive filter when the control scheme gives an erroneous output. Versions of this algorithm have previously been presented for echo cancellation. This paper presents a transfer logic that improves the convergence speed of the two-path algorithm for acoustic echo cancellation while retaining its robustness. Results from simulations show improved performance, and a fixed-point DSP implementation verifies the performance in real time.
  •  
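The two-path structure itself can be sketched briefly: a background NLMS filter adapts on every sample, while a foreground filter produces the actual echo-cancelled output and is only overwritten when the background filter is clearly performing better. The transfer condition below (a fixed 3 dB margin on smoothed residual powers) is a deliberately basic stand-in, not the improved transfer logic the paper contributes; all parameter values are invented.

```python
import numpy as np

def two_path_nlms(x, d, taps=16, mu=0.5, eps=1e-8):
    """Simplified two-path acoustic echo canceller sketch.
    x: far-end (loudspeaker) signal, d: microphone signal."""
    wb = np.zeros(taps)           # background filter: always adapting
    wf = np.zeros(taps)           # foreground filter: fixed between transfers
    e_out = np.zeros(len(x))
    pb = pf = 1e-3                # smoothed residual powers
    for n in range(taps - 1, len(x)):
        u = x[n - taps + 1:n + 1][::-1]        # newest sample first
        eb = d[n] - wb @ u                     # background residual
        ef = d[n] - wf @ u                     # foreground residual
        wb = wb + mu * eb * u / (u @ u + eps)  # NLMS update (background only)
        pb = 0.99 * pb + 0.01 * eb * eb
        pf = 0.99 * pf + 0.01 * ef * ef
        if pb < 0.5 * pf:                      # basic transfer logic: 3 dB margin
            wf = wb.copy()
        e_out[n] = d[n] - wf @ u               # output uses foreground filter
    return e_out, wf

# usage: microphone picks up the far-end signal through a short echo path
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = np.zeros(16); h[:3] = [0.5, -0.3, 0.1]    # hypothetical echo path
d = np.convolve(x, h)[:len(x)]
e, wf = two_path_nlms(x, d)
```

Because the foreground filter never adapts directly, a disturbance (e.g. double-talk) can corrupt the background filter without degrading the output; the cost is slower convergence, which is precisely what the paper's improved transfer logic addresses.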
5.
  • Mancini, Maurizio, et al. (author)
  • A virtual head driven by music expressivity
  • 2007
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : Institute of Electrical and Electronics Engineers (IEEE). - 1558-7916 .- 1558-7924. ; 15:6, pp. 1833-1841
  • Journal article (peer reviewed) abstract
    • In this paper, we present a system that visualizes the expressive quality of a music performance using a virtual head. We provide a mapping through several parameter spaces: on the input side, we have elaborated a mapping between values of acoustic cues and emotion as well as expressivity parameters; on the output side, we propose a mapping between these parameters and the behaviors of the virtual head. This mapping ensures a coherency between the acoustic source and the animation of the virtual head. After presenting some background information on behavior expressivity of humans, we introduce our model of expressivity. We explain how we have elaborated the mapping between the acoustic and the behavior cues. Then, we describe the implementation of a working system that controls the behavior of a human-like head that varies depending on the emotional and acoustic characteristics of the musical execution. Finally, we present the tests we conducted to validate our mapping between the emotive content of the music performance and the expressivity parameters.
  •  
6.
  • Ozerov, Alexey, et al. (author)
  • Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs
  • 2007
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - 1558-7916 .- 1558-7924. ; 15:5, pp. 1564-1578
  • Journal article (peer reviewed) abstract
    • Probabilistic approaches can offer satisfactory solutions to single-channel source separation, provided that the source models accurately match the statistical properties of the mixed signals. However, it is not always possible to train such models. To overcome this problem, we propose an adaptation scheme that adjusts the source models to the actual properties of the signals observed in the mix. In this paper, we introduce a general formalism for source model adaptation, expressed in the framework of Bayesian models. Particular cases of the proposed approach are then investigated experimentally on the problem of separating voice from music in popular songs. The results show that the adaptation scheme can consistently and significantly improve separation performance compared with nonadapted models.
  •  
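Model-based single-channel separation ultimately reduces to building a time-frequency mask from the source models. The sketch below shows the simplest possible variant, a Wiener-style mask built from one fixed power-spectral template per source; the paper's Bayesian models (and their adaptation) are far richer, so treat this as an illustrative stand-in with invented names and data.

```python
import numpy as np

def wiener_mask_separate(mix_spec, psd_a, psd_b, eps=1e-12):
    """Single-channel separation with a Wiener-style soft mask.
    Each source is described by one fixed PSD template (a drastic
    simplification of the Bayesian source models in the paper).
    mix_spec: (freq, time) mixture spectrogram."""
    mask_a = psd_a[:, None] / (psd_a[:, None] + psd_b[:, None] + eps)
    # the two masks sum to (almost exactly) one in every bin
    return mask_a * mix_spec, (1.0 - mask_a) * mix_spec

# toy usage: source A lives in the low bins, source B in the high bins
psd_a = np.array([1.0] * 4 + [0.01] * 4)
psd_b = np.array([0.01] * 4 + [1.0] * 4)
rng = np.random.default_rng(0)
mix = rng.standard_normal((8, 10))          # stand-in mixture spectrogram
est_a, est_b = wiener_mask_separate(mix, psd_a, psd_b)
```

When the fixed templates do not match the actual signals, the mask misroutes energy; the paper's contribution is exactly the adaptation step that refits the source models to the observed mixture before masking.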
7.
  • Sällberg, Benny, et al. (author)
  • Complex-Valued Independent Component Analysis for Online Blind Speech Extraction
  • 2008
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE. - 1558-7916 .- 1558-7924. ; 16:8, pp. 1624-1632
  • Journal article (peer reviewed) abstract
    • This paper presents a theoretical analysis of a criterion for complex-valued independent component analysis (ICA) with a focus on blind speech extraction (BSE) of a spatio-temporally nonstationary speech source. The proposed criterion, denoted KSICA, is related to the well-known FastICA method with the kurtosis contrast function. The proposed method is shown to share the important fixed-point property with the FastICA method; an improvement, however, is that it does not exhibit FastICA's divergent behavior for a mixture of Gaussian-only sources, and it performs better in online implementations. Compared to FastICA, the KSICA method provides 10 dB higher source extraction performance and a 10 dB lower standard deviation in a data batch approach when the batch size is less than 100 samples; for larger batch sizes, the KSICA method performs equally well. In an online application with spatially stationary sources, the KSICA method provides around 10 dB higher interference suppression and 1 MOS-unit lower speech distortion than FastICA for a 0.15 s time constant in the algorithm update parameter; the FastICA performance matches the KSICA performance only for time constants above 1 s. Finally, in an online application with a moving speech source, the KSICA method provides 10 dB higher interference suppression than FastICA for the same algorithm settings. All in all, the proposed KSICA method is shown to be a viable alternative for online BSE of complex-valued signal mixtures.
  •  
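The FastICA baseline the paper compares against can be sketched in its simplest form: whiten the mixtures, then iterate the one-unit fixed-point update for the kurtosis contrast. This is the classical real-valued algorithm, not the paper's complex-valued KSICA criterion; the data and mixing matrix below are invented.

```python
import numpy as np

def fastica_kurtosis(X, iters=100, seed=0):
    """One-unit fixed-point FastICA with the kurtosis contrast
    (real-valued textbook version; the paper treats the complex case).
    X: (sensors, samples), assumed zero-mean.  Returns one unmixing row."""
    # whitening: decorrelate and normalize the sensor signals
    vals, vecs = np.linalg.eigh(np.cov(X))
    Wh = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Z = Wh @ X
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        y = w @ Z
        w_new = (Z * y ** 3).mean(axis=1) - 3.0 * w  # kurtosis fixed point
        w = w_new / np.linalg.norm(w_new)
    return w @ Wh                    # unmixing row in sensor space

# usage: extract a heavy-tailed (speech-like) source from a 2x2 mixture
rng = np.random.default_rng(1)
s1 = rng.laplace(size=20000)         # super-Gaussian, speech-like
s2 = rng.standard_normal(20000)      # Gaussian interference
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ np.vstack([s1, s2])
b = fastica_kurtosis(X)
y = b @ X                            # recovered source (up to sign/scale)
```

The update converges to the most non-Gaussian direction, here the Laplacian source; the divergence the paper discusses arises precisely when no such non-Gaussian source exists in the mixture.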
8.
  • Yermeche, Zohra, et al. (author)
  • Blind Subband Beamforming with Time-delay Constraints for Moving Source Speech Enhancement
  • 2007
  • In: IEEE Transactions on Audio, Speech, and Language Processing. - : IEEE. - 1558-7916 .- 1558-7924. ; 15:8, pp. 2360-2372
  • Journal article (peer reviewed) abstract
    • A new robust microphone array method to enhance speech signals generated by a moving person in a noisy environment is presented. This blind approach is based on a two-stage scheme. First, a subband time-delay estimation method is used to localize the dominant speech source. The second stage performs speech enhancement, based on the acquired spatial information, by means of a soft-constrained subband beamformer. The novelty of the proposed method lies in treating the spatial spreading of the sound source as equivalent to a time-delay spreading, thus allowing the estimated intersensor time-delays to be used directly in the beamforming operations. In comparison to previous approaches, this new method requires no special array geometry, no knowledge of the array manifold, and no calibration data to adapt the array weights. Furthermore, the scheme allows the beamformer to adapt efficiently to speaker movement. The robustness of the time-delay estimation of speech signals at high noise levels is improved by exploiting the non-Gaussian nature of speech through a subband kurtosis-weighted structure. Evaluation in a real environment with a moving speaker shows promising results, with suppression levels of up to 16 dB for background noise and interfering (speech) signals, combined with relatively little speech distortion.
  •  
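The first stage, estimating the intersensor time-delay of the dominant source, can be illustrated with the basic cross-correlation method: pick the lag that maximizes the correlation between the two microphone signals. The paper performs this per subband with a kurtosis weighting; the full-band sketch below is a hedged simplification with invented signals.

```python
import numpy as np

def estimate_delay(x1, x2, max_lag=32):
    """Estimate the integer time-delay of x2 relative to x1 by scanning
    the cross-correlation peak over a limited lag range (full-band
    sketch; the paper operates in subbands with kurtosis weighting)."""
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            v = np.dot(x1[: len(x1) - lag], x2[lag:])
        else:
            v = np.dot(x1[-lag:], x2[: len(x2) + lag])
        if v > best_val:
            best_val, best_lag = v, lag
    return best_lag

# usage: the second microphone receives the source 5 samples later
rng = np.random.default_rng(0)
s = rng.standard_normal(2000)
x1 = s
x2 = np.concatenate([np.zeros(5), s[:-5]])   # delayed copy of the source
lag = estimate_delay(x1, x2)
```

In the paper the per-subband delay estimates, with their spread, then feed the soft-constrained beamformer directly, so no array geometry or manifold model is needed.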
9.
  • Grancharov, Volodya, et al. (author)
  • Generalized postfilter for speech quality enhancement
  • 2008
  • In: IEEE Transactions on Audio, Speech and Language Processing. - 1558-7916. ; 16:1, pp. 57-64
  • Journal article (peer reviewed) abstract
    • Postfilters are commonly used in speech coding for the attenuation of quantization noise. In the presence of acoustic background noise or distortion due to tandeming operations, the postfilter parameters are not adjusted and the performance is, therefore, not optimal. We propose a modification that consists of replacing the nonadaptive postfilter parameters with parameters that adapt to variations in spectral flatness, obtained from the noisy speech. This generalization of the postfiltering concept can handle a larger range of noise conditions, but has the same computational complexity and memory requirements as the conventional postfilter. Test results indicate that the presented algorithm improves on the standard postfilter, as well as on the combination of a noise attenuation preprocessor and the conventional postfilter.
  •  
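The quantity the generalized postfilter adapts to, spectral flatness, is a standard measure: the ratio of the geometric to the arithmetic mean of the power spectrum, near 1 for noise-like frames and near 0 for strongly tonal (voiced) frames. The sketch below computes that measure only; the mapping from flatness to postfilter parameters is the paper's contribution and is not reproduced here.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Spectral flatness measure of one signal frame: geometric mean of
    the power spectrum divided by its arithmetic mean.  Values near 1
    indicate a flat, noise-like spectrum; values near 0 a tonal one."""
    p = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return np.exp(np.mean(np.log(p))) / np.mean(p)

# usage: white noise vs. a pure tone, 512-sample frames
rng = np.random.default_rng(0)
noise = rng.standard_normal(512)
tone = np.sin(2 * np.pi * 8 * np.arange(512) / 512)
sfm_noise = spectral_flatness(noise)   # close to 1
sfm_tone = spectral_flatness(tone)     # close to 0
```

Driving the postfilter strength from such a frame-by-frame measure is what lets the generalized postfilter react to acoustic background noise without extra complexity.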
10.
  • Grancharov, Volodya, et al. (author)
  • Low-complexity, non-intrusive speech quality assessment
  • 2006
  • In: IEEE Transactions on Speech and Audio Processing. - 1558-7916. ; 14:6, pp. 1948-1956
  • Journal article (peer reviewed) abstract
    • Monitoring of speech quality in emerging heterogeneous networks is of great interest to network operators. The most efficient way to satisfy such a need is through nonintrusive, objective speech quality assessment. In this paper, we describe a low-complexity algorithm for monitoring the speech quality over a network. The features used in the proposed algorithm can be computed from commonly used speech-coding parameters. Reconstruction and perceptual transformation of the signal is not performed. The critical advantage of the approach lies in generating quality assessment ratings without explicit distortion modeling. The results from the performed experiments indicate that the proposed nonintrusive objective quality measure performs better than the ITU-T P.563 standard.
  •  
Create references, email, monitor and link
  • Results 1-10 of 17
