SwePub

Search results for the query "WFRF:(Székely Éva)"

  • Results 1-50 of 60
1.
  • Abou-Zleikha, Mohamed, et al. (author)
  • Multi-level exemplar-based duration generation for expressive speech synthesis
  • 2012
  • In: Proceedings of Speech Prosody.
  • Conference paper (peer-reviewed), abstract:
    • The generation of duration of speech units from linguistic information, as one component of a prosody model, is considered to be a requirement for natural sounding speech synthesis. This paper investigates the use of a multi-level exemplar-based model for duration generation for the purposes of expressive speech synthesis. The multi-level exemplar-based model has been proposed in the literature as a cognitive model for the production of duration. The implementation of this model for duration generation for speech synthesis is not straightforward: it requires a set of modifications to the model, and the linguistically related units and the context of the target units need to be taken into consideration. The work presented in this paper implements this model and presents a solution to these issues through the use of prosodic-syntactic correlated data, full context information of the input example and corpus exemplars.
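The exemplar-based lookup the abstract describes can be illustrated with a minimal sketch. The feature encoding and corpus structure below are assumptions for illustration, not the paper's actual model:

```python
# Hypothetical sketch: the duration of a target unit is aggregated from the
# corpus exemplars whose linguistic/prosodic-syntactic context is most similar.
from dataclasses import dataclass
import numpy as np

@dataclass
class Exemplar:
    features: np.ndarray  # encoded linguistic + prosodic-syntactic context
    duration: float       # observed unit duration in seconds

def predict_duration(target: np.ndarray, corpus: list[Exemplar], k: int = 5) -> float:
    """Inverse-distance-weighted average duration of the k nearest exemplars."""
    dists = np.array([np.linalg.norm(e.features - target) for e in corpus])
    nearest = dists.argsort()[:k]
    weights = 1.0 / (dists[nearest] + 1e-6)  # closer exemplars dominate
    return float(np.average([corpus[i].duration for i in nearest], weights=weights))
```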
2.
  • Ahmed, Zeeshan, et al. (author)
  • A system for facial expression-based affective speech translation
  • 2013
  • In: Proceedings of the companion publication of the 2013 international conference on Intelligent user interfaces companion. - pp. 57-58
  • Conference paper (peer-reviewed), abstract:
    • In the emerging field of speech-to-speech translation, emphasis is currently placed on the linguistic content, while the significance of paralinguistic information conveyed by facial expression or tone of voice is typically neglected. We present a prototype system for multimodal speech-to-speech translation that is able to automatically recognize and translate spoken utterances from one language into another, with the output rendered by a speech synthesis system. The novelty of our system lies in the technique of generating the synthetic speech output in one of several expressive styles that is automatically determined using a camera to analyze the user's facial expression during speech.
3.
  • Alexanderson, Simon, et al. (author)
  • Generating coherent spontaneous speech and gesture from text
  • 2020
  • In: Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents, IVA 2020. - New York, NY, USA: Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed), abstract:
    • Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data: On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscripted speech audio as the source material. On the motion side, probabilistic motion-generation methods can now synthesise vivid and lifelike speech-driven 3D gesticulation. In this paper, we put these two state-of-the-art technologies together in a coherent fashion for the first time. Concretely, we demonstrate a proof-of-concept system trained on a single-speaker audio and motion-capture dataset, that is able to generate both speech and full-body gestures together from text input. In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data. We illustrate our results by visualising gesture spaces and text-speech-gesture alignments, and through a demonstration video.
4.
  • Aylett, Matthew Peter, et al. (author)
  • Why is my Agent so Slow? Deploying Human-Like Conversational Turn-Taking
  • 2023
  • In: HAI 2023 - Proceedings of the 11th Conference on Human-Agent Interaction. - Association for Computing Machinery (ACM). - pp. 490-492
  • Conference paper (peer-reviewed), abstract:
    • The emphasis on one-to-one speak/wait spoken conversational interaction with intelligent agents leads to long pauses between conversational turns, which undermine the flow and naturalness of the interaction as well as the user experience. Despite groundbreaking advances in generating and understanding natural language with techniques such as LLMs, conversational interaction has remained relatively overlooked. In this workshop we will discuss and review the challenges, recent work and potential impact of improving conversational interaction with artificial systems. We hope to share experiences of poor human/system interaction and best practices with third-party tools, and to generate design guidance for the community.
5.
  • Betz, Simon, et al. (author)
  • The greennn tree - lengthening position influences uncertainty perception
  • 2019
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2019. - The International Speech Communication Association (ISCA). - pp. 3990-3994
  • Conference paper (peer-reviewed), abstract:
    • Synthetic speech can be used to express uncertainty in dialogue systems by means of hesitation. If a phrase like “Next to the green tree” is uttered in a hesitant way, that is, containing lengthening, silences, and fillers, the listener can infer that the speaker is not certain about the concepts referred to. However, we do not know anything about the referential domain of the uncertainty; if only a particular word in this sentence would be uttered hesitantly, e.g. “the greee:n tree”, the listener could infer that the uncertainty refers to the color in the statement, but not to the object. In this study, we show that the domain of the uncertainty is controllable. We conducted an experiment in which color words in sentences like “search for the green tree” were lengthened in two different positions: word onsets or final consonants, and participants were asked to rate the uncertainty regarding color and object. The results show that initial lengthening is predominantly associated with uncertainty about the word itself, whereas final lengthening is primarily associated with the following object. These findings enable dialogue system developers to finely control the attitudinal display of uncertainty, adding nuances beyond the lexical content to message delivery.
6.
  • Cabral, Joao P, et al. (author)
  • Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz.
  • 2012
  • In: Proceedings of the International Conference on Language Resources and Evaluation. - pp. 4136-4142
  • Conference paper (peer-reviewed), abstract:
    • This paper describes a prototype of a computer-assisted pronunciation training system called MySpeech. The interface of the MySpeech system is web-based and it currently enables users to practice pronunciation by listening to speech spoken by native speakers and tuning their speech production to correct any mispronunciations detected by the system. This practice exercise is facilitated in different topics and difficulty levels. An experiment was conducted in this work that combines the MySpeech service with the WebWOZ Wizard-of-Oz platform (http://www.webwoz.com), in order to improve the human-computer interaction (HCI) of the service and the feedback that it provides to the user. The employed Wizard-of-Oz method enables a human (who acts as a wizard) to give feedback to the practising user, while the user is not aware that there is another person involved in the communication. This experiment made it possible to quickly test an HCI model before its implementation in the MySpeech system. It also allowed input data to be collected from the wizard that can be used to improve the proposed model. Another outcome of the experiment was a preliminary evaluation of the pronunciation learning service in terms of user satisfaction, which would have been difficult to conduct before integrating the HCI part.
8.
  • Cahill, Peter, et al. (author)
  • UCD Blizzard Challenge 2011 entry
  • 2011
  • In: Proceedings of the Blizzard Challenge Workshop.
  • Conference paper (peer-reviewed), abstract:
    • This paper gives an overview of the UCD Blizzard Challenge 2011 entry. The entry is a unit selection synthesiser that uses hidden Markov models for prosodic modelling. The evaluation consisted of synthesising 2213 sentences from a high quality 15 hour dataset provided by Lessac Technologies. Results are analysed within the context of other systems and the future work for the system is discussed. 
9.
  • Clark, Leigh, et al. (author)
  • Mapping Theoretical and Methodological Perspectives for Understanding Speech Interface Interactions
  • 2019
  • In: CHI EA '19 Extended Abstracts. - New York, NY, USA: Association for Computing Machinery (ACM).
  • Conference paper (peer-reviewed), abstract:
    • The use of speech as an interaction modality has grown considerably through the integration of Intelligent Personal Assistants (IPAs, e.g. Siri, Google Assistant) into smartphones and voice-based devices (e.g. Amazon Echo). However, there remain significant gaps in using theoretical frameworks to understand user behaviours and choices and how they may be applied to specific speech interface interactions. This part-day multidisciplinary workshop aims to critically map out and evaluate theoretical frameworks and methodological approaches across a number of disciplines and establish directions for new paradigms in understanding speech interface user behaviour. In doing so, we will bring together participants from HCI and other speech-related domains to establish a cohesive, diverse and collaborative community of researchers from academia and industry with interest in exploring theoretical and methodological issues in the field.
10.
  • Cowan, Benjamin R., et al. (author)
  • They Know as Much as We Do : Knowledge Estimation and Partner Modelling of Artificial Partners
  • 2017
  • In: CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition. - The Cognitive Science Society. - pp. 1836-1841
  • Conference paper (peer-reviewed), abstract:
    • Conversation partners' assumptions about each other's knowledge (their partner models) on a subject are important in spoken interaction. However, little is known about what influences our partner models in spoken interactions with artificial partners. In our experiment we asked people to name 15 British landmarks, and estimate their identifiability to a person as well as an automated conversational agent of either British or American origin. Our results show that people's assumptions about what an artificial partner knows are related to their estimates of what other people are likely to know - but they generally estimate artificial partners to have more knowledge in the task than human partners. These findings shed light on the way in which people build partner models of artificial partners. Importantly, they suggest that people use assumptions about what other humans know as a heuristic when assessing an artificial partner's knowledge.
  •  
11.
  • Ekstedt, Erik, et al. (författare)
  • Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis
  • 2023
  • Ingår i: Interspeech 2023. - : International Speech Communication Association. ; , s. 5481-5485
  • Konferensbidrag (refereegranskat)abstract
    • Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues. Using the recently proposed Voice Activity Projection model, we propose an automatic evaluation approach to measure these aspects for conversational speech synthesis. We investigate the ability of three commercial, and two open-source, Text-To-Speech (TTS) systems ability to generate turn-taking cues over simulated turns. By varying the stimuli, or controlling the prosody, we analyze the models performances. We show that while commercial TTS largely provide appropriate cues, they often produce ambiguous signals, and that further improvements are possible. TTS, trained on read or spontaneous speech, produce strong turn-hold but weak turn-yield cues. We argue that this approach, that focus on functional aspects of interaction, provides a useful addition to other important speech metrics, such as intelligibility and naturalness.
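The evaluation idea can be sketched in a few lines. The interface below is an assumption for illustration only, not the actual Voice Activity Projection codebase; it shows how per-frame hold probabilities from such a model could be turned into a turn-yield score for a synthesized turn ending:

```python
# Sketch (assumed interface): a turn-taking model emits, per frame, the
# probability that the current speaker keeps the turn; we score a synthesized
# turn ending by the average yield probability over its final frames.
import numpy as np

def yield_score(p_hold: np.ndarray, frame_rate: int = 50, window_s: float = 0.5) -> float:
    """Mean probability of a turn shift over the last `window_s` seconds."""
    n = int(frame_rate * window_s)
    return float(np.mean(1.0 - p_hold[-n:]))

# A TTS system produces strong turn-yield cues if yield_score is high for
# utterances synthesized as complete turns, and low for held/truncated ones.
```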
12.
  • Elmers, Mikey, et al. (author)
  • Synthesis after a couple PINTs : Investigating the Role of Pause-Internal Phonetic Particles in Speech Synthesis and Perception
  • 2023
  • In: Interspeech 2023. - International Speech Communication Association. - pp. 4843-4847
  • Conference paper (peer-reviewed), abstract:
    • Pause-internal phonetic particles (PINTs), such as breath noises, tongue clicks and hesitations, play an important role in speech perception but are rarely modeled in speech synthesis. We developed two text-to-speech (TTS) systems: one with and one without PINTs labels in the training data. Both models produced fewer PINTs and had a lower total PINTs duration than natural speech. The labeled model generated more PINTs and longer total PINTs durations than the model without labels. In a listening experiment based on the labeled model we evaluated the influence of various PINTs combinations on the perception of speaker certainty. We tested a condition without PINTs material and three conditions that included PINTs. The condition without PINTs was perceived as significantly more certain than the PINTs conditions, suggesting that we can modify how certain TTS is perceived by including PINTs.
13.
  • Gustafsson, Joakim, Professor, 1966-, et al. (author)
  • Generation of speech and facial animation with controllable articulatory effort for amusing conversational characters
  • 2023
  • In: 23rd ACM International Conference on Intelligent Virtual Agents (IVA 2023). - Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed), abstract:
    • Engaging embodied conversational agents need to generate expressive behavior in order to be believable in socializing interactions. We present a system that can generate spontaneous speech with supporting lip movements. The neural conversational TTS voice is trained on a multi-style speech corpus that has been prosodically tagged (pitch and speaking rate) and transcribed (including tokens for breathing, fillers and laughter). We introduce a speech animation algorithm where articulatory effort can be adjusted. The facial animation is driven by time-stamped phonemes and prominence estimates from the synthesised speech waveform to modulate the lip and jaw movements accordingly. In objective evaluations we show that the system is able to generate speech and facial animation that vary in articulation effort. In subjective evaluations we compare our conversational TTS system's capability to deliver jokes with a commercial TTS. Both systems succeeded equally well.
15.
  • Kirkland, Ambika, et al. (author)
  • Pardon my disfluency : The impact of disfluency effects on the perception of speaker competence and confidence
  • 2023
  • In: Interspeech 2023. - International Speech Communication Association. - pp. 5217-5221
  • Conference paper (peer-reviewed), abstract:
    • Disfluencies are a hallmark of spontaneous speech and play an important role in conversation, yet have been shown to negatively impact judgments about speakers. We explored the role of disfluencies in the perception of competence, sincerity and confidence in public speaking contexts, using synthesized spontaneous speech. In one experiment, listeners rated 30-40-second clips which varied in terms of whether they contained filled pauses, as well as the number and types of repetition. Both the overall number of disfluencies and the repetition type had an impact on competence and confidence, and disfluent speech was also rated as less sincere. In the second experiment, the negative effects of repetition type on competence were attenuated when participants attributed disfluency to anxiety.
16.
  • Kirkland, Ambika, et al. (author)
  • Perception of smiling voice in spontaneous speech synthesis
  • 2021
  • In: Proceedings of the Speech Synthesis Workshop (SSW11). - International Speech Communication Association. - pp. 108-112
  • Conference paper (peer-reviewed), abstract:
    • Smiling during speech production has been shown to result in perceptible acoustic differences compared to non-smiling speech. However, there is a scarcity of research on the perception of "smiling voice" in synthesized spontaneous speech. In this study, we used a sequence-to-sequence neural text-to-speech system built on conversational data to produce utterances with the characteristics of spontaneous speech. Segments of speech following laughter, and the same utterances not preceded by laughter, were compared in a perceptual experiment after removing laughter and/or breaths from the beginning of the utterance to determine whether participants perceive the utterances preceded by laughter as sounding as if they were produced while smiling. The results showed that participants identified the post-laughter speech as smiling at a rate significantly greater than chance. Furthermore, the effect of content (positive/neutral/negative) was investigated. These results show that laughter, a spontaneous, non-elicited phenomenon in our model's training data, can be used to synthesize expressive speech with the perceptual characteristics of smiling.
17.
  • Kirkland, Ambika, et al. (author)
  • Where's the uh, hesitation? : The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence
  • 2022
  • In: INTERSPEECH 2022. - International Speech Communication Association. - pp. 4990-4994
  • Conference paper (peer-reviewed), abstract:
    • Much of the research investigating the perception of speaker certainty has relied on either attempting to elicit prosodic features in read speech, or artificial manipulation of recorded audio. Our novel method of controlling prosody in synthesized spontaneous speech provides a powerful tool for studying speech perception and can provide better insight into the interacting effects of prosodic features on perception while also paving the way for conversational systems which are more effectively able to engage in and respond to social behaviors. Here we have used this method to examine the combined impact of filled pause location, speech rate and f0 on the perception of speaker confidence. We found an additive effect of all three features. The most confident-sounding utterances had no filler, low f0 and high speech rate, while the least confident-sounding utterances had a medial filled pause, high f0 and low speech rate. Insertion of filled pauses had the strongest influence, but pitch and speaking rate could be used to more finely control the uncertainty cues in spontaneous speech synthesis.
18.
  • Lameris, Harm, et al. (author)
  • Beyond style : synthesizing speech with pragmatic functions
  • 2023
  • In: Interspeech 2023. - International Speech Communication Association. - pp. 3382-3386
  • Conference paper (peer-reviewed), abstract:
    • With recent advances in generative modelling, conversational systems are becoming more lifelike and capable of long, nuanced interactions. Text-to-Speech (TTS) is being tested in territories requiring natural-sounding speech that can mimic the complexities of human conversation. Hyper-realistic speech generation has been achieved, but a gap remains between the verbal behavior required for upscaled conversation, such as paralinguistic information and pragmatic functions, and comprehension of the acoustic prosodic correlates underlying these. Without this knowledge, reproducing these functions in speech has little value. We use prosodic correlates including spectral peaks, spectral tilt, and creak percentage for speech synthesis with the pragmatic functions of small talk, self-directed speech, advice, and instructions. We perform a MOS evaluation, and a suitability experiment in which our system outperforms a read-speech and conversational baseline.
20.
  • Lameris, Harm, 1997-, et al. (author)
  • Prosody-Controllable Spontaneous TTS with Neural HMMs
  • 2023
  • In: International Conference on Acoustics, Speech, and Signal Processing (ICASSP). - Institute of Electrical and Electronics Engineers (IEEE).
  • Conference paper (peer-reviewed), abstract:
    • Spontaneous speech has many affective and pragmatic functions that are interesting and challenging to model in TTS. However, the presence of reduced articulation, fillers, repetitions, and other disfluencies in spontaneous speech make the text and acoustics less aligned than in read speech, which is problematic for attention-based TTS. We propose a TTS architecture that can rapidly learn to speak from small and irregular datasets, while also reproducing the diversity of expressive phenomena present in spontaneous speech. Specifically, we add utterance-level prosody control to an existing neural HMM-based TTS system which is capable of stable, monotonic alignments for spontaneous speech. We objectively evaluate control accuracy and perform perceptual tests that demonstrate that prosody control does not degrade synthesis quality. To exemplify the power of combining prosody control and ecologically valid data for reproducing intricate spontaneous speech phenomena, we evaluate the system’s capability of synthesizing two types of creaky voice.
21.
  • Lameris, Harm, et al. (author)
  • Spontaneous Neural HMM TTS with Prosodic Feature Modification
  • 2022
  • In: Proceedings of Fonetik 2022.
  • Conference paper (other academic/artistic), abstract:
    • Spontaneous speech synthesis is a complex enterprise, as the data has large variation, as well as speech disfluencies normally omitted from read speech. These disfluencies perturb the attention mechanism present in most Text to Speech (TTS) systems. Explicit modelling of prosodic features has enabled intuitive prosody modification of synthesized speech. Most prosody-controlled TTS, however, has been trained on read-speech data that is not representative of spontaneous conversational prosody. The diversity in prosody in spontaneous speech data allows for more wide-ranging data-driven modelling of prosodic features. Additionally, prosody-controlled TTS requires extensive training data and GPU time which limits accessibility. We use neural HMM TTS as it reduces the parameter size and can achieve fast convergence with stable alignments for spontaneous speech data. We modify neural HMM TTS to enable prosodic control of the speech rate and fundamental frequency. We perform subjective evaluation of the generated speech of English and Swedish TTS models and objective evaluation for English TTS. Subjective evaluation showed a significant improvement in naturalness for Swedish for the mean prosody compared to a baseline with no prosody modification, and the objective evaluation showed greater variety in the mean of the per-utterance prosodic features.
22.
  • Lameris, Harm, et al. (author)
  • The Role of Creaky Voice in Turn Taking and the Perception of Speaker Stance: Experiments Using Controllable TTS
  • 2024
  • In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. - European Language Resources Association (ELRA). - pp. 16058-16065
  • Conference paper (peer-reviewed), abstract:
    • Recent advancements in spontaneous text-to-speech (TTS) have enabled the realistic synthesis of creaky voice, a voice quality known for its diverse pragmatic and paralinguistic functions. In this study, we used synthesized creaky voice in perceptual tests, to explore how listeners without formal training perceive two distinct types of creaky voice. We annotated a spontaneous speech corpus using creaky voice detection tools and modified a neural TTS engine with a creaky phonation embedding to control the presence of creaky phonation in the synthesized speech. We performed an objective analysis using a creak detection tool which revealed significant differences in creaky phonation levels between the two creaky voice types and modal voice. Two subjective listening experiments were performed to investigate the effect of creaky voice on perceived certainty, valence, sarcasm, and turn finality. Participants rated non-positional creak as less certain, less positive, and more indicative of turn finality, while positional creak was rated significantly more turn final compared to modal phonation.
23.
  • Máthé, I., et al. (author)
  • Investigation of mineral water springs of Miercurea Ciuc (Csíkszereda) region (Romania) with cultivation-dependent microbiological methods
  • 2010
  • In: Acta Microbiologica et Immunologica Hungarica. - ISSN 1217-8950, 1588-2640. - 57:2, pp. 109-122
  • Journal article (peer-reviewed), abstract:
    • Water samples of ten mineral water springs at Miercurea Ciuc (Csíkszereda) region (Romania) were examined during 2005-2006 using cultivation-dependent microbiological methods. The results of standard hygienic bacteriological tests showed that the Hargita Spring had perfect and five other springs had microbiologically acceptable water quality (Zsögöd-, Nagy-borvíz-, Taploca-, Szentegyháza- and Lobogó springs). The water of Borsáros Spring was exceptionable (high germ count, presence of Enterococcus spp.). Both standard bacteriological and molecular microbiological methods indicated that the microbiological water quality of the Szeltersz-, Nádasszék- and Délo springs was not acceptable. Bad water quality resulted from inadequate spring catchment and hygiene (low yield, lack of runoff, negligent usage of the springs, horse manure around the spring). The 16S rRNA gene-based identification of strains isolated on standard meat-peptone medium resulted in the detection of typical aquatic organisms such as Shewanella baltica, Aeromonas spp., Pseudomonas veronii, Psychrobacter sp., Acinetobacter spp. and allochthonous microbes, like Nocardia, Streptomyces, Bacillus, Microbacterium, and Arthrobacter strains indicating the impact of soil. Other allochthonous microbes, such as Staphylococcus spp., Micrococcus sp., Lactococcus sp., Clostridium butyricum, Yersinia spp., Aerococcus sp., may have originated from animal/human sources. © 2010 Akadémiai Kiadó, Budapest.
24.
  • Mehta, Shivam, et al. (author)
  • Neural HMMs are all you need (for high-quality attention-free TTS)
  • 2022
  • In: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). - IEEE Signal Processing Society. - pp. 7457-7461
  • Conference paper (peer-reviewed), abstract:
    • Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using HMMs. However, neural TTS is generally not probabilistic and uses non-monotonic attention. Attention failures increase training time and can make synthesis babble incoherently. This paper describes how the old and new paradigms can be combined to obtain the advantages of both worlds, by replacing attention in neural TTS with an autoregressive left-right no-skip hidden Markov model defined by a neural network. Based on this proposal, we modify Tacotron 2 to obtain an HMM-based neural TTS model with monotonic alignment, trained to maximise the full sequence likelihood without approximation. We also describe how to combine ideas from classical and contemporary TTS for best results. The resulting example system is smaller and simpler than Tacotron 2, and learns to speak with fewer iterations and less data, whilst achieving comparable naturalness prior to the post-net. Our approach also allows easy control over speaking rate.
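The monotonic alignment structure the abstract describes (a left-right, no-skip HMM) can be illustrated with a plain forward recursion. In the paper the emission and transition log-probabilities come from the neural network; here they are placeholder inputs in an assumed layout:

```python
# Sketch of a left-right no-skip HMM forward pass: only "stay" or "advance
# one state" transitions are allowed, so the soft alignment between phone
# states and acoustic frames is strictly monotonic.
import numpy as np

def forward_loglik(log_emit: np.ndarray, log_stay: np.ndarray) -> float:
    """log_emit[t, j]: log p(frame t | state j); log_stay[j]: log p(stay in j)."""
    T, N = log_emit.shape
    log_adv = np.log1p(-np.exp(log_stay))   # advance prob = 1 - stay prob
    alpha = np.full(N, -np.inf)
    alpha[0] = log_emit[0, 0]               # alignment must start in state 0
    for t in range(1, T):
        stay = alpha + log_stay
        adv = np.full(N, -np.inf)
        adv[1:] = alpha[:-1] + log_adv[:-1]
        alpha = np.logaddexp(stay, adv) + log_emit[t]
    return float(alpha[-1])                 # and end in the last state
```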
25.
  • Mehta, Shivam, et al. (author)
  • OverFlow : Putting flows on top of neural transducers for better TTS
  • 2023
  • In: Interspeech 2023. - International Speech Communication Association. - pp. 4279-4283
  • Conference paper (peer-reviewed), abstract:
    • Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence modelling in text-to-speech. They combine the best features of classic statistical speech synthesis and modern neural TTS, requiring less data and fewer training updates, and are less prone to gibberish output caused by neural attention failures. In this paper, we combine neural HMM TTS with normalising flows for describing the highly non-Gaussian distribution of speech acoustics. The result is a powerful, fully probabilistic model of durations and acoustics that can be trained using exact maximum likelihood. Experiments show that a system based on our proposal needs fewer updates than comparable methods to produce accurate pronunciations and a subjective speech quality close to natural speech.
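As a rough illustration of the normalising-flow component (dimensions and layer sizes are assumptions, not the paper's configuration), an affine coupling layer of the kind such flows stack is exactly invertible and contributes a closed-form term to the log-likelihood:

```python
# Sketch: an affine coupling layer transforms half of an acoustic feature
# vector conditioned on the other half, staying exactly invertible.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, (dim - self.half) * 2),  # predicts scale and shift
        )

    def forward(self, x: torch.Tensor):
        a, b = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(a).chunk(2, dim=-1)
        y = torch.cat([a, b * log_s.exp() + t], dim=-1)
        log_det = log_s.sum(dim=-1)  # exact contribution to the log-likelihood
        return y, log_det

    def inverse(self, y: torch.Tensor):
        a, b = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(a).chunk(2, dim=-1)
        return torch.cat([a, (b - t) * (-log_s).exp()], dim=-1)
```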
26.
  • Miniotaitė, Jūra, et al. (author)
  • Hi robot, it's not what you say, it's how you say it
  • 2023
  • In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). - Institute of Electrical and Electronics Engineers (IEEE). - pp. 307-314
  • Conference paper (peer-reviewed), abstract:
    • Many robots use their voice to communicate with people in spoken language but the voices commonly used for robots are often optimized for transactional interactions, rather than social ones. This can limit their ability to create engaging and natural interactions. To address this issue, we designed a spontaneous text-to-speech tool and used it to author natural and spontaneous robot speech. A crowdsourcing evaluation methodology is proposed to compare this type of speech to natural speech and state-of-the-art text-to-speech technology, both in disembodied and embodied form. We created speech samples in a naturalistic setting of people playing tabletop games and conducted a user study evaluating Naturalness, Intelligibility, Social Impression, Prosody, and Perceived Intelligence. The speech samples were chosen to represent three contexts that are common in tabletop games, and the contexts were introduced to the participants who evaluated the speech samples. The study results show that the proposed evaluation methodology allowed for a robust analysis that successfully compared the different conditions. Moreover, the spontaneous voice met our target design goal of being perceived as more natural than a leading commercial text-to-speech.
27.
  • Oertel, Catharine, et al. (author)
  • Using crowd-sourcing for the design of listening agents : Challenges and opportunities
  • 2017
  • In: ISIAA 2017 - Proceedings of the 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents, Co-located with ICMI 2017. - New York, NY, USA: Association for Computing Machinery (ACM). - ISBN 9781450355582. - pp. 37-38
  • Conference paper (peer-reviewed), abstract:
    • In this paper we describe how audio-visual corpus recordings made using crowd-sourcing techniques can be used for the audio-visual synthesis of attitudinal non-verbal feedback expressions for virtual agents. We discuss the limitations of this approach as well as where we see the opportunities for this technology.
28.
  • Scharnweber, Kristin, 1983-, et al. (author)
  • Comprehensive analysis of chemical and biological problems associated with browning agents used in aquatic studies
  • 2021
  • In: Limnology and Oceanography. - Wiley. - ISSN 1541-5856. - 19:12, pp. 818-835
  • Journal article (peer-reviewed), abstract:
    • Inland waters receive and process large amounts of colored organic matter from the terrestrial surroundings. These inputs dramatically affect the chemical, physical, and biological properties of water bodies, as well as their roles as global carbon sinks and sources. However, manipulative studies, especially at ecosystem scale, require large amounts of dissolved organic matter with optical and chemical properties resembling indigenous organic matter. Here, we compared the impacts of two leonardite products (HuminFeed and SuperHume) and a freshly derived reverse osmosis concentrate of organic matter in a set of comprehensive mesocosm- and laboratory-scale experiments and analyses. The chemical properties of the reverse osmosis concentrate and the leonardite products were very different, with leonardite products being low and the reverse osmosis concentrate being high in carboxylic functional groups. Light had a strong impact on the properties of leonardite products, including loss of color and increased particle formation. HuminFeed had a substantial impact on microbial communities under light conditions, where bacterial production was stimulated and community composition modified, while in the dark a potential inhibition of bacterial processes was detected. While none of the browning agents inhibited the growth of the tested phytoplankton Gonyostomum semen, HuminFeed had detrimental effects on zooplankton abundance and Daphnia reproduction. We conclude that the effects of browning agents extracted from leonardite, particularly HuminFeed, are in sharp contrast to those originating from terrestrially derived dissolved organic matter. Hence, they should be used with great caution in experimental studies on the consequences of terrestrial carbon for aquatic systems.
29.
  • Székely, Éva, et al. (author)
  • Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis
  • 2020
  • In: Proceedings of the 12th Language Resources and Evaluation Conference. - European Language Resources Association. - pp. 6368-6374
  • Conference paper (peer-reviewed), abstract:
    • By definition, spontaneous speech is unscripted and created on the fly by the speaker. It is dramatically different from read speech, where the words are authored as text before they are spoken. Spontaneous speech is emergent and transient, whereas text read out loud is pre-planned. For this reason, it is unsuitable to evaluate the usability and appropriateness of spontaneous speech synthesis by having it read out written texts sampled from, for example, newspapers or books. Instead, we need to use transcriptions of speech as the target - something that is much less readily available. In this paper, we introduce Starmap, a tool allowing developers to select a varied, representative set of utterances from a spoken genre, to be used for evaluation of TTS for a given domain. The selection can be done from any speech recording, without the need for transcription. The tool uses interactive visualisation of prosodic features with t-SNE, along with a tree-based algorithm to guide the user through thousands of utterances and ensure coverage of a variety of prompts. A listening test has shown that with a selection of genre-specific utterances, it is possible to show significant differences across genres between two synthetic voices built from spontaneous speech.
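The visualisation step can be sketched as follows; the feature set is an assumption for illustration, and Starmap itself adds a tree-based traversal and an interactive UI on top:

```python
# Sketch: embed per-utterance prosodic feature vectors with t-SNE so that
# prosodically similar utterances cluster together for prompt selection.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Hypothetical per-utterance features, e.g. f0 mean/range, speech rate,
# pause ratio, energy statistics; one row per utterance.
features = rng.normal(size=(2000, 8))

coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
# Plotting `coords` and picking points spread across the map yields a varied,
# genre-representative evaluation prompt set.
```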
30.
  • Székely, Éva, et al. (author)
  • Breathing and Speech Planning in Spontaneous Speech Synthesis
  • 2020
  • In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). - IEEE. - pp. 7649-7653
  • Conference paper (peer-reviewed), abstract:
    • Breathing and speech planning in spontaneous speech are coordinated processes, often exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs, integrating breath into synthesis has advantages for naturalness and recall. At the same time, a synthetic voice reproducing disfluent breathing patterns learned from the data can be problematic. To address this, we first propose training stochastic TTS on a corpus of overlapping breath-group bigrams, to take context into account. Next, we introduce an unsupervised automatic annotation of likely-disfluent breath events, through a product-of-experts model that combines the output of two breath-event predictors, each using complementary information and operating in opposite directions. This annotation enables creating an automatically-breathing spontaneous speech synthesiser with a more fluent breathing style. A subjective evaluation on two spoken genres (impromptu and rehearsed) found the proposed system to be preferred over the baseline approach treating all breath events the same.
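For binary breath-event labels, the product-of-experts combination described above reduces to a renormalised product of the two predictors' probabilities; a minimal sketch with made-up numbers:

```python
# Sketch: two breath-event predictors score each event from complementary
# directions; an event is annotated as likely-disfluent where the renormalised
# product of both disfluency probabilities is high.
import numpy as np

def product_of_experts(p_fwd: np.ndarray, p_bwd: np.ndarray) -> np.ndarray:
    """Combine two per-event disfluency probabilities into one."""
    joint = p_fwd * p_bwd
    # Renormalise over the two outcomes {disfluent, fluent}.
    return joint / (joint + (1 - p_fwd) * (1 - p_bwd))

labels = product_of_experts(np.array([0.9, 0.2]), np.array([0.8, 0.6])) > 0.5
```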
31.
  • Székely, Éva, et al. (author)
  • Casting to Corpus : Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-Dependent Breath Detector
  • 2019
  • In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). - IEEE. - ISBN 9781479981311. - pp. 6925-6929
  • Conference paper (peer-reviewed), abstract:
    • This paper considers utilising breaths to create improved spontaneous-speech corpora for conversational text-to-speech from found audio recordings such as dialogue podcasts. Breaths are of interest since they relate to prosody and speech planning and are independent of language and transcription. Specifically, we propose a semisupervised approach where a fraction of coarsely annotated data is used to train a convolutional and recurrent speaker-specific breath detector operating on spectrograms and zero-crossing rate. The classifier output is used to find target-speaker breath groups (audio segments delineated by breaths) and subsequently select those that constitute clean utterances appropriate for a synthesis corpus. An application to 11 hours of raw podcast audio extracts 1969 utterances (106 minutes), 87% of which are clean and correctly segmented. This outperforms a baseline that performs integrated VAD and speaker attribution without accounting for breaths.
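A compact sketch of such a convolutional-recurrent detector follows; the layer sizes are illustrative assumptions, not the paper's exact configuration, and the zero-crossing rate would be appended as an extra input feature in practice:

```python
# Sketch: convolutions scan the spectrogram, a recurrent layer models
# temporal context, and a sigmoid yields a per-frame breath probability.
import torch
import torch.nn as nn

class BreathDetector(nn.Module):
    def __init__(self, n_mels: int = 80, hidden: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.lstm = nn.LSTM(16 * n_mels, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, time, n_mels) -> per-frame breath probability
        x = self.conv(spec.unsqueeze(1))      # (batch, 16, time, n_mels)
        x = x.permute(0, 2, 1, 3).flatten(2)  # (batch, time, 16 * n_mels)
        h, _ = self.lstm(x)
        return torch.sigmoid(self.out(h)).squeeze(-1)
```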
32.
  • Székely, Éva, et al. (author)
  • Clustering Expressive Speech Styles in Audiobooks Using Glottal Source Parameters.
  • 2011
  • In: 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011). - International Speech Communication Association (ISCA). - ISBN 9781618392701. - pp. 2409-2412
  • Conference paper (peer-reviewed), abstract:
    • A great challenge for text-to-speech synthesis is to produce expressive speech. The main problem is that it is difficult to synthesise high-quality speech using expressive corpora. With the increasing interest in audiobook corpora for speech synthesis, there is a demand to synthesise speech which is rich in prosody, emotions and voice styles. In this work, Self-Organising Feature Maps (SOFM) are used for clustering the speech data using voice quality parameters of the glottal source, in order to map out the variety of voice styles in the corpus. Subjective evaluation showed that this clustering method successfully separated the speech data into groups of utterances associated with different voice characteristics. This work can be applied in unit-selection synthesis by selecting appropriate data sets to synthesise utterances with specific voice styles. It can also be used in parametric speech synthesis to model different voice styles separately.
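The clustering step can be sketched with an off-the-shelf self-organising map; the `minisom` package and the feature list below are stand-in assumptions, not the tooling used in the paper:

```python
# Sketch: map utterances onto a small SOM grid using glottal source / voice
# quality parameters; utterances sharing a node form candidate style clusters.
import numpy as np
from minisom import MiniSom

# Hypothetical per-utterance glottal source parameters, e.g. open quotient,
# return quotient, normalised amplitude quotient, spectral tilt.
features = np.random.default_rng(1).normal(size=(500, 4))

som = MiniSom(6, 6, input_len=features.shape[1], sigma=1.0, learning_rate=0.5)
som.train(features, num_iteration=5000)

# Each utterance maps to its best-matching unit; nodes (or neighbourhoods)
# with many members correspond to recurring voice styles in the corpus.
clusters = [som.winner(f) for f in features]
```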
33.
  • Székely, Éva, et al. (author)
  • Detecting a targeted voice style in an audiobook using voice quality features
  • 2012
  • In: Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. - Institute of Electrical and Electronics Engineers (IEEE). - pp. 4593-4596
  • Conference paper (peer-reviewed), abstract:
    • Audiobooks are known to contain a variety of expressive speaking styles that occur as a result of the narrator mimicking a character in a story, or expressing affect. An accurate modeling of this variety is essential for the purposes of speech synthesis from an audiobook. Voice quality differences are important features characterizing these different speaking styles, which are realized on a gradient and are often difficult to predict from the text. The present study uses a parameter characterizing breathy to tense voice qualities using features of the wavelet transform, and a measure for identifying creaky segments in an utterance. Based on these features, a combination of supervised and unsupervised classification is used to detect the regions in an audiobook where the speaker changes his regular voice quality to a particular voice style. The target voice style candidates are selected based on the agreement of the supervised classifier ensemble output, and evaluated in a listening test.
34.
  • Székely, Éva, et al. (author)
  • Evaluating expressive speech synthesis from audiobooks in conversational phrases
  • 2012
  • Conference paper (peer-reviewed), abstract:
    • Audiobooks are a rich resource of large quantities of natural sounding, highly expressive speech. In our previous research we have shown that it is possible to detect different expressive voice styles represented in a particular audiobook, using unsupervised clustering to group the speech corpus of the audiobook into smaller subsets representing the detected voice styles. These subsets of corpora of different voice styles reflect the various ways a speaker uses their voice to express involvement and affect, or imitate characters. This study is an evaluation of the detection of voice styles in an audiobook as applied to expressive speech synthesis. A further aim of this study is to investigate the usability of audiobooks as a language resource for expressive speech synthesis of utterances of conversational speech. Two evaluations have been carried out to assess the effect of the genre transfer: transmitting expressive speech from read-aloud literature to conversational phrases with the application of speech synthesis. The first evaluation revealed that listeners have different voice style preferences for a particular conversational phrase. The second evaluation showed that it is possible for users of speech synthesis systems to learn the characteristics of a certain voice style well enough to make reliable predictions about what a certain utterance will sound like when synthesised using that voice style.
35.
  • Székely, Éva, et al. (author)
  • Facial expression as an input annotation modality for affective speech-to-speech translation
  • 2012
  • Conference paper (peer-reviewed), abstract:
    • One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker's message. In this work we explore the use of automatic facial expression analysis as an input annotation modality to transfer paralinguistic information at a symbolic level from input to output in speech-to-speech translation. To evaluate the feasibility of this approach, a prototype system, FEAST (Facial Expression-based Affective Speech Translation), has been developed. FEAST classifies the emotional state of the user and uses it to render the translated output in an appropriate voice style, using expressive speech synthesis.
36.
  • Székely, Éva, et al. (author)
  • Facial expression-based affective speech translation
  • 2014
  • In: Journal on Multimodal User Interfaces. - Springer Science and Business Media LLC. - ISSN 1783-7677, 1783-8738. - 8:1, pp. 87-96
  • Journal article (peer-reviewed), abstract:
    • One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker's message. Information about affect and emotional intent of a speaker are often carried in more than one modality. For this reason, the possibility of multimodal interaction with the system and the conversation partner may greatly increase the likelihood of a successful and gratifying communication process. In this work we explore the use of automatic facial expression analysis as an input annotation modality to transfer paralinguistic information at a symbolic level from input to output in speech-to-speech translation. To evaluate the feasibility of this approach, a prototype system, FEAST (facial expression-based affective speech translation), has been developed. FEAST classifies the emotional state of the user and uses it to render the translated output in an appropriate voice style, using expressive speech synthesis.
38.
  • Székely, Éva, et al. (author)
  • Off the cuff : Exploring extemporaneous speech delivery with TTS
  • 2019
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. - International Speech Communication Association. - pp. 3687-3688
  • Conference paper (peer-reviewed), abstract:
    • Extemporaneous speech is a delivery type in public speaking which uses a structured outline but is otherwise delivered conversationally, off the cuff. This demo uses a natural-sounding spontaneous conversational speech synthesiser to simulate this delivery style. We resynthesised the beginnings of two Interspeech keynote speeches with TTS that produces multiple different versions of each utterance that vary in fluency and filled-pause placement. The platform allows the user to mark the samples according to any perceptual aspect of interest, such as certainty, authenticity, confidence, etc. During the speech delivery, they can decide on the fly which realisation to play, addressing their audience in a connected, conversational fashion. Our aim is to use this platform to explore speech synthesis evaluation options from a production perspective and in situational contexts.
39.
  • Székely, Éva, et al. (author)
  • Predicting synthetic voice style from facial expressions. An application for augmented conversations
  • 2014
  • In: Speech Communication. - Elsevier BV. - ISSN 0167-6393, 1872-7182. - 57, pp. 63-75
  • Journal article (peer-reviewed), abstract:
    • The ability to efficiently facilitate social interaction and emotional expression is an important, yet unmet requirement for speech generating devices aimed at individuals with speech impairment. Using gestures such as facial expressions to control aspects of expressive synthetic speech could contribute to an improved communication experience for both the user of the device and the conversation partner. For this purpose, a mapping model between facial expressions and speech is needed, that is high level (utterance-based), versatile and personalisable. In the mapping developed in this work, visual and auditory modalities are connected based on the intended emotional salience of a message: the intensity of facial expressions of the user to the emotional intensity of the synthetic speech. The mapping model has been implemented in a system called WinkTalk that uses estimated facial expression categories and their intensity values to automatically select between three expressive synthetic voices reflecting three degrees of emotional intensity. An evaluation is conducted through an interactive experiment using simulated augmented conversations. The results have shown that automatic control of synthetic speech through facial expressions is fast, non-intrusive, sufficiently accurate and supports the user to feel more involved in the conversation. It can be concluded that the system has the potential to facilitate a more efficient communication process between user and listener.
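The utterance-level mapping described here can be sketched in a few lines; the thresholds and voice names are illustrative assumptions, not WinkTalk's actual configuration:

```python
# Sketch: the estimated intensity of the user's facial expression selects one
# of three expressive voices, graded by emotional intensity.
VOICES = ["calm", "engaged", "intense"]  # three degrees of emotional intensity

def select_voice(expression_intensity: float, thresholds=(0.33, 0.66)) -> str:
    """Map a normalised facial-expression intensity in [0, 1] to a voice."""
    if expression_intensity < thresholds[0]:
        return VOICES[0]
    if expression_intensity < thresholds[1]:
        return VOICES[1]
    return VOICES[2]

assert select_voice(0.8) == "intense"
```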
40.
  • Székely, Éva, et al. (author)
  • Prosody-controllable gender-ambiguous speech synthesis : a tool for investigating implicit bias in speech perception
  • 2023
  • In: Interspeech 2023. - International Speech Communication Association. - pp. 1234-1238
  • Conference paper (peer-reviewed), abstract:
    • This paper proposes a novel method to develop gender-ambiguous TTS, which can be used to investigate hidden gender bias in speech perception. Our aim is to provide a tool for researchers to conduct experiments on language use associated with specific genders. Ambiguous voices can also be beneficial for virtual assistants, to help reduce stereotypes and increase acceptance. Our approach uses a multi-speaker embedding in a neural TTS engine, combining two corpora recorded by a male and a female speaker to achieve a gender-ambiguous timbre. We also propose speaker-disentangled prosody control to ensure that the timbre is robust across a range of prosodies and enable more expressive speech. We optimised the output using an SSL-based network trained on hundreds of speakers. We conducted perceptual evaluations on the settings that were judged most ambiguous by the network, which showed that listeners perceived the speech samples as gender-ambiguous, also in prosody-controlled conditions.
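The core embedding idea can be sketched as follows; the embedding space, dimensions and selection loop are assumptions for illustration, not the paper's implementation:

```python
# Sketch: with a multi-speaker TTS whose speaker identity is a vector, a
# gender-ambiguous timbre can be sought by interpolating between a male and a
# female speaker's embeddings and keeping the mixes judged most ambiguous.
import numpy as np

def blend(emb_m: np.ndarray, emb_f: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation in speaker-embedding space; alpha in [0, 1]."""
    return (1 - alpha) * emb_m + alpha * emb_f

candidates = [blend(np.ones(256), np.zeros(256), a) for a in np.linspace(0, 1, 11)]
# Each candidate is synthesised and scored for perceived gender (the paper
# uses an SSL-based network trained on hundreds of speakers); the settings
# judged most ambiguous go forward to perceptual evaluation.
```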
41.
  • Székely, Éva, et al. (author)
  • So-to-Speak : an exploratory platform for investigating the interplay between style and prosody in TTS
  • 2023
  • In: Interspeech 2023. - International Speech Communication Association. - pp. 2016-2017
  • Conference paper (peer-reviewed), abstract:
    • In recent years, numerous speech synthesis systems have been proposed that feature multi-dimensional controllability, generating a level of variability that surpasses traditional TTS systems by orders of magnitude. However, it remains challenging for developers to comprehend and demonstrate the potential of these advanced systems. We introduce So-to-Speak, a customisable interface tailored for showcasing the capabilities of different controllable TTS systems. The interface allows for the generation, synthesis, and playback of hundreds of samples simultaneously, displayed on an interactive grid, with variation in both low-level prosodic features and high-level style controls. To offer insights into speech quality, automatic estimates of MOS scores are presented for each sample. So-to-Speak facilitates the audiovisual exploration of the interaction between various speech features, which can be useful in a range of applications in speech technology.
42.
  • Székely, Éva, et al. (author)
  • Spontaneous conversational speech synthesis from found data
  • 2019
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. - ISCA. - pp. 4435-4439
  • Conference paper (peer-reviewed), abstract:
    • Synthesising spontaneous speech is a difficult task due to disfluencies, high variability and syntactic conventions different from those of written language. Using found data, as opposed to lab-recorded conversations, for speech synthesis adds to these challenges because of overlapping speech and the lack of control over recording conditions. In this paper we address these challenges by using a speaker-dependent CNN-LSTM breath detector to separate continuous recordings into utterances, which we here apply to extract nine hours of clean single-speaker breath groups from a conversational podcast. The resulting corpus is transcribed automatically (both lexical items and filler tokens) and used to build several voices on a Tacotron 2 architecture. Listening tests show: i) pronunciation accuracy improved with phonetic input and transfer learning; ii) it is possible to create a more fluent conversational voice by training on data without filled pauses; and iii) the presence of filled pauses improved perceived speaker authenticity. Another listening test showed the found podcast voice to be more appropriate for prompts from both public speeches and casual conversations, compared to synthesis from found read speech and from a manually transcribed lab-recorded spontaneous conversation.
43.
  • Szekely, Eva, et al. (author)
  • Synthesising uncertainty : The interplay of vocal effort and hesitation disfluencies
  • 2017
  • In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. - International Speech Communication Association. - pp. 804-808
  • Conference paper (peer-reviewed), abstract:
    • As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, the question of how different modifications to the synthetic speech interact with each other, and how their specific combinations influence perception, needs to be examined. This work investigates how the vocal effort of the synthetic speech together with added disfluencies affects listeners' perception of the degree of uncertainty in an utterance. We introduce a DNN voice built entirely from spontaneous conversational speech data and capable of producing a continuum of vocal efforts, prolongations and filled pauses with a corpus-based method. Results of a listener evaluation indicate that decreased vocal effort, filled pauses and prolongation of function words increase the degree of perceived uncertainty of conversational utterances expressing the speaker's beliefs. We demonstrate that the effects of these three cues are not merely additive, but that interaction effects, in particular between the two types of disfluencies and between vocal effort and prolongations, need to be considered when aiming to communicate a specific level of uncertainty. The implications of these findings are relevant for adaptive and incremental conversational systems using expressive speech synthesis and aspiring to communicate the attitude of uncertainty.
44.
  • Székely, Éva, et al. (author)
  • Synthesizing expressive speech from amateur audiobook recordings
  • 2012
  • In: Spoken Language Technology Workshop (SLT). - pp. 297-302
  • Conference paper (peer-reviewed), abstract:
    • Freely available audiobooks are a rich resource of expressive speech recordings that can be used for the purposes of speech synthesis. Natural sounding, expressive synthetic voices have previously been built from audiobooks that contained large amounts of highly expressive speech recorded from a professionally trained speaker. The majority of freely available audiobooks, however, are read by amateur speakers, are shorter and contain less expressive (less emphatic, less emotional, etc.) speech both in terms of quality and quantity. Synthesizing expressive speech from a typical online audiobook therefore poses many challenges. In this work we address these challenges by applying a method consisting of minimally supervised techniques to align the text with the recorded speech, select groups of expressive speech segments and build expressive voices for hidden Markov-model based synthesis using speaker adaptation. Subjective listening tests have shown that the expressive synthetic speech generated with this method is often able to produce utterances suited to an emotional message. We used a restricted amount of speech data in our experiment, in order to show that the method is generally applicable to most typical audiobooks widely available online.
45.
  • Székely, Éva, et al. (author)
  • The Effect of Soft, Modal and Loud Voice Levels on Entrainment in Noisy Conditions
  • 2015
  • In: Sixteenth Annual Conference of the International Speech Communication Association.
  • Conference paper (peer-reviewed), abstract:
    • Conversation partners have a tendency to adapt their vocal intensity to each other and to other social and environmental factors. A socially adequate vocal intensity level by a speech synthesiser that goes beyond mere volume adjustment is highly desirable for a rewarding and successful human-machine or machine-mediated human-human interaction. This paper examines the interaction of the Lombard effect and speaker entrainment in a controlled experiment conducted with a confederate interlocutor. The interlocutor was asked to maintain either a soft, a modal or a loud voice level during the dialogues. Through half of the trials, subjects were exposed to cocktail party noise through headphones. The analytical results suggest that both the background noise and the interlocutor's voice level affect the dynamics of speaker entrainment. Speakers appear to still entrain to the voice level of their interlocutor in noisy conditions, though to a lesser extent, as strategies of ensuring intelligibility affect voice levels as well. These findings could be leveraged in spoken dialogue systems and speech generating devices to help choose a vocal effort level for the synthetic voice that is both intelligible and socially suited to a specific interaction.
  •  
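Entrainment to an interlocutor's vocal intensity, as studied in entry 45 above, is commonly quantified as a correlation between paired turn-level loudness measures. Below is a minimal Python sketch under that assumption; the turn-per-file layout, the helper names and the choice of mean RMS in dB are illustrative, not the paper's actual analysis:

    import numpy as np
    import soundfile as sf

    def mean_intensity_db(path):
        # Mean RMS level of a mono wav file, in dB relative to full scale.
        samples, _ = sf.read(path)
        rms = np.sqrt(np.mean(np.square(samples)))
        return 20.0 * np.log10(rms + 1e-12)

    def entrainment_score(interlocutor_turns, subject_turns):
        # Pearson correlation between the interlocutor's per-turn
        # intensity and the subject's intensity on the reply turn.
        # Positive values suggest the subject tracks the interlocutor's
        # soft/modal/loud level.
        a = [mean_intensity_db(p) for p in interlocutor_turns]
        b = [mean_intensity_db(p) for p in subject_turns]
        return float(np.corrcoef(a, b)[0, 1])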
46.
  •  
47.
  • Székely, Éva, et al. (author)
  • WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to expressive synthetic voices
  • 2012
  • In: Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies. Association for Computational Linguistics, pp. 5-8
  • Conference paper (peer-reviewed) abstract
    • This paper describes a demonstration of the WinkTalk system, which is a speech synthesis platform using expressive synthetic voices. With the help of a webcamera and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects a voice that matches their face at the moment of sending a message. The WinkTalk system is an early research prototype that aims to demonstrate that facial expressions can be used as a more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.
  •  
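The core of WinkTalk, as described in the entry above, is a personalised mapping from facial-expression analysis to one of three expressive voices. A minimal sketch of that selection step follows; the expression labels, score format and voice names are illustrative assumptions rather than the system's actual interface:

    def select_voice(expression_scores, mapping=None):
        # Pick the expressive voice linked to the strongest detected
        # expression. expression_scores maps expression label -> confidence,
        # as a webcam-based facial-expression analyser might produce.
        mapping = mapping or {
            "smile": "cheerful",
            "frown": "stern",
            "neutral": "calm",
        }
        dominant = max(expression_scores, key=expression_scores.get)
        return mapping[dominant]

    # Example: a detected smile routes the message to the 'cheerful' voice.
    print(select_voice({"smile": 0.7, "frown": 0.1, "neutral": 0.2}))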
48.
  • Székely, Éva, et al. (author)
  • WinkTalk: a multimodal speech synthesis interface linking facial expressions to expressive synthetic voices
  • 2012
  • In: Proceedings of the Third Workshop on Speech and Language Processing for Assistive Technologies.
  • Conference paper (peer-reviewed) abstract
    • This paper describes a demonstration of the WinkTalk system, which is a speech synthesis platform using expressive synthetic voices. With the help of a webcamera and facial expression analysis, the system allows the user to control the expressive features of the synthetic speech for a particular utterance with their facial expressions. Based on a personalised mapping between three expressive synthetic voices and the user's facial expressions, the system selects a voice that matches their face at the moment of sending a message. The WinkTalk system is an early research prototype that aims to demonstrate that facial expressions can be used as a more intuitive control over expressive speech synthesis than manual selection of voice types, thereby contributing to an improved communication experience for users of speech generating devices.
  •  
49.
  • Torre, Ilaria, et al. (author)
  • Can a gender-ambiguous voice reduce gender stereotypes in human-robot interactions?
  • 2023
  • In: Proceedings of the 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). IEEE, ISSN 1944-9445, ISBN 9798350336702 / 9798350336719, pp. 106-112
  • Conference paper (peer-reviewed) abstract
    • When deploying a robot, its physical characteristics, role, and tasks are often fixed. Such factors can also be associated with gender stereotypes among humans, which then transfer to the robot. One factor that can induce gendering but is comparatively easy to change is the robot's voice. Designing voice in a way that interferes with fixed factors might therefore be a way to reduce gender stereotypes in human-robot interaction contexts. To this end, we have conducted a video-based online study to investigate how factors that might inspire gendering of a robot interact. In particular, we investigated how giving the robot a gender-ambiguous voice can affect perception of the robot. We compared assessments (n=111) of videos in which a robot's body presentation and occupation mis/matched with human gender stereotypes. We found evidence that a gender-ambiguous voice can reduce gendering of a robot endowed with stereotypically feminine or masculine attributes. The results can inform more just robot design while opening new questions regarding the phenomenon of robot gendering.
  •  
50.
  • Vasquez, Alejandra, et al. (author)
  • Symbionts as major modulators of insect health: lactic acid bacteria and honeybees.
  • 2012
  • In: PLoS ONE. Public Library of Science (PLoS), ISSN 1932-6203, 7(3)
  • Journal article (peer-reviewed) abstract
    • Lactic acid bacteria (LAB) are well recognized beneficial host-associated members of the microbiota of humans and animals. Yet LAB-associations of invertebrates have been poorly characterized and their functions remain obscure. Here we show that honeybees possess an abundant, diverse and ancient LAB microbiota in their honey crop with beneficial effects for bee health, defending them against microbial threats. Our studies of LAB in all extant honeybee species plus related apid bees reveal one of the largest collections of novel species from the genera Lactobacillus and Bifidobacterium ever discovered within a single insect and suggest a long (>80 mya) history of association. Bee associated microbiotas highlight Lactobacillus kunkeei as the dominant LAB member. Those showing potent antimicrobial properties are acquired by callow honey bee workers from nestmates and maintained within the crop in biofilms, though beekeeping management practices can negatively impact this microbiota. Prophylactic practices that enhance LAB, or supplementary feeding of LAB, may serve in integrated approaches to sustainable pollinator service provision. We anticipate this microbiota will become central to studies on honeybee health, including colony collapse disorder, and act as an exemplar case of insect-microbe symbiosis.
  •  
Publication type
conference papers (50)
journal articles (8)
other publications (2)
Type of content
peer-reviewed (57)
other scholarly/artistic (3)
Author/editor
Székely, Eva (53)
Carson-Berndsen, Jul ... (15)
Beskow, Jonas (11)
Ahmed, Zeeshan (9)
Gustafson, Joakim (8)
Cahill, Peter (6)
Henter, Gustav Eje, ... (5)
Henter, Gustav Eje, ... (5)
Szekely, Anna J. (5)
Włodarczak, Marcin, ... (4)
Abou-Zleikha, Mohame ... (4)
Lindström, Eva S. (4)
Steiner, Ingmar (3)
Edlund, Jens (3)
Langenheder, Silke (3)
Skantze, Gabriel, 19 ... (2)
Alexanderson, Simon (2)
Szekely, Laszlo (2)
Betz, Simon (2)
Tranvik, Lars J. (1)
Abelho Pereira, Andr ... (1)
Oertel, Catharine (1)
Hober, Sophia, Profe ... (1)
Bertilsson, Stefan (1)
Clark, Leigh (1)
Rengefors, Karin (1)
Ahlén, Gustaf (1)
Frelin, Lars (1)
Paxton, Robert J. (1)
Bogdanovic, Gordana (1)
Aleman, Soo (1)
Kucherenko, Taras, 1 ... (1)
Leite, Iolanda (1)
Appelberg, Sofia (1)
Mirazimi, Ali (1)
Scharnweber, Kristin ... (1)
Tegel, Hanna (1)
Pasetto, Anna (1)
Attermeyer, Katrin (1)
Hawkes, Jeffrey A. (1)
Aylett, Matthew Pete ... (1)
McMillan, Donald (1)
Romeo, Marta (1)
Fischer, Joel (1)
Reyes-Cruz, Gisela (1)
Larsson, Olivia (1)
Malisz, Zofia (1)
Flaberg, Emilie (1)
Forsgren, Eva (1)
Peura, Sari (1)
Higher education institution
Kungliga Tekniska Högskolan (52)
Uppsala universitet (5)
Stockholms universitet (5)
Lunds universitet (2)
Karolinska Institutet (2)
Sveriges Lantbruksuniversitet (2)
Högskolan i Skövde (1)
Chalmers tekniska högskola (1)
Language
English (60)
Research subject (UKÄ/SCB)
Natural sciences (29)
Engineering and technology (11)
Humanities (10)
Social sciences (3)
Medical and health sciences (2)
Agricultural sciences (1)
