SwePub
Result list for search "WFRF:(Demeke Yonas)"

  • Results 1-11 of 11
1.
  • Alem, Yonas, 1974, et al. (authors)
  • The persistence of energy poverty: A dynamic probit analysis
  • 2020
  • In: Energy Economics. - : Elsevier BV. - 0140-9883. ; 90:August 2020
  • Journal article (peer-reviewed)
    • This paper contributes to the growing literature on energy poverty in developing countries. We use a dynamic probit estimator on three rounds of panel data from urban Ethiopia to estimate a model of the probability of being energy poor and to investigate the persistence of energy poverty. We also study the impact of energy price inflation, which Ethiopia experienced during 2007-2009, on energy use and energy poverty. We find strong evidence of state dependence in energy poverty. A household that is energy poor in one round is up to 16% more likely to be energy poor in the subsequent round. Dynamic probit regression results also suggest that an increase in the price of kerosene - the most important fuel for the urban poor - drives households into energy poverty. A fractional response estimator for panel data, which estimates the impact of energy prices on the proportion of energy obtained from clean sources, also supports the finding on the adverse impact of energy price inflation. Households responded to the significant rise in the price of kerosene by consuming a large amount of charcoal, which has been documented to have serious environmental, climate, and health consequences. Our results have significant implications for policies formulated to reduce energy poverty, conserve biomass resources, and promote energy transition in developing countries. (c) 2020 Elsevier B.V. All rights reserved.
2.
  • Björklund, Henrik, et al. (authors)
  • Implementing a speech-to-text pipeline on the MICO platform
  • 2016
  • Report (other academic/artistic)
    • MICO is an open-source platform for cross-media analysis, querying, and recommendation. It is the major outcome of the European research project Media in Context, and has been contributed to by academic and industrial partners from Germany, Austria, Sweden, Italy, and the UK. A central idea is to group sets of related media objects into multimodal content items, and to process and store these as logical units. The platform is designed to be easy to extend and adapt, and this makes it a useful building block for a diverse set of multimedia applications. To promote the platform and demonstrate its potential, we describe our work on a Kaldi-based speech-recognition pipeline.
3.
  • Woldemariam, Yonas Demeke, et al. (authors)
  • A cloud-hosted MapReduce Architecture for syntactic parsing
  • 2019
  • In: 2019 45th Euromicro Conference on Software Engineering and Advanced Applications (SEAA). - Greece : IEEE. - 9781728134215 - 9781728132853 ; , pp. 100-107
  • Conference paper (peer-reviewed)
    • Syntactic parsing is a time-consuming task in natural language processing, particularly where a large number of text files are being processed. Parsing algorithms are conventionally designed to operate on a single machine in a sequential fashion and, as a consequence, fail to benefit from high performance and parallel computing resources available on the cloud. We designed and implemented a scalable cloud-based architecture supporting parallel and distributed syntactic parsing for large datasets. The main architecture consists of a syntactic parser (constituency and dependency parsing) and a MapReduce framework running on clusters of machines. The resulting cloud-based MapReduce parsing is able to build a map where syntactic trees of the same input file have the same key and are collected into a single file containing sentences along with their corresponding trees. Our experimental evaluation shows that the architecture scales well with regard to the number of processing nodes and the number of cores per node. In the fastest tested cloud-based setup, the proposed design performs 7 times faster when compared to a local setup. In summary, this study takes an important step toward providing and evaluating a cloud-hosted solution for efficient syntactic parsing of natural language data sets consisting of a large number of files.
4.
  • Woldemariam, Yonas Demeke, et al. (authors)
  • Adapting language specific components of cross-media analysis frameworks to less-resourced languages : the case of Amharic
  • 2020
  • In: Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020). - 9791095546351 ; , pp. 298-305
  • Conference paper (peer-reviewed)
    • We present an ASR-based pipeline for Amharic that orchestrates NLP components within a cross-media analysis framework (CMAF). One of the major challenges inherently associated with CMAFs is effectively addressing multilingual issues. As a result, many languages remain under-resourced and fail to benefit from available media analysis solutions. Although Amharic is spoken natively by over 22 million people and there is an ever-increasing amount of Amharic multimedia content on the Web, querying that content with simple text search is difficult. Searching for audio/video content in particular with simple keywords is even harder, as it exists in raw form. In this study, we introduce a spoken and textual content processing workflow into a CMAF for Amharic. We design an ASR-named entity recognition (NER) pipeline that includes three main components: ASR, a transliterator and NER. We explore various acoustic modeling techniques and develop an OpenNLP-based NER extractor along with a transliterator that interfaces between ASR and NER. The designed ASR-NER pipeline for Amharic promotes the multilingual support of CMAFs. Also, the state-of-the-art design principles and techniques employed in this study may guide work on other less-resourced languages, particularly the Semitic ones.
5.
  • Woldemariam, Yonas Demeke (author)
  • An algorithm for estimating answerers’ performance and improving answer quality predictions in QA forums
  • 2022
  • In: Proceedings of the 14th International Conference on Agents and Artificial Intelligence. - : SciTePress. - 9789897585470 ; , pp. 106-113
  • Conference paper (peer-reviewed)
    • In this study, a multi-component algorithm is developed for estimating answerer performance, largely from a syntactic representation of answer content. The resulting algorithm has been integrated into semantic-based answer quality prediction models, and appears to significantly improve the baseline results on all test sets in the best-case scenario. These models score up to 86% accuracy and 84% F-measure. Also, answer quality classifiers yield up to 100% recall and 98% precision. Following the transformation of joint syntactic-punctuation information into the identified expertise dimensions (e.g., authoritativeness, analytical, descriptiveness, completeness) that formally define answerer performance, extensive algorithm analyses have been carried out on almost 142,246 answers extracted from diverse sets of 13 different QA forums. The analyses prove that incorporating competence information into answer quality models certainly leads to nearly perfect models. Moreover, we found that the syntactic-based algorithm combined with semantic-based models yields better results than answer quality prediction models built on shallow linguistic or meta-features presented in related works.
6.
  • Woldemariam, Yonas Demeke (author)
  • Assessing users’ reputation from syntactic and semantic information in community question answering
  • 2020
  • In: LREC 2020 Conference Proceedings. - Paris : European Language Resources Association (ELRA). - 9791095546344 ; , pp. 5383-5391
  • Conference paper (peer-reviewed)
    • Textual content is the most significant as well as the largest part of CQA forums. Users gain reputation for contributing such content. Although linguistic quality is the very essence of textual information, it does not seem to be considered in estimating users’ reputation. As existing user reputation systems seem to rely solely on vote counting, adding linguistic information should improve their quality. In this study, we investigate the relationship between users’ reputation and linguistic features extracted from their associated answer content. We build statistical models on a Stack Overflow dataset that learn reputation from complex syntactic and semantic structures of such content. The resulting models reveal how users’ writing styles in answering questions play important roles in building reputation points. In our experiments, answers are extracted from systematically selected users, followed by linguistic feature annotation and model building. The models are evaluated on in-domain (e.g., Server Fault, Super User) and out-domain (e.g., English, Maths) datasets. We found that the selected linguistic features have quite significant influences over reputation scores. In the best-case scenario, the selected linguistic feature set could explain 80% of the variation in reputation scores with a prediction error of 3%. The performance results obtained from the baseline models have been significantly improved by adding syntactic and punctuation mark features.
7.
  • Woldemariam, Yonas Demeke (author)
  • Expertise detection in crowdsourcing forums using the composition of latent topics and joint syntactic–semantic cues
  • 2021
  • In: SN Computer Science. - Singapore : Springer. - 2662-995X .- 2661-8907. ; 2:6
  • Journal article (peer-reviewed)
    • We develop an NLP method for inferring potential contributors among a multitude of users within crowdsourcing forums (CSFs). The method provides a way to predict expertise from the structures (syntax–semantic patterns) of users' text when crowdsourced votes are unavailable. It primarily deals with tackling core adverse conditions, which hinder the identification of crowds’ expertise levels, and with standardizing the measurement of the linguistic quality of crowdsourced text. To solve the former, an expertise estimation and linguistic feature annotation algorithm is developed. To approach the latter, a comprehensive linguistic characterization of crowdsourced text, along with extensive joint syntax–punctuation analyses, has been carried out. The entire corpora comprise approximately 8 different domains, 3 million and 50,000 sentences, and 32 million and 90,000 words, contributed by a crowd of 50,000 users. The analyses revealed six major linguistic patterns, identified on the basis of ordered lists of structural (syntactic) categories, learned from grammatical constructions practiced by major groups of experts. In addition, nine different text-oriented expertise dimensions are identified, as crucial steps towards establishing a standard linguistic-based expertise framework for most CSFs. Potentially, the resulting framework simplifies the measurement of crowds’ proficiency in those particular forums where crowds’ tasks (e.g., answering questions, technically discerning deep features within images of galaxies for classifying them into certain categories) are intimately connected with their writing (e.g., describing answers illustratively, expressing complex phenomena observed in classified images). Moreover, a wide variety of linguistic annotations (latent topic annotations, named entities, syntactic and punctuation annotations, semantic and character set annotations, word and character n-gram (n = 2 and 3) annotations) are extracted for building baseline and enhanced versions of expertise models (about 20 different models built). The successive improvements over the baseline models, obtained by iteratively adding linguistic feature annotations in a two-stage enhancement process, indicate the adaptability of the learned models.
8.
  • Woldemariam, Yonas Demeke, 1985- (author)
  • NLP methods for improving user rating systems in crowdsourcing forums and speech recognition of less resourced languages
  • 2024
  • Doctoral thesis (other academic/artistic)
    • We develop NLP and ASR methods (e.g., algorithms, architectures) for solving these problems: biases induced by user rating, ranking, recommendation and search engine algorithms; computational inefficiencies related to conventional syntactic-semantic parsing algorithms; extensive linguistic resource requirements imposed by traditional ASR methods; interoperability issues faced by NLP and ASR components within cross-media analysis; and audio-video content searchability problems over the Web. User rating systems (URSs) in crowdsourcing forums (CSFs) (e.g., QA) rely solely on voting schemes, and fail to incorporate linguistic quality and user competence information. Such failure affects the trustworthiness of answers over the Web, as search engines are likely biased towards popular (high-voted) answers. It also affects the quality of entire QA platforms, as other components depend on the accuracy of the underlying URSs. On the other hand, conventional ASR methods present two major challenges: a failure of acoustic models to work within collaborative environments, as these methods only help build models limited to operating in isolation, and a resource-related challenge. Significant contributions have been made in our thesis and published in prestigious AI, NLP and ASR venues; our 9 papers have received over 90 citations. The proposed approaches potentially transform voting-based rating into linguistic-quality-based rating; answer quality predictions based on shallow linguistic (meta-data) features into predictions based on deep syntactic-semantic and user competence features; single-machine sequential syntactic parsing into parallel and distributed cloud-based parsing; and meta-data-based querying of spoken documents into full-text querying and searching, as well as sentiment- and competence-based querying of textual content. Theoretically, we advance the understanding of the relationships between author text and the associated proficiency in performing certain tasks through successive research works to discover the rules governing the conjectured link between them. Also, new bag-of-words approaches (based on latent topic modeling, syntactic categories and dependency relations) have been proposed. These approaches yield significant accuracy gains over conventional TF-IDF (term frequency–inverse document frequency) based models, and reduce domain dependencies as they potentially capture structural and topical information.
9.
  • Woldemariam, Yonas Demeke, et al. (authors)
  • Predicting User Competence from Linguistic Data
  • 2017
  • In: Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017). - : Jadavpur University. ; , pp. 476-484
  • Conference paper (peer-reviewed)
    • We investigate the problem of predicting the competence of users of the crowd-sourcing platform Zooniverse by analyzing their chat texts. Zooniverse is an online platform where objects of different types are displayed to volunteer users to classify. Our research focuses on the Zooniverse Galaxy Zoo project, where users classify the images of galaxies and discuss their classifications in text. We apply natural language processing methods to extract linguistic features including syntactic categories, bag-of-words, and punctuation marks. We trained three supervised machine-learning classifiers on the resulting dataset: k-nearest neighbors, decision trees (with gradient boosting) and naive Bayes. They are evaluated (regarding accuracy and F-measure) with two different but related domain datasets. The performance of the classifiers varies across the feature set configurations designed during the training phase. A challenging part of this research is to compute the competence of the users without ground truth data available. We implemented a tool that estimates the proficiency of users and annotates their text with computed competence. Our evaluation results show that the trained classifier models give results that are significantly better than chance and can be deployed for other crowd-sourcing projects as well.
10.
  • Woldemariam, Yonas Demeke (author)
  • Transfer learning for less-resourced semitic languages speech recognition : the case of Amharic
  • 2020
  • In: Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020). - 9791095546351 ; , pp. 61-69
  • Conference paper (peer-reviewed)
    • While building automatic speech recognition (ASR) requires a large amount of speech and text data, the problem gets worse for less-resourced languages. In this paper, we investigate a model adaptation method, namely transfer learning, for a less-resourced Semitic language, i.e., Amharic, to solve resource scarcity problems in speech recognition development and improve the Amharic ASR model. In our experiments, we transfer acoustic models trained on two different source languages (English and Mandarin) to Amharic using very limited resources. The experimental results show that a significant WER (Word Error Rate) reduction has been achieved by transferring the hidden layers of the trained source-language neural networks. In the best-case scenario, the Amharic ASR model adapted from English yields the best WER reduction, from 38.72% to 24.50% (an improvement of 14.22% absolute). Adapting the Mandarin model improves the baseline Amharic model with a WER reduction of 10.25% (absolute). Our analysis also reveals that the speech recognition performance of the adapted acoustic model is more strongly influenced by the relatedness (in a relative sense) between the source and the target languages than by other considered factors (e.g., the quality of source models). Furthermore, other Semitic as well as Afro-Asiatic languages could benefit from the methodology presented in this study.
11.
  • Abbafati, Cristiana, et al. (authors)
  • 2020
  • Journal article (peer-reviewed)
Publication type
conference paper (6)
journal article (3)
report (1)
doctoral thesis (1)
Content type
peer-reviewed (9)
other academic/artistic (2)
Author/editor
Johansson, Lars (1)
Sulo, Gerhard (1)
Hassankhani, Hadi (1)
Liu, Yang (1)
Ali, Muhammad (1)
Mitchell, Philip B (1)
McKee, Martin (1)
Madotto, Fabiana (1)
Bensch, Suna (1)
Abolhassani, Hassan (1)
Rezaei, Nima (1)
Castro, Franz (1)
Koul, Parvaiz A. (1)
Weiss, Daniel J. (1)
Ackerman, Ilana N. (1)
Brenner, Hermann (1)
Ferrara, Giannina (1)
Salama, Joseph S. (1)
Mullany, Erin C. (1)
Abbafati, Cristiana (1)
Bensenor, Isabela M. (1)
Bernabe, Eduardo (1)
Carrero, Juan J. (1)
Cercy, Kelly M. (1)
Zaki, Maysaa El Saye ... (1)
Esteghamati, Alireza (1)
Esteghamati, Sadaf (1)
Fanzo, Jessica (1)
Farzadfar, Farshad (1)
Foigt, Nataliya A. (1)
Grosso, Giuseppe (1)
Islami, Farhad (1)
James, Spencer L. (1)
Khader, Yousef Saleh (1)
Kimokoti, Ruth W. (1)
Kumar, G. Anil (1)
Lallukka, Tea (1)
Lotufo, Paulo A. (1)
Mendoza, Walter (1)
Nagel, Gabriele (1)
Nguyen, Cuong Tat (1)
Nixon, Molly R. (1)
Ong, Kanyin L. (1)
Pereira, David M. (1)
Rivera, Juan A. (1)
Sanchez-Pimienta, Ta ... (1)
Shin, Min-Jeong (1)
Thrift, Amanda G. (1)
Tran, Bach Xuan (1)
Uthman, Olalekan A. (1)
Institution
Umeå universitet (9)
Göteborgs universitet (1)
Uppsala universitet (1)
Karolinska Institutet (1)
Högskolan Dalarna (1)
Language
English (11)
Research subject (UKÄ/SCB)
Natural sciences (9)
Medical and health sciences (1)
Social sciences (1)
