Sökning: onr:"swepub:oai:gup.ub.gu.se/246849" >
Romanized Berber an...
-
Adouane, Wafia,1985Gothenburg University,Göteborgs universitet,Institutionen för filosofi, lingvistik och vetenskapsteori,Department of Philosophy, Linguistics and Theory of Science
(författare)
Romanized Berber and Romanized Arabic Automatic Language Identification Using Machine Learning
- Artikel/kapitelEngelska2016
Förlag, utgivningsår, omfång ...
-
Association for Computational Linguistics,2016
Nummerbeteckningar
-
LIBRIS-ID:oai:gup.ub.gu.se/246849
-
https://gup.ub.gu.se/publication/246849URI
Kompletterande språkuppgifter
Ingår i deldatabas
Klassifikation
-
Ämneskategori:ref swepub-contenttype
-
Ämneskategori:kon swepub-publicationtype
Anmärkningar
-
The identification of the language of text/speech input is the first step to be able to properly do any language-dependent natural language processing. The task is called Automatic Language Identification (ALI). Being a well-studied field since early 1960’s, various methods have been applied to many standard languages. The ALI standard methods require datasets for training and use character/word-based n-gram models. However, social media and new technologies have contributed to the rise of informal and minority languages on the Web. The state-of-the-art automatic language identifiers fail to properly identify many of them. Romanized Arabic (RA) and Romanized Berber (RB) are cases of these informal languages which are under-resourced. The goal of this paper is twofold: detect RA and RB, at a document level, as separate languages and distinguish between them as they coexist in North Africa. We consider the task as a classification problem and use supervised machine learning to solve it. For both languages, character-based 5-grams combined with additional lexicons score the best, F-score of 99.75% and 97.77% for RB and RA respectively.
Ämnesord och genrebeteckningar
Biuppslag (personer, institutioner, konferenser, titlar ...)
-
Semmar, Nasredine
(författare)
-
Johansson, Richard,1975Gothenburg University,Göteborgs universitet,Institutionen för data- och informationsteknik (GU),Department of Computer Science and Engineering (GU)(Swepub:gu)xjohri
(författare)
-
Göteborgs universitetInstitutionen för filosofi, lingvistik och vetenskapsteori
(creator_code:org_t)
Sammanhörande titlar
-
Ingår i:Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects; 53–61; December 12, 2016 ; Osaka, Japan: Association for Computational Linguistics0736-587X
Internetlänk
Hitta via bibliotek
Till lärosätets databas