  • Adouane, Wafia, 1985 (author); Gothenburg University (Göteborgs universitet), Department of Philosophy, Linguistics and Theory of Science (Institutionen för filosofi, lingvistik och vetenskapsteori)

Romanized Berber and Romanized Arabic Automatic Language Identification Using Machine Learning

  • Article/chapter, English, 2016

Publisher, publication year, extent ...

  • Association for Computational Linguistics, 2016

Numbers

  • LIBRIS-ID:oai:gup.ub.gu.se/246849
  • URI: https://gup.ub.gu.se/publication/246849

Supplementary language notes

  • Language:English

Classification

  • Content type (swepub-contenttype): ref (peer-reviewed)
  • Publication type (swepub-publicationtype): kon (conference paper)

Notes

  • Identifying the language of text or speech input is the first step in any language-dependent natural language processing. The task is called Automatic Language Identification (ALI). The field has been well studied since the early 1960s, and various methods have been applied to many standard languages. Standard ALI methods require training datasets and use character- or word-based n-gram models. However, social media and new technologies have contributed to the rise of informal and minority languages on the Web, and state-of-the-art automatic language identifiers fail to identify many of them properly. Romanized Arabic (RA) and Romanized Berber (RB) are two such informal, under-resourced languages. The goal of this paper is twofold: to detect RA and RB, at the document level, as separate languages, and to distinguish between them, since they coexist in North Africa. We treat the task as a classification problem and solve it with supervised machine learning. For both languages, character-based 5-grams combined with additional lexicons score best, with F-scores of 99.75% and 97.77% for RB and RA, respectively.
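
To make the idea of classifying documents by character 5-grams concrete, the following is a minimal sketch, not the authors' system. The abstract does not name the classifier, so the multinomial Naive Bayes model with add-one smoothing is an assumption, as are the names `char_ngrams` and `NgramNaiveBayes`. The four training strings are toy placeholders chosen for illustration, not data from the paper, and the lexicon features mentioned in the abstract are left out.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=5):
    """Return overlapping character n-grams of the lowercased text,
    padded with one space on each side so word edges are captured."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=5):
        self.n = n
        self.gram_counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()                       # all n-grams seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = char_ngrams(text, self.n)
            self.gram_counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            label_total = sum(self.gram_counts[label].values())
            # Log prior plus add-one-smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in char_ngrams(text, self.n):
                count = self.gram_counts[label][gram]
                score += math.log((count + 1) / (label_total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy placeholder documents (illustrative only, not from the paper's corpus).
train_docs = [
    "salam alaykum habibi",
    "shukran bezaf inshallah",
    "azul fellawen",
    "tanemmirt tamazight tamurt",
]
train_labels = ["RA", "RA", "RB", "RB"]

model = NgramNaiveBayes(n=5).fit(train_docs, train_labels)
print(model.predict("shukran habibi"))
print(model.predict("azul tamurt"))
```

A realistic setup would train on full documents, add lexicon-based features, and report per-language F-scores on held-out data, as the abstract describes.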

Added entries (persons, corporate bodies, meetings, titles ...)

  • Semmar, Nasredine (author)
  • Johansson, Richard, 1975 (author); Gothenburg University (Göteborgs universitet), Department of Computer Science and Engineering (GU) (Institutionen för data- och informationsteknik (GU)); (Swepub:gu)xjohri
  • Göteborgs universitet, Institutionen för filosofi, lingvistik och vetenskapsteori (creator_code:org_t)

Related titles

  • In: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects; pp. 53–61; December 12, 2016; Osaka, Japan: Association for Computational Linguistics; ISSN 0736-587X
