SwePub
Record identifier: swepub:oai:gup.ub.gu.se/246765


Automatic Detection of Arabicized Berber and Arabic Varieties

Adouane, Wafia, 1985 (author)
Gothenburg University, Department of Philosophy, Linguistics and Theory of Science
Semmar, Nasredine (author)
Johansson, Richard, 1975 (author)
Gothenburg University, Department of Computer Science and Engineering (GU)
Bobicev, Victoria (author)
2016
English.
In: Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects; pp. 63–72; December 12, 2016; Osaka, Japan.
Conference paper (peer-reviewed)
Abstract
Automatic Language Identification (ALI) is the detection, by a machine, of the natural language in which an input text is written. It is the necessary first step for any language-dependent natural language processing task. Various methods have been successfully applied to a wide range of languages, and state-of-the-art automatic language identifiers are mainly based on character n-gram models trained on huge corpora. However, many languages, for instance minority and informal languages, are not yet processed automatically. Many of these languages are only spoken and have no established written form. Social media platforms and new technologies have facilitated the emergence of written forms of these spoken languages based on pronunciation. Such languages are poorly represented on the Web, are commonly referred to as under-resourced languages, and are not properly recognized by currently available ALI tools. In this paper, we revisit the problem of ALI with a focus on short texts in Arabicized Berber and dialectal Arabic. We introduce new resources and evaluate existing methods. The results show that machine learning models combined with lexicons are well suited for detecting Arabicized Berber and different Arabic varieties and for distinguishing between them, giving a macro-average F-score of 92.94%.
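The abstract notes that state-of-the-art language identifiers mainly rely on character n-gram models and reports its results as a macro-average F-score. As a minimal sketch of those two ideas only (not the authors' system, which combines machine learning models with lexicons), the Python below trains a multinomial naive Bayes classifier over character trigrams and computes a macro-averaged F-score, assuming the balanced F1 measure. The names `char_ngrams`, `CharNgramNB` and `macro_f1`, the trigram order, add-one smoothing, and the toy English/French strings are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramNB:
    """Hypothetical multinomial naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram frequency table
        self.totals = Counter()             # label -> total number of n-grams seen
        self.priors = Counter()             # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.priors[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        v = len(self.vocab)
        n_docs = sum(self.priors.values())
        best_label, best_score = None, -math.inf
        for label in self.priors:
            # Log prior plus smoothed log likelihood of every n-gram in the input.
            score = math.log(self.priors[label] / n_docs)
            for g in grams:
                score += math.log((self.counts[label][g] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores over the classes in the gold labels."""
    scores = []
    for c in sorted(set(gold)):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy training data, purely illustrative (not the resources introduced in the paper).
    model = CharNgramNB(n=3).fit(
        ["the cat sat on the mat", "le chat est sur le tapis"], ["en", "fr"]
    )
    print(model.predict("the cat"))  # → en
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Macro-averaging gives every class equal weight regardless of how many test texts it has, which matters when some varieties are much rarer than others in the evaluation data.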

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Language Technology (hsv//eng)
HUMANITIES  -- Languages and Literature -- Specific Languages (hsv//eng)

Keywords

natural language processing
machine learning
language classification
Arabic
Berber
dialects

Publication and Content Type

ref (peer-reviewed)
kon (conference paper)
