Arabicized and Romanized Berber Automatic Identification
- Adouane, Wafia, 1985 (author)
- Gothenburg University, Department of Philosophy, Linguistics and Theory of Science
- Semmar, Nasredine (author)
- Johansson, Richard, 1975 (author)
- Gothenburg University, Department of Computer Science and Engineering (GU)
- Morocco : IRCAM, 2016
- English.
- In: Proceedings of TICAM 2016. - Morocco : IRCAM.
- Related link: https://gup.ub.gu.se...
Abstract
- We present an automatic language identification tool for both Arabicized Berber (Berber written in the Arabic script) and Romanized Berber (Berber written in the Latin script). The focus is on short texts (social media content). We use a supervised machine learning method with character-based and word-based n-gram models as features. We also describe the corpora used in this paper. For both Arabicized and Romanized Berber, character-based 5-grams perform best, giving an F-score of 99.50%.
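As a rough illustration of character-based n-gram features for language identification, the following is a minimal sketch only. The abstract does not name the classifier, so the multinomial Naive Bayes with add-one smoothing used here is an assumption, as are the names `char_ngrams` and `NgramNaiveBayes`. The training strings are invented nonsense tokens standing in for two languages, not data from the paper's corpora.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=5):
    """Return overlapping character n-grams of text, padded with one space per side."""
    padded = f" {text.lower()} "
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical classifier: multinomial Naive Bayes over character n-grams."""

    def __init__(self, n=5):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.doc_counts = Counter()         # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            # Log prior plus add-one smoothed log likelihood of each n-gram.
            score = math.log(self.doc_counts[label] / total_docs)
            total = sum(self.counts[label].values())
            for gram in char_ngrams(text, self.n):
                score += math.log((self.counts[label][gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented placeholder strings; "lang_a" and "lang_b" are hypothetical labels.
train_texts = ["zzkra mmolu zzkra", "mmolu zzkra tta", "qwerp lopik qwerp", "lopik qwerp nnu"]
train_labels = ["lang_a", "lang_a", "lang_b", "lang_b"]
model = NgramNaiveBayes(n=5).fit(train_texts, train_labels)
print(model.predict("zzkra tta"))
```

Padding each text with a space lets word-boundary n-grams count as features, which tends to matter for short inputs such as social media posts.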
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
Publication and content type
- ref (subject category)
- kon (subject category)