SwePub
id:"swepub:oai:gup.ub.gu.se/293967"
 


Natural Language Processing for Low-resourced Code-switched Colloquial Languages – The Case of Algerian Language

Adouane, Wafia, 1985 (author)
Gothenburg University (Göteborgs universitet), Department of Philosophy, Linguistics and Theory of Science
ISBN 9789178339587
Göteborg : Göteborgs universitet, 2020
English.
  • Doctoral thesis (other academic/artistic)
Abstract
  • In this thesis we explore to what extent deep neural networks (DNNs), trained end-to-end, can be used to perform natural language processing tasks for code-switched colloquial languages lacking both large automated data and processing tools, for instance tokenisers, morpho-syntactic and semantic parsers, etc. We opt for an end-to-end learning approach because this kind of data is hard to control due to its high orthographic and linguistic variability. This variability makes it unrealistic to either find a dataset that exhaustively covers all the possible cases that could be used to devise processing tools or to build equivalent rule-based tools from the bottom up. Moreover, all our models are language-independent and do not require access to additional resources, hence we hope that they will be used with other languages or language varieties with similar settings. We deal with the case of user-generated textual data written in Algerian language as naturally produced in social media. We experiment with five natural language processing tasks, namely Code-switch Detection, Semantic Textual Similarity, Spelling Normalisation and Correction, Sentiment Analysis, and Named Entity Recognition. For each task, we created a dataset from user-generated data reflecting the real use of the language. Our experimental results in various setups indicate that end-to-end DNNs combined with character-level representation of the data are promising. Further experiments with advanced models, such as Transformer-based models, could lead to even better results. Completely solving the challenge of code-switched colloquial languages is beyond the scope of this experimental work. Even so, we believe that this work will extend the utility of DNNs trained end-to-end to low-resource settings. Furthermore, the results of our experiments can be used as a baseline for future research.

Subject headings

NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)

Keywords

Natural language processing
Deep neural networks
Low-resourced language
Colloquial language
Code-switch
Dialectal Arabic
User-generated data
Non-standardised orthography
Algerian language

Publication and content type

vet (subject category)
dok (subject category)
