SwePub
id:"swepub:oai:DiVA.org:lnu-106986"
 


The Impact of Translating Resource-Rich Datasets to Low-Resource Languages Through Multi-Lingual Text Processing

Ghafoor, Abdul (author)
Sukkur IBA University, Pakistan
Imran, Ali Shariq (author)
Norwegian University of Science and Technology (NTNU), Norway
Daudpota, Sher Muhammad (author)
Sukkur IBA University, Pakistan
Kastrati, Zenun, 1984- (author)
Linnaeus University, Department of Informatics (IK)
Soomro, Abdullah (author)
Sukkur IBA University, Pakistan
Batra, Rakhi (author)
Sukkur IBA University, Pakistan
Wani, Mudasir Ahmad (author)
Norwegian University of Science and Technology (NTNU), Norway
IEEE, 2021
English.
Published in: IEEE Access. IEEE. ISSN 2169-3536. Vol. 9, pp. 124478-124490
  • Journal article (peer-reviewed)
Abstract
  • Urdu is still considered a low-resource language despite being ranked as the world's 10th most spoken language, with nearly 230 million speakers. The scarcity of benchmark datasets in low-resource languages has led researchers to adopt more ingenious techniques to address the problem. One widely adopted option is to use language translation services to replicate existing datasets from resource-rich languages, such as English, in low-resource languages, such as Urdu. For most natural language processing tasks, including polarity assessment, words translated from one language to another via Google Translate often change meaning. This results in a polarity shift that degrades system performance, particularly for sentiment classification and emotion detection tasks. This study evaluates the effect of translation from a resource-rich language to a low-resource language on the sentiment classification task. It identifies the words that cause polarity shift and groups them into five distinct categories. It further examines the correlation between languages with similar roots. Our study shows a performance degradation of 2-3 percentage points in sentiment classification due to the polarity shift that results from translating resource-rich languages into low-resource languages.

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences -- Information Systems (hsv//eng)

Keywords

Multilingual text processing
sentiment classification
polarity assessment
low resource language
language translation
BiLSTM
Conv1D
English
Urdu
German
Hindi
Informatics
Information Systems

Publication and content type

ref (subject category)
art (subject category)
