SwePub

id:"swepub:oai:DiVA.org:lnu-116173"
 


The impact of synthetic text generation for sentiment analysis using GAN based models

Imran, Ali Shariq (author)
Norwegian University of Science and Technology, Norway
Yang, Ru (author)
Norwegian University of Science and Technology, Norway
Kastrati, Zenun, 1984- (author)
Linnaeus University, Department of Informatics (IK)
Daudpota, Sher Muhammad (author)
Sukkur IBA University, Pakistan
Shaikh, Sarang (author)
Norwegian University of Science and Technology, Norway
Elsevier, 2022
English.
In: Egyptian Informatics Journal. Elsevier. ISSN 1110-8665. 23:3, pp. 547-557
  • Journal article (peer-reviewed)
Abstract
  • Data imbalance, where the number of instances in one or more categories far exceeds that of the others, is a common issue in datasets, and the educational domain is no exception. The difficulty of collecting course feedback on a large scale and the lack of publicly available datasets in this domain limit models' performance, especially for deep neural network based models, which are data hungry. A model trained on such an imbalanced dataset would naturally favor the majority class. However, the minority class could be critical for decision-making in prediction systems, and therefore it is usually desirable to train a model with equally high class-level accuracy. This paper addresses the data imbalance issue for the task of sentiment analysis of users' opinions on two educational feedback datasets, using deep learning models for synthetic text generation. Two state-of-the-art text generation GAN models, namely CatGAN and SentiGAN, are employed to synthesize text used to balance the highly imbalanced datasets in this study. Particular emphasis is given to the diversity of synthetically generated samples for populating minority classes. Experimental results on highly imbalanced datasets show significant improvement in models' performance on CR23K and CR100K after balancing with synthetic data for the sentiment classification task.

Subject headings

NATURAL SCIENCES  -- Computer and Information Sciences (hsv//eng)

Keyword

Text generation
Sentiment analysis
SentiGAN
CatGAN
Deep learning
Language modeling
Machine learning
GANs
Generative adversarial networks
Data imbalance
Informatics
Information Systems

Publication and Content Type

ref (subject category)
art (subject category)
