
Branch-GAN: Improving Text Generation with (Not So) Large Language Models

Carlsson, Fredrik (author)
RISE, Electrification and Reliability
Broberg, Johan (author)
RISE
Hillbom, Erik (author)
RISE
Sahlgren, Magnus (author)
AI Sweden, Sweden
Nivre, Joakim, 1962- (author)
RISE, Computer Science
International Conference on Learning Representations, ICLR, 2024
2024
English.
  • Conference paper (peer-reviewed)
Abstract
  • The current advancements in open domain text generation have been spearheaded by Transformer-based large language models. Leveraging efficient parallelization and vast training datasets, these models achieve unparalleled text generation capabilities. Even so, current models are known to suffer from deficiencies such as repetitive texts, looping issues, and lack of robustness. While adversarial training through generative adversarial networks (GAN) is a proposed solution, earlier research in this direction has predominantly focused on older architectures, or narrow tasks. As a result, this approach is not yet compatible with modern language models for open-ended text generation, leading to diminished interest within the broader research community. We propose a computationally efficient GAN approach for sequential data that utilizes the parallelization capabilities of Transformer models. Our method revolves around generating multiple branching sequences from each training sample, while also incorporating the typical next-step prediction loss on the original data. In this way, we achieve a dense reward and loss signal for both the generator and the discriminator, resulting in a stable training dynamic. We apply our training method to pre-trained language models, using data from their original training set but less than 0.01% of the available data. A comprehensive human evaluation shows that our method significantly improves the quality of texts generated by the model while avoiding the previously reported sparsity problems of GAN approaches. Even our smaller models outperform larger original baseline models with more than 16 times the number of parameters. Finally, we corroborate previous claims that perplexity on held-out data is not a sufficient metric for measuring the quality of generated texts. 
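The abstract summarizes the training idea rather than the implementation, so a small illustrative sketch may help: from a prefix of every real training sequence the generator samples several branching continuations, a discriminator scores every generated position to give a dense signal, and the ordinary next-token loss on the original data is kept. The sketch below is not the authors' code; the generator/discriminator interfaces (a generator returning per-token logits, a discriminator returning per-position real/fake logits), the prefix split point, the branch_factor and branch_length values, and the REINFORCE-style estimator used to pass the discriminator's dense signal back through the discrete samples are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def branch_gan_step(generator, discriminator, real_ids,
                    branch_factor=4, branch_length=8):
    """One sketched training step on a batch of real token ids of shape [B, T]."""
    T = real_ids.size(1)

    # 1) Ordinary next-token prediction loss on the original data,
    #    kept alongside the adversarial objective as the abstract describes.
    logits = generator(real_ids)                                   # [B, T, V] (assumed)
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        real_ids[:, 1:].reshape(-1),
    )

    # 2) Branch: sample several short continuations from a prefix of each sample.
    prefix = real_ids[:, : T // 2].repeat_interleave(branch_factor, dim=0)
    with torch.no_grad():
        branch = prefix
        for _ in range(branch_length):
            step_logits = generator(branch)[:, -1]                 # [B*k, V]
            next_ids = torch.multinomial(F.softmax(step_logits, dim=-1), 1)
            branch = torch.cat([branch, next_ids], dim=1)

    # 3) Dense discriminator signal: a real/fake logit for every position (assumed interface).
    fake_scores = discriminator(branch)[:, -branch_length:]        # [B*k, L]
    real_scores = discriminator(real_ids)                          # [B, T]
    d_loss = (
        F.binary_cross_entropy_with_logits(real_scores, torch.ones_like(real_scores))
        + F.binary_cross_entropy_with_logits(fake_scores, torch.zeros_like(fake_scores))
    )

    # 4) Generator adversarial loss via a REINFORCE-style estimator (an assumption made
    #    here to get a gradient through the discrete samples): per-position discriminator
    #    probabilities act as dense rewards for the sampled branch tokens.
    gen_logits = generator(branch)[:, -branch_length - 1:-1]       # positions predicting branch tokens
    log_probs = F.log_softmax(gen_logits, dim=-1)
    sampled = branch[:, -branch_length:]
    token_log_probs = log_probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    rewards = torch.sigmoid(fake_scores).detach()                  # [B*k, L]
    g_adv_loss = -(rewards * token_log_probs).mean()

    # In a real loop the two losses would drive separate optimizers:
    # generator (lm_loss + g_adv_loss) and discriminator (d_loss).
    return lm_loss + g_adv_loss, d_loss
```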

Subject headings

NATURVETENSKAP  -- Data- och informationsvetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences (hsv//eng)

Keyword

Computational linguistics; Computationally efficient; Current modeling; Language model; Modern languages; Parallelizations; Research communities; Sequential data; Text generations; Training dataset; Generative adversarial networks

Publication and Content Type

ref (peer-reviewed)
kon (conference paper)
