
Branch-GAN: Improving Text Generation with (Not So) Large Language Models

Carlsson, Fredrik (author)
RISE, Electrification and Reliability
Broberg, Johan (author)
RISE
Hillbom, Erik (author)
RISE
Sahlgren, Magnus (author)
AI Sweden, Sweden
Nivre, Joakim, 1962- (author)
RISE, Computer Science
International Conference on Learning Representations, ICLR, 2024
2024
English.
  • Conference paper (peer-reviewed)
Abstract
  • The current advancements in open domain text generation have been spearheaded by Transformer-based large language models. Leveraging efficient parallelization and vast training datasets, these models achieve unparalleled text generation capabilities. Even so, current models are known to suffer from deficiencies such as repetitive texts, looping issues, and lack of robustness. While adversarial training through generative adversarial networks (GAN) is a proposed solution, earlier research in this direction has predominantly focused on older architectures, or narrow tasks. As a result, this approach is not yet compatible with modern language models for open-ended text generation, leading to diminished interest within the broader research community. We propose a computationally efficient GAN approach for sequential data that utilizes the parallelization capabilities of Transformer models. Our method revolves around generating multiple branching sequences from each training sample, while also incorporating the typical next-step prediction loss on the original data. In this way, we achieve a dense reward and loss signal for both the generator and the discriminator, resulting in a stable training dynamic. We apply our training method to pre-trained language models, using data from their original training set but less than 0.01% of the available data. A comprehensive human evaluation shows that our method significantly improves the quality of texts generated by the model while avoiding the previously reported sparsity problems of GAN approaches. Even our smaller models outperform larger original baseline models with more than 16 times the number of parameters. Finally, we corroborate previous claims that perplexity on held-out data is not a sufficient metric for measuring the quality of generated texts. 
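The abstract summarizes the training idea rather than the implementation, so a small illustrative sketch may help: from a prefix of every real training sequence the generator samples several branching continuations, a discriminator scores every generated position to give a dense signal, and the ordinary next-token loss on the original data is kept. The sketch below is not the authors' code; the generator/discriminator interfaces (a generator returning per-token logits, a discriminator returning per-position real/fake logits), the prefix split point, the branch_factor and branch_length values, and the REINFORCE-style estimator used to pass the discriminator's dense signal back through the discrete samples are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def branch_gan_step(generator, discriminator, real_ids,
                    branch_factor=4, branch_length=8):
    """One sketched training step on a batch of real token ids of shape [B, T]."""
    T = real_ids.size(1)

    # 1) Ordinary next-token prediction loss on the original data,
    #    kept alongside the adversarial objective as the abstract describes.
    logits = generator(real_ids)                                   # [B, T, V] (assumed)
    lm_loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        real_ids[:, 1:].reshape(-1),
    )

    # 2) Branch: sample several short continuations from a prefix of each sample.
    prefix = real_ids[:, : T // 2].repeat_interleave(branch_factor, dim=0)
    with torch.no_grad():
        branch = prefix
        for _ in range(branch_length):
            step_logits = generator(branch)[:, -1]                 # [B*k, V]
            next_ids = torch.multinomial(F.softmax(step_logits, dim=-1), 1)
            branch = torch.cat([branch, next_ids], dim=1)

    # 3) Dense discriminator signal: a real/fake logit for every position (assumed interface).
    fake_scores = discriminator(branch)[:, -branch_length:]        # [B*k, L]
    real_scores = discriminator(real_ids)                          # [B, T]
    d_loss = (
        F.binary_cross_entropy_with_logits(real_scores, torch.ones_like(real_scores))
        + F.binary_cross_entropy_with_logits(fake_scores, torch.zeros_like(fake_scores))
    )

    # 4) Generator adversarial loss via a REINFORCE-style estimator (an assumption made
    #    here to get a gradient through the discrete samples): per-position discriminator
    #    probabilities act as dense rewards for the sampled branch tokens.
    gen_logits = generator(branch)[:, -branch_length - 1:-1]       # positions predicting branch tokens
    log_probs = F.log_softmax(gen_logits, dim=-1)
    sampled = branch[:, -branch_length:]
    token_log_probs = log_probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    rewards = torch.sigmoid(fake_scores).detach()                  # [B*k, L]
    g_adv_loss = -(rewards * token_log_probs).mean()

    # In a real loop the two losses would drive separate optimizers:
    # generator (lm_loss + g_adv_loss) and discriminator (d_loss).
    return lm_loss + g_adv_loss, d_loss
```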

Subject headings

NATURVETENSKAP  -- Data- och informationsvetenskap (hsv//swe)
NATURAL SCIENCES  -- Computer and Information Sciences (hsv//eng)

Keyword

Computational linguistics; Computationally efficient; Current modeling; Language model; Modern languages; Parallelizations; Research communities; Sequential data; Text generations; Training dataset; Generative adversarial networks

Publication and Content Type

ref (peer-reviewed)
kon (conference paper)
