Overview of the CLEF-2024 Eloquent Lab: Task 2 on HalluciGen
- Dürlich, Luise (author)
- RISE, Datavetenskap
- Gogoulou, Evangelia (author)
- RISE, Datavetenskap
- Guillou, Liane (author)
- University of Edinburgh, UK
- Nivre, Joakim, 1962- (author)
- RISE, Datavetenskap
- Zahra, Shorouq (author)
- RISE, Datavetenskap
- CEUR-WS, 2024
- 2024
- English.
In: CEUR Workshop Proceedings. CEUR-WS, pp. 691-702
- Related links:
https://ceur-ws.org/...
https://ri.diva-port... (primary) (Raw object)
https://urn.kb.se/re...
Abstract
- In the HalluciGen task we aim to discover whether LLMs have an internal representation of hallucination. Specifically, we investigate whether LLMs can be used to both generate and detect hallucinated content. In the cross-model evaluation setting we take this a step further and explore the viability of using an LLM to evaluate output produced by another LLM. We include generation, detection, and cross-model evaluation steps for two scenarios: paraphrase and machine translation. Overall, we find that the performance of the baselines and submitted systems is highly variable; however, initial results are promising, and the lessons learned from this year’s task will provide a solid foundation for future iterations. In particular, we highlight that generated output should ideally be validated by humans to ensure the robustness of the cross-model evaluation results. We aim to address this challenge in future iterations of HalluciGen.
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences (hsv//eng)
Keywords
- Computational linguistics; Computer aided language translation; Modeling languages; Cross model; Detection models; Evaluation; Generative language model; Hallucination; Internal representation; Language model; Machine translations; Model evaluation; Performance; Machine translation
Publication and content type
- ref (subject category)
- kon (subject category)