Towards a Model of General Text Complexity for Swedish

Falkenjack, Johan, 1986- (author): Linköpings universitet,Interaktiva och kognitiva system,Tekniska fakulteten,NLPLAB

Jönsson, Arne, 1955- (thesis advisor): Linköpings universitet,Interaktiva och kognitiva system,Tekniska fakulteten

Östling, Robert, Ph.D. (opponent): Stockholms Universitet, Stockholm, Sweden

ISBN 9789176851555
Linköping : Linköping University Electronic Press, 2018
English, 103 pp.

Related links: https://doi.org/10.3...; https://liu.diva-por... (Preview); https://liu.diva-por... (primary) (Raw object); https://urn.kb.se/re...; https://doi.org/10.3...

Licentiate thesis (other academic/artistic)

Abstract

In an increasingly networked world, where the amount of written information is growing at a rate never before seen, the ability to read and absorb written information is of utmost importance for anything but a superficial understanding of life's complexities. That is an example of a sentence which is not very easy to read. It can be said to have a relatively high degree of text complexity. Nevertheless, the sentence is also true. It is important to be able to read and understand written materials. While not everyone might have a job where they have to read a lot, access to written material is necessary in order to participate in modern society. Most information, from news reporting, to medical information, to governmental information, comes primarily in written form.

But what makes the sentence at the start of this abstract so complex? We can probably all agree that the length is part of it. But then what? Researchers in the field of readability and text complexity analysis have been studying this question for almost 100 years. That research has over time come to include many computational and data-driven methods within the field of computational linguistics.

This thesis covers some of my contributions to this field of research, though with a main focus on Swedish rather than English text. It aims to explore two primary questions: (1) Which linguistic features are most important when assessing text complexity in Swedish? and (2) How can we deal with the problem of data sparsity with regard to complexity-annotated texts in Swedish?

The first issue is tackled by exploring the task of identifying easy-to-read ("lättläst") text using classification with Support Vector Machines. A large set of linguistic features is evaluated with regard to predictive performance and is shown to separate easy-to-read texts from regular texts with very high accuracy. Meanwhile, using a genetic algorithm for variable selection, we find that almost the same accuracy can be reached with only 8 features. This implies that this classification problem is not very hard and that the results might not generalize to comparing less easy-to-read texts.

This, in turn, brings us to the second question. Apart from texts labeled as easy-to-read, data with text complexity annotations is very sparse. It consists of multiple small corpora that use different scales to label documents. To deal with this problem, we propose a novel statistical model. The model belongs to the larger family of Probit models, is formulated in a Bayesian fashion, and is estimated with a Gibbs sampler that extends a well-established Gibbs sampler for the Ordered Probit model. This model is evaluated using both simulated and real-world readability data, with very promising results.
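
For readers unfamiliar with the model family named in the abstract, the standard Ordered Probit model and its textbook data-augmentation Gibbs sampler (Albert and Chib, 1993) can be sketched as follows. This is only the well-known baseline, assuming flat priors and amsmath; it may not be the exact sampler the thesis builds on, and the thesis's extension to several corpora with different label scales is not described in this record and is not reproduced here. The symbols $x_i$, $\beta$, $z_i$, $\gamma_k$ and $K$ are notation chosen for this sketch, not taken from the thesis.

```latex
% Standard Ordered Probit model with K ordered labels (baseline only).
% Latent complexity score z_i for document i with feature vector x_i:
\begin{align*}
  z_i &= x_i^{\top}\beta + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1), \\
  y_i &= k \iff \gamma_{k-1} < z_i \le \gamma_k, \qquad
        -\infty = \gamma_0 < \gamma_1 < \dots < \gamma_K = \infty, \\
  \Pr(y_i = k \mid \beta, \gamma) &= \Phi\!\left(\gamma_k - x_i^{\top}\beta\right)
        - \Phi\!\left(\gamma_{k-1} - x_i^{\top}\beta\right).
\end{align*}

% One sweep of the data-augmentation Gibbs sampler, assuming flat priors
% on beta and on the free cutpoints (gamma_1 is usually fixed at 0 for
% identification when x_i contains an intercept):
\begin{align*}
  z_i \mid \beta, \gamma, y_i = k
      &\sim \mathcal{N}\!\left(x_i^{\top}\beta,\, 1\right)
        \text{ truncated to } (\gamma_{k-1}, \gamma_k], \\
  \beta \mid z
      &\sim \mathcal{N}\!\left((X^{\top}X)^{-1}X^{\top}z,\; (X^{\top}X)^{-1}\right), \\
  \gamma_k \mid z, y, \gamma_{-k}
      &\sim \mathrm{Uniform}\!\left(
        \max\Bigl\{\max_{i:\,y_i = k} z_i,\; \gamma_{k-1}\Bigr\},\;
        \min\Bigl\{\min_{i:\,y_i = k+1} z_i,\; \gamma_{k+1}\Bigr\}\right).
\end{align*}
```

In this baseline a single set of cutpoints $\gamma$ maps the latent score to one label scale, which is why corpora annotated on different scales, as the abstract describes, cannot simply be pooled under it.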

About the subject

NATURAL SCIENCES: NATURAL SCIENCES; and Computer and Inf ...; and Computer Science ...

NATURAL SCIENCES: NATURAL SCIENCES; and Computer and Inf ...; and Language Technol ...

NATURAL SCIENCES: NATURAL SCIENCES; and Mathematics; and Probability Theo ...

HUMANITIES: HUMANITIES; and Languages and Li ...; and Specific Languag ...
