Classifying easy-to-read texts without parsing
- Falkenjack, Johan, 1986- (author)
- Linköpings universitet, Institutionen för datavetenskap, Tekniska högskolan
- Jönsson, Arne (author)
- Linköpings universitet, Institutionen för datavetenskap, Tekniska högskolan
- Association for Computational Linguistics, 2014
- English.
In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR). Association for Computational Linguistics. ISBN 9781937284916, pp. 114-122
- Related link:
https://urn.kb.se/re...
Abstract
- Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable road forward for readability assessment. The best models can be trained to decide whether a text is easy to read or not with very high accuracy; for example, a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that total model to find out to what extent different high-performing models tend to consist of the same parameters, and whether it is possible to find models that only use features not requiring parsing. We used a genetic algorithm to systematically optimize parameter sets of fixed sizes, using the accuracy of a Support Vector Machine classifier as the fitness function. Our results show that it is possible to find models almost as good as the currently best models while omitting parsing-based features.
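The search procedure described in the abstract can be illustrated with a minimal sketch of a genetic algorithm over fixed-size feature subsets. This is not the authors' implementation: the selection scheme, crossover, mutation rate, population size, and all function names here are assumptions made for illustration. In the paper the fitness is the accuracy of a Support Vector Machine trained on the selected features; to keep the sketch self-contained, a toy fitness function over a hypothetical set of "informative" feature indices stands in for it.

```python
import random
from typing import Callable, FrozenSet, List

Subset = FrozenSet[int]  # a candidate model: indices of the selected features


def random_subset(n_features: int, k: int, rng: random.Random) -> Subset:
    """Draw k distinct feature indices uniformly at random."""
    return frozenset(rng.sample(range(n_features), k))


def crossover(a: Subset, b: Subset, k: int, rng: random.Random) -> Subset:
    """Child takes k features from the union of both parents (size stays k)."""
    pool = sorted(a | b)
    return frozenset(rng.sample(pool, k))


def mutate(s: Subset, n_features: int, rng: random.Random,
           rate: float = 0.1) -> Subset:
    """With probability `rate`, swap one selected feature for an unused one."""
    if rng.random() >= rate or len(s) == n_features:
        return s
    removed = rng.choice(sorted(s))
    unused = [i for i in range(n_features) if i not in s]
    return frozenset((set(s) - {removed}) | {rng.choice(unused)})


def evolve(fitness: Callable[[Subset], float], n_features: int, k: int,
           pop_size: int = 30, generations: int = 40, seed: int = 0) -> Subset:
    """Return the fittest size-k subset found (assumes pop_size >= 4)."""
    rng = random.Random(seed)
    population: List[Subset] = [
        random_subset(n_features, k, rng) for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection, keeps the best
        children: List[Subset] = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            child = crossover(parent_a, parent_b, k, rng)
            children.append(mutate(child, n_features, rng))
        population = elite + children
    return max(population, key=fitness)


# Toy stand-in for "SVM accuracy on the selected features" (hypothetical):
# the score is the share of a made-up informative set that the subset covers.
INFORMATIVE = frozenset({3, 7, 11, 19, 42})


def toy_fitness(subset: Subset) -> float:
    return len(subset & INFORMATIVE) / len(INFORMATIVE)


best = evolve(toy_fitness, n_features=117, k=8)
print(sorted(best), toy_fitness(best))
```

To approximate the setup in the abstract, `toy_fitness` would be replaced by a function that trains and evaluates a classifier on the columns in `subset`; restricting the candidate indices to non-parsing features would correspond to the paper's question about parser-free models.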
Subject headings
- NATURAL SCIENCES -- Computer and Information Sciences -- Language Technology (hsv//eng)
Keywords
- Readability
- Readability Assessment
- Genetic optimization
- Machine Learning
- Support Vector Machine
Publication and content type
- ref (subject category)
- kon (subject category)