SwePub

Result list for search "WFRF:(Falkenjack Johan)"

  • Results 1-10 of 12
1.
  • Falkenjack, Johan, et al. (author)
  • An Exploratory Study on Genre Classification using Readability Features
  • 2016
  • In: The Sixth Swedish Language Technology Conference (SLTC).
  • Conference paper (peer-reviewed). Abstract:
    • We present a preliminary study that explores whether text features used for readability assessment are reliable genre-revealing features. We empirically explore the difference between genre and domain. We carry out two sets of experiments with both supervised and unsupervised methods. Findings on the Swedish national corpus (the SUC) show that readability cues are good indicators of genre variation.
2.
  • Falkenjack, Johan, 1986-, et al. (author)
  • Classifying easy-to-read texts without parsing
  • 2014
  • In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR). Association for Computational Linguistics. ISBN 9781937284916, pp. 114-122
  • Conference paper (peer-reviewed). Abstract:
    • Document classification using automated linguistic analysis and machine learning (ML) has been shown to be a viable road forward for readability assessment. The best models can be trained to decide if a text is easy to read or not with very high accuracy, e.g. a model using 117 parameters from shallow, lexical, morphological and syntactic analyses achieves 98.9% accuracy. In this paper we compare models created by parameter optimization over subsets of that total model to find out to what extent different high-performing models tend to consist of the same parameters and if it is possible to find models that only use features not requiring parsing. We used a genetic algorithm to systematically optimize parameter sets of fixed sizes using the accuracy of a Support Vector Machine classifier as the fitness function. Our results show that it is possible to find models almost as good as the currently best models while omitting parsing-based features.
3.
  • Falkenjack, Johan, 1986-, et al. (author)
  • Features indicating readability in Swedish text
  • 2013
  • In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). Linköping. ISSN 1650-3686, 1650-3740. ISBN 9789175195896; no. 085, pp. 27-40
  • Conference paper (peer-reviewed). Abstract:
    • Studies have shown that modern methods of readability assessment, using automated linguistic analysis and machine learning (ML), are a viable road forward for readability classification and ranking. In this paper we present a study of different levels of analysis and a large number of features and how they affect an ML-system’s accuracy when it comes to readability assessment. We test a large number of features proposed for different languages (mainly English) and evaluate their usefulness for readability assessment for Swedish, and compare their performance to that of established metrics. We find that the best performing features are language models based on part-of-speech and dependency type.
4.
  • Falkenjack, Johan, et al. (author)
  • Implicit readability ranking using the latent variable of a Bayesian Probit model
  • 2016
  • In: CL4LC 2016 - Computational Linguistics for Linguistic Complexity. Uppsala universitet Humanistisk-samhällsvetenskapliga vetenskapsområdet. ISBN 9784879747099, pp. 104-112
  • Conference paper (peer-reviewed). Abstract:
    • Data-driven approaches to readability analysis for languages other than English have been plagued by a scarcity of suitable corpora. Often, relevant corpora consist only of easy-to-read texts with no rank information or empirical readability scores, making only binary approaches, such as classification, applicable. We propose a Bayesian latent variable approach to get the most out of these kinds of corpora. In this paper we present results on using such a model for readability ranking. The model is evaluated on a preliminary corpus of ranked student texts with encouraging results. We also assess the model by showing that it performs readability classification on par with a state-of-the-art classifier while at the same time being transparent enough to allow more sophisticated interpretations.
5.
  • Falkenjack, Johan, 1986-, et al. (author)
  • Services for Text Simplification and Analysis
  • 2017
  • In: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa. Linköping University Electronic Press. ISBN 9789176856017, pp. 309-313
  • Conference paper (other academic/artistic). Abstract:
    • We present a language technology service for web editors’ work on making texts easier to understand, including tools for text complexity analysis, text simplification and text summarization. We also present a text analysis service focusing on measures of text complexity.
6.
  • Falkenjack, Johan, 1986- (author)
  • Towards a Model of General Text Complexity for Swedish
  • 2018
  • Licentiate thesis (other academic/artistic). Abstract:
    • In an increasingly networked world, where the amount of written information is growing at a rate never before seen, the ability to read and absorb written information is of utmost importance for anything but a superficial understanding of life's complexities. That is an example of a sentence which is not very easy to read. It can be said to have a relatively high degree of text complexity. Nevertheless, the sentence is also true. It is important to be able to read and understand written materials. While not everyone might have a job where they have to read a lot, access to written material is necessary in order to participate in modern society. Most information, from news reporting, to medical information, to governmental information, comes primarily in a written form.
      But what makes the sentence at the start of this abstract so complex? We can probably all agree that the length is part of it. But then what? Researchers in the field of readability and text complexity analysis have been studying this question for almost 100 years. That research has over time come to include many computational and data-driven methods within the field of computational linguistics.
      This thesis covers some of my contributions to this field of research, though with a main focus on Swedish rather than English text. It aims to explore two primary questions: (1) Which linguistic features are most important when assessing text complexity in Swedish? and (2) How can we deal with the problem of data sparsity with regard to complexity-annotated texts in Swedish?
      The first issue is tackled by exploring the task of identifying easy-to-read ("lättläst") text using classification with Support Vector Machines. A large set of linguistic features is evaluated with regard to predictive performance and is shown to separate easy-to-read texts from regular texts with a very high accuracy. Meanwhile, using a genetic algorithm for variable selection, we find that almost the same accuracy can be reached with only 8 features. This implies that this classification problem is not very hard and that results might not generalize to comparing less easy-to-read texts.
      This, in turn, brings us to the second question. Except for easy-to-read labeled texts, the data with text complexity annotations is very sparse. It consists of multiple small corpora using different scales to label documents. To deal with this problem, we propose a novel statistical model. The model belongs to the larger family of Probit models and is implemented in a Bayesian fashion and estimated using a Gibbs sampler based on extending a well-established Gibbs sampler for the Ordered Probit model. This model is evaluated using both simulated and real-world readability data with very promising results.
7.
  • Falkenjack, Johan, 1986-, et al. (author)
  • Using the probability of readability to order Swedish texts
  • 2012
  • In: Proceedings of the Fourth Swedish Language Technology Conference, pp. 27-28
  • Conference paper (peer-reviewed). Abstract:
    • In this study we present a new approach to rank readability in Swedish texts based on lexical, morpho-syntactic and syntactic analysis of text as well as machine learning. The basic premise and theory are presented as well as a small experiment testing the feasibility, but not actual performance, of the approach. The experiment shows that it is possible to implement a system based on the approach; however, the actual performance of such a system has not been evaluated, as the necessary resources for such an evaluation do not yet exist for Swedish. The experiment also shows that a classifier based on the aforementioned linguistic analysis, on our limited test set, outperforms classifiers based on established metrics used to assess readability such as LIX, OVIX and Nominal Ratio.
8.
  • Heimann Mühlenbock, Katarina, 1952, et al. (author)
  • A multivariate model for classifying texts' readability
  • 2015
  • In: ACL Anthology - Proceedings of the 20th Nordic Conference of Computational Linguistics (NoDaLiDa-2015). May 11–13, 2015 in Vilnius, Lithuania. Linköping University Electronic Press. ISSN 1650-3740. ISBN 9789175190983; vol. 23, pp. 257-261
  • Conference paper (peer-reviewed). Abstract:
    • We report on results from using the multi-variate readability model SVIT to classify texts into various levels. We investigate how the language features integrated in the SVIT model can be transformed to values on known criteria like vocabulary, grammatical fluency and propositional knowledge. Such text criteria, sensitive to content, readability and genre, in combination with the profile of a student's reading ability form the base of individually adapted texts. The procedure of levelling texts into different stages of complexity is presented along with results from the first cycle of tests conducted on 8th grade students. The results show that SVIT can be used to classify texts into different complexity levels.
9.
  • Heimann Mühlenbock, Katarina, 1952, et al. (author)
  • Studies on automatic assessment of students' reading ability
  • 2014
  • In: Proceedings of the Fifth Swedish Language Technology Conference. SLTC 2014.
  • Conference paper (peer-reviewed). Abstract:
    • We report results from ongoing research on developing sophisticated measures for assessing a student's reading ability and a tool for the student and teacher to create a profile of this ability. In the project we will also investigate how these measures can be transformed to values on known criteria like vocabulary, grammatical fluency and so forth, and how these can be used to analyse texts. Such text criteria, sensitive to content, readability and genre in combination with the profile of a student's reading ability will form the base to individually adapted texts. Techniques and tools will be developed for selecting suitable texts, automatic summarisation of texts and automatic transformation to easy-to-read Swedish.
10.
  • Jönsson, Simon, et al. (author)
  • A Component based Approach to Measuring Text Complexity
  • 2018
  • Conference paper (other academic/artistic). Abstract:
    • We present results from assessing text complexity based on a factorisation of text property measures into components. The components are evaluated by investigating their ability to classify texts belonging to different genres. Our results show that the text complexity components correctly classify texts belonging to specific genres, given that the genres adhere to certain textual conventions. We also propose a radar chart visualisation to communicate component based text complexity.