SwePub

Results list for the search "WAKA:kon ;lar1:(hb);pers:(Sönströd Cecilia)"

Search: WAKA:kon > Högskolan i Borås > Sönströd Cecilia

  • Results 1-10 of 16
1.
  • Johansson, Ulf, et al. (author)
  • Accurate and Interpretable Regression Trees using Oracle Coaching
  • 2014
  • Conference paper (peer-reviewed), abstract:
    • In many real-world scenarios, predictive models need to be interpretable, thus ruling out many machine learning techniques known to produce very accurate models, e.g., neural networks, support vector machines and all ensemble schemes. Most often, tree models or rule sets are used instead, typically resulting in significantly lower predictive performance. The overall purpose of oracle coaching is to reduce this accuracy vs. comprehensibility trade-off by producing interpretable models optimized for the specific production set at hand. The method requires production set inputs to be present when generating the predictive model, a demand fulfilled in most, but not all, predictive modeling scenarios. In oracle coaching, a highly accurate, but opaque, model is first induced from the training data. This model (“the oracle”) is then used to label both the training instances and the production instances. Finally, interpretable models are trained using different combinations of the resulting data sets. In this paper, the oracle coaching produces regression trees, using neural networks and random forests as oracles. The experiments, using 32 publicly available data sets, show that the oracle coaching leads to significantly improved predictive performance, compared to standard induction. In addition, it is also shown that a highly accurate opaque model can be successfully used as a pre-processing step to reduce the noise typically present in data, even in situations where production inputs are not available. In fact, just augmenting or replacing training data with another copy of the training set, but with the predictions from the opaque model as targets, produced significantly more accurate and/or more compact regression trees.
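
To make the procedure concrete, the following is a minimal sketch of the three oracle-coaching steps for regression, using scikit-learn as a stand-in; the data set, the random-forest oracle, and all parameter values are illustrative assumptions rather than the paper's setup.

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_prod, y_train, y_prod = train_test_split(X, y, random_state=0)

    # Step 1: induce a highly accurate but opaque model ("the oracle").
    oracle = RandomForestRegressor(n_estimators=300, random_state=0)
    oracle.fit(X_train, y_train)

    # Step 2: let the oracle label both training and production inputs.
    X_coach = np.vstack([X_train, X_prod])
    y_coach = oracle.predict(X_coach)

    # Step 3: train an interpretable model on the oracle-labeled data.
    coached = DecisionTreeRegressor(max_depth=5).fit(X_coach, y_coach)
    baseline = DecisionTreeRegressor(max_depth=5).fit(X_train, y_train)
    print("baseline tree R^2:", baseline.score(X_prod, y_prod))
    print("coached tree R^2: ", coached.score(X_prod, y_prod))
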
2.
  • Johansson, Ulf, et al. (author)
  • Chipper: A Novel Algorithm for Concept Description
  • 2008
  • In: Frontiers in Artificial Intelligence and Applications. IOS Press. ISBN 9781586038670, pp. 133-140
  • Conference paper (peer-reviewed), abstract:
    • In this paper, several demands placed on concept description algorithms are identified and discussed. The most important criterion is the ability to produce compact rule sets that, in a natural and accurate way, describe the most important relationships in the underlying domain. An algorithm based on the identified criteria is presented and evaluated. The algorithm, named Chipper, produces decision lists, where each rule covers a maximum number of remaining instances while meeting requested accuracy requirements. In the experiments, Chipper is evaluated on nine UCI data sets. The main result is that Chipper produces compact and understandable rule sets, clearly fulfilling the overall goal of concept description. In terms of predictive performance, Chipper's accuracy is similar to that of standard decision tree and rule induction algorithms, while its rule sets have superior comprehensibility.
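
The covering strategy described, where each rule grabs as many remaining instances as possible subject to an accuracy requirement, can be illustrated with a toy separate-and-conquer inducer. This is a sketch of the general idea only, not the actual Chipper algorithm; the single-feature rule representation and the min_acc and max_rules parameters are invented for illustration.

    import numpy as np

    def fit_decision_list(X, y, min_acc=0.8, max_rules=10):
        # Greedy covering: each rule is a single-feature threshold test,
        # chosen to cover as many remaining instances as possible while
        # its majority-class accuracy stays at or above min_acc.
        rules, remaining = [], np.arange(len(y))
        while remaining.size and len(rules) < max_rules:
            best = None
            for f in range(X.shape[1]):
                for t in np.unique(X[remaining, f]):
                    for op in ("<=", ">"):
                        cover = X[remaining, f] <= t if op == "<=" else X[remaining, f] > t
                        if not cover.any():
                            continue
                        vals, cnts = np.unique(y[remaining][cover], return_counts=True)
                        ok = cnts.max() / cover.sum() >= min_acc
                        if ok and (best is None or cover.sum() > best[0]):
                            best = (cover.sum(), f, op, t, vals[cnts.argmax()], cover)
            if best is None:
                break  # no remaining rule meets the accuracy requirement
            rules.append(best[1:5])  # (feature, operator, threshold, class)
            remaining = remaining[~best[5]]
        return rules

    rng = np.random.default_rng(0)
    X = rng.random((200, 4))
    y = (X[:, 0] > 0.5).astype(int)
    for rule in fit_decision_list(X, y):
        print("IF x[%d] %s %.3f THEN class %d" % rule)
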
3.
  • Johansson, Ulf, et al. (author)
  • Conformal Prediction for Accuracy Guarantees in Classification with Reject Option
  • 2023
  • In: Modeling Decisions for Artificial Intelligence. Springer. ISBN 9783031334979, pp. 133-145
  • Conference paper (peer-reviewed), abstract:
    • A standard classifier is forced to predict the label of every test instance, even when confidence in the predictions is very low. In many scenarios, it would, however, be better to avoid making these predictions, maybe leaving them to a human expert. A classifier with that alternative is referred to as a classifier with reject option. In this paper, we propose an algorithm that, for a particular data set, automatically suggests a number of accuracy levels, which it will be able to meet perfectly, using a classifier with reject option. Since the basis of the suggested algorithm is conformal prediction, it comes with strong validity guarantees. The experimentation, using 25 publicly available two-class data sets, confirms that the algorithm obtains empirical accuracies very close to the requested levels. In addition, in an outright comparison with probabilistic predictors, including models calibrated with Platt scaling, the suggested algorithm clearly outperforms the alternatives.
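
A minimal sketch of a classifier with reject option built on inductive conformal prediction, in the spirit of the abstract: a prediction is made only when the conformal prediction set at significance level eps is a singleton, and the instance is rejected otherwise. The model, data set, nonconformity score, and eps are assumptions, not the paper's experimental setup.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_fit, y_fit)

    # Nonconformity: 1 - predicted probability of the (tentative) class.
    cal_alpha = 1 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

    eps = 0.05  # requested significance level
    test_alpha = 1 - model.predict_proba(X_test)  # one score per tentative label
    # Unsmoothed conformal p-values against the calibration scores.
    counts = (cal_alpha[None, None, :] >= test_alpha[:, :, None]).sum(axis=-1)
    pvals = (counts + 1) / (len(cal_alpha) + 1)

    pred_set = pvals > eps              # conformal prediction set per instance
    accept = pred_set.sum(axis=1) == 1  # predict only on singleton sets
    y_hat = pred_set.argmax(axis=1)
    print("rejection rate:", 1 - accept.mean())
    print("accuracy on accepted:", (y_hat[accept] == y_test[accept]).mean())
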
4.
  • Johansson, Ulf, et al. (author)
  • Fish or Shark: Data Mining Online Poker
  • 2009
  • Conference paper (peer-reviewed), abstract:
    • In this paper, data mining techniques are used to analyze data gathered from online poker. The study focuses on short-handed Texas Hold’em, and the data sets used contain thousands of human players, each having played more than 1000 hands. The study has two complementary goals: first, building predictive models capable of categorizing players into good and bad players, i.e., winners and losers; second, producing clear and accurate descriptions of what constitutes the difference between winning and losing in poker. In the experimentation, neural network ensembles are shown to be very accurate when categorizing player profiles into winners and losers. Furthermore, decision trees and decision lists used to acquire concept descriptions are shown to be quite comprehensible, and still fairly accurate. Finally, an analysis of obtained concept descriptions discovered several rather unexpected rules, indicating that the suggested approach is potentially valuable for the poker domain.
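
The two-goal setup can be illustrated as follows: an ensemble for accurate categorization, plus a shallow decision tree that yields a readable concept description of the same data. Everything below, including the poker-style feature names, is a fabricated stand-in; none of the paper's actual features or results are reproduced.

    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    features = ["vpip", "preflop_raise_pct", "aggression"]  # invented stats
    X = rng.random((1000, 3))
    y = (X[:, 1] + 0.5 * X[:, 2] - X[:, 0] > 0.3).astype(int)  # synthetic labels

    # Goal 1: an accurate (but opaque) neural network ensemble.
    ensemble = VotingClassifier(
        [("mlp%d" % i, MLPClassifier(random_state=i, max_iter=500))
         for i in range(5)], voting="soft").fit(X, y)
    print("ensemble training accuracy:", ensemble.score(X, y))

    # Goal 2: a comprehensible concept description of winners vs. losers.
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=features))
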
5.
  • Johansson, Ulf, et al. (author)
  • Locally Induced Predictive Models
  • 2011
  • Conference paper (peer-reviewed), abstract:
    • Most predictive modeling techniques utilize all available data to build global models. This is despite the well-known fact that for many problems, the targeted relationship varies greatly over the input space, thus suggesting that localized models may improve predictive performance. In this paper, we suggest and evaluate a technique inducing one predictive model for each test instance, using only neighboring instances. In the experimentation, several different variations of the suggested algorithm producing localized decision trees and neural network models are evaluated on 30 UCI data sets. The main result is that the suggested approach generally yields better predictive performance than global models built using all available training data. As a matter of fact, all techniques producing J48 trees obtained significantly higher accuracy and AUC, compared to the global J48 model. For RBF network models, with their inherent ability to use localized information, the suggested approach was only successful with regard to accuracy, while global RBF models had a better ranking ability, as seen by their generally higher AUCs.
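
A minimal sketch of per-instance local induction as described: for each test instance, a model is fit on its k nearest training neighbors only. The data set, the base learner, and the neighborhood size are illustrative choices, not the paper's configuration.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import NearestNeighbors
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    nn = NearestNeighbors(n_neighbors=30).fit(X_train)
    preds = []
    for x in X_test:
        # Induce one model per test instance from its neighborhood only.
        _, idx = nn.kneighbors(x.reshape(1, -1))
        local = DecisionTreeClassifier().fit(X_train[idx[0]], y_train[idx[0]])
        preds.append(local.predict(x.reshape(1, -1))[0])
    print("local-model accuracy:", np.mean(np.array(preds) == y_test))
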
6.
  • Johansson, Ulf, et al. (author)
  • One Tree to Explain Them All
  • 2011
  • Conference paper (peer-reviewed), abstract:
    • Random forest is an often used ensemble technique, renowned for its high predictive performance. Random forest models are, however, inherently opaque due to their sheer complexity, making human interpretation and analysis impossible. This paper presents a method of approximating the random forest with just one decision tree. The approach uses oracle coaching, a recently suggested technique where a weaker but transparent model is generated using combinations of regular training data and test data initially labeled by a strong classifier, called the oracle. In this study, the random forest plays the part of the oracle, while the transparent models are decision trees generated by either the standard tree inducer J48, or by evolving genetic programs. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves both accuracy and area under the ROC curve, compared to using training data only. As a matter of fact, the resulting single-tree models are as accurate as the random forest on the specific test instances. Most importantly, this is not achieved by inducing or evolving huge trees with perfect fidelity; a large majority of the trees are instead rather compact and clearly comprehensible. The experiments also show that evolution outperformed J48 with regard to accuracy, but that this came at the expense of slightly larger trees.
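
A sketch of the forest-to-tree approximation, with scikit-learn's CART trees standing in for J48 and the evolved genetic programs; accuracy and fidelity (agreement with the oracle) are reported on the coached test instances. The data set and parameters are assumptions.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    forest = RandomForestClassifier(n_estimators=300, random_state=0)
    forest.fit(X_train, y_train)

    # Oracle coaching: one tree trained on the training data plus the test
    # inputs, with the latter labeled by the forest (the oracle).
    X_both = np.vstack([X_train, X_test])
    y_both = np.hstack([y_train, forest.predict(X_test)])
    tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_both, y_both)

    print("forest accuracy:", forest.score(X_test, y_test))
    print("tree accuracy:  ", tree.score(X_test, y_test))
    # Fidelity: how often the single tree agrees with the forest it explains.
    print("fidelity:", np.mean(tree.predict(X_test) == forest.predict(X_test)))
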
7.
  • Johansson, Ulf, et al. (author)
  • Oracle Coached Decision Trees and Lists
  • 2010
  • Conference paper (peer-reviewed), abstract:
    • This paper introduces a novel method for obtaining increased predictive performance from transparent models in situations where production input vectors are available when building the model. First, labeled training data is used to build a powerful opaque model, called an oracle. Second, the oracle is applied to production instances, generating predicted target values, which are used as labels. Finally, these newly labeled instances are utilized, in different combinations with normal training data, when inducing a transparent model. Experimental results, on 26 UCI data sets, show that the use of oracle coaches significantly improves predictive performance, compared to standard model induction. Most importantly, both accuracy and AUC results are robust over all combinations of opaque and transparent models evaluated. This study thus implies that the straightforward procedure of using a coaching oracle, which can be used with arbitrary classifiers, yields significantly better predictive performance at a low computational cost.
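
The evaluated combinations of normal and oracle-labeled data can be sketched as a simple loop; as in the sketches above, the models, data set, and parameters are illustrative assumptions rather than the paper's setup.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_prod, y_train, y_prod = train_test_split(X, y, random_state=1)

    oracle = RandomForestClassifier(random_state=1).fit(X_train, y_train)
    prod_labels = oracle.predict(X_prod)  # oracle-labeled production instances

    combos = {
        "training data only": (X_train, y_train),
        "oracle-labeled production only": (X_prod, prod_labels),
        "training + oracle-labeled production": (
            np.vstack([X_train, X_prod]), np.hstack([y_train, prod_labels])),
    }
    for name, (Xc, yc) in combos.items():
        tree = DecisionTreeClassifier(max_depth=5, random_state=1).fit(Xc, yc)
        print(name, "->", round(tree.score(X_prod, y_prod), 3))
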
8.
  • Johansson, Ulf, et al. (author)
  • Regression Trees for Streaming Data with Local Performance Guarantees
  • 2014
  • Conference paper (peer-reviewed), abstract:
    • Online predictive modeling of streaming data is a key task for big data analytics. In this paper, a novel approach for efficient online learning of regression trees is proposed, which continuously updates, rather than retrains, the tree as more labeled data become available. A conformal predictor outputs prediction sets instead of point predictions, which for regression translates into prediction intervals. The key property of a conformal predictor is that it is always valid, i.e., the error rate on novel data is bounded by a preset significance level. Here, we suggest applying Mondrian conformal prediction on top of the resulting models, in order to obtain regression trees where not only the tree, but also each and every rule, corresponding to a path from the root node to a leaf, is valid. Using Mondrian conformal prediction, it becomes possible to analyze and explore the different rules separately, knowing that their error rate, in the long run, will not exceed the preset significance level. An empirical investigation, using 17 publicly available data sets, confirms that the resulting rules are independently valid, but also shows that the prediction intervals are smaller, on average, than when only the global model is required to be valid. All in all, the suggested method provides a data miner or a decision maker with highly informative predictive models of streaming data.
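
An offline sketch of the Mondrian idea on a batch-trained tree: calibration residuals are grouped per leaf, so every rule (root-to-leaf path) gets its own interval width and, in the long run, its own validity guarantee. The paper's incrementally updated online trees are not reproduced here, the exact finite-sample conformal correction is simplified to a plain quantile, and the data set and parameters are assumptions.

    import numpy as np
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    X_fit, X_cal, y_fit, y_cal = train_test_split(X_train, y_train, random_state=0)

    tree = DecisionTreeRegressor(min_samples_leaf=300, random_state=0).fit(X_fit, y_fit)
    eps = 0.1  # significance level: per-rule error rate bounded by 10%

    # Mondrian calibration: one half-width per leaf, taken as the (1 - eps)
    # quantile of that leaf's absolute calibration residuals.
    cal_leaf = tree.apply(X_cal)
    cal_res = np.abs(y_cal - tree.predict(X_cal))
    width = {leaf: np.quantile(cal_res[cal_leaf == leaf], 1 - eps)
             for leaf in np.unique(cal_leaf)}

    test_leaf = tree.apply(X_test)
    half = np.array([width.get(l, np.quantile(cal_res, 1 - eps)) for l in test_leaf])
    mid = tree.predict(X_test)
    covered = (y_test >= mid - half) & (y_test <= mid + half)
    print("empirical coverage:", covered.mean())  # should approach 1 - eps
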
9.
  • Johansson, Ulf, et al. (author)
  • Using Feature Selection with Bagging and Rule Extraction in Drug Discovery
  • 2010
  • Conference paper (peer-reviewed), abstract:
    • This paper investigates different ways of combining feature selection with bagging and rule extraction in predictive modeling. Experiments on a large number of data sets from the medicinal chemistry domain, using standard algorithms implemented in the Weka data mining workbench, show that feature selection can lead to significantly improved predictive performance. When combining feature selection with bagging, employing the feature selection on each bootstrap obtains the best result. When using decision trees for rule extraction, the effect of feature selection can actually be detrimental, unless the transductive approach oracle coaching is also used. However, employing oracle coaching will lead to significantly improved performance, and the best results are obtained when performing feature selection before training the opaque model. The overall conclusion is that it can make a substantial difference for the predictive performance exactly how feature selection is used in conjunction with other techniques.
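
Two of the investigated combinations can be sketched with scikit-learn pipelines standing in for the Weka setup: feature selection applied once before bagging, versus inside each bootstrap (the variant reported above to obtain the best result for bagging). The data set, the univariate selector, and k are illustrative assumptions.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # Variant A: select features once, then bag trees on the reduced data.
    before = make_pipeline(
        SelectKBest(f_classif, k=10),
        BaggingClassifier(DecisionTreeClassifier(), n_estimators=25, random_state=0))

    # Variant B: selection inside each bootstrap, so ensemble members may
    # end up using different feature subsets.
    inside = BaggingClassifier(
        make_pipeline(SelectKBest(f_classif, k=10), DecisionTreeClassifier()),
        n_estimators=25, random_state=0)

    for name, est in [("before bagging", before), ("per bootstrap", inside)]:
        print(name, round(cross_val_score(est, X, y).mean(), 3))
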
10.
  • Johansson, Ulf, et al. (author)
  • Using Genetic Programming to Obtain Implicit Diversity
  • 2009
  • Conference paper (peer-reviewed), abstract:
    • When performing predictive data mining, the use of ensembles is known to increase prediction accuracy, compared to single models. To obtain this higher accuracy, ensembles should be built from base classifiers that are both accurate and diverse. The question of how to balance these two properties in order to maximize ensemble accuracy is, however, far from solved and many different techniques for obtaining ensemble diversity exist. One such technique is bagging, where implicit diversity is introduced by training base classifiers on different subsets of available data instances, thus resulting in less accurate, but diverse base classifiers. In this paper, genetic programming is used as an alternative method to obtain implicit diversity in ensembles by evolving accurate, but different base classifiers in the form of decision trees, thus exploiting the inherent inconsistency of genetic programming. The experiments show that the GP approach outperforms standard bagging of decision trees, obtaining significantly higher ensemble accuracy over 25 UCI datasets. This superior performance stems from base classifiers having both higher average accuracy and more diversity. Implicitly introducing diversity using GP thus works very well, since evolved base classifiers tend to be highly accurate and diverse.
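
A deliberately simplified illustration of the implicit-diversity idea: several independent stochastic searches each return an accurate but different classifier, and the members vote. A simple random search over single-feature threshold stumps stands in for genetic programming, which additionally uses populations, crossover, and tree-structured programs; the data set and run counts are assumptions.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    def predict(ind, data):
        feature, threshold, polarity = ind
        return ((data[:, feature] > threshold) == polarity).astype(int)

    def random_ind(rng):
        f = int(rng.integers(X_train.shape[1]))
        return (f, float(rng.choice(X_train[:, f])), bool(rng.integers(2)))

    def evolve(seed, generations=300):
        # Stochastic search standing in for GP; independent seeded runs
        # yield accurate but different members (implicit diversity).
        rng = np.random.default_rng(seed)
        best = random_ind(rng)
        best_fit = np.mean(predict(best, X_train) == y_train)
        for _ in range(generations):
            cand = random_ind(rng)
            fit = np.mean(predict(cand, X_train) == y_train)
            if fit > best_fit:
                best, best_fit = cand, fit
        return best

    members = [evolve(seed) for seed in range(15)]
    votes = np.mean([predict(m, X_test) for m in members], axis=0) >= 0.5
    print("single member accuracy:", np.mean(predict(members[0], X_test) == y_test))
    print("voted ensemble accuracy:", np.mean(votes.astype(int) == y_test))
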